- 2011 - A trip through the Graphics Pipeline 2011
- 2013 - Performance Optimization Guidelines and the GPU Architecture behind them
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <openssl/sha.h>
#define TARGET_PREFIX "20250327"
#define MAX_WORDS 256
#define MAX_TEXT 2048
#define MAX_ATTEMPTS (1ULL << 32) // 2^32 attempts (~4.3B, enough for 8-char prefix)

// Compile and run with:
//
//   gcc -O3 -march=native fast-strlen.c -lpthread -o fast-strlen
//   && ./fast-strlen
//
// Use gcc because clang is too smart and optimizes away parts of the
// benchmark. Results on Xeon(R) CPU E5-2650 v4 @ 2.20GHz with gcc
// 9.4.0:
//
// Scanning 10 times over 4.00GB...

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
// We'll have 128 tokens. Each token can be up to 32 characters long.
char token[128][32];
int lexer(char* input) {

class E(BaseException):
    def __new__(cls, *args, **kwargs):
        return cls
def a(): yield
a().throw(E)

" Specify a directory for plugins
call plug#begin('~/.vim/plugged')
Plug 'neoclide/coc.nvim', {'branch': 'release'}
Plug 'scrooloose/nerdtree'
"Plug 'tsony-tsonev/nerdtree-git-plugin'
Plug 'Xuyuanp/nerdtree-git-plugin'
Plug 'tiagofumo/vim-nerdtree-syntax-highlight'
Plug 'ryanoasis/vim-devicons'
Plug 'airblade/vim-gitgutter'

# OpenMesh-Python Cheat Sheet
'''
documentation: https://openmesh-python.readthedocs.io/en/latest/
OpenMesh-Python official repository:
https://www.graphics.rwth-aachen.de:9000/OpenMesh/openmesh-python
C++ documentation:
https://www.openmesh.org/media/Documentations/OpenMesh-Doc-Latest/a04099.html
Installation:
* pip install openmesh # https://pypi.org/project/openmesh/

"""
PyTorch has pack_padded_sequence, but it doesn’t work with dense layers. For sequence data with high variance in length,
the best way to minimize padding and masking within a batch is to feed in data that is already grouped by sequence length
(while still shuffling it somewhat). Here is my current solution in numpy.
I will need to convert every function over to torch to allow it to run on the GPU, and I am sure there are many other
ways to optimize it further. Hope this helps others and that maybe it can become a new PyTorch Batch Sampler someday.
General approach to how it works:
Decide what your bucket boundaries for the data are.

/* Optional CSS to customize fonts, colors, syntax highlighting. */
/* Normal */
@font-face {
  font-family: Roboto;
  src: url(~/Library/Fonts/Roboto/Roboto-Light.ttf);
}
/* Bold */
@font-face {

As C programmers, most of us think of pointer arithmetic for multi-dimensional arrays in a nested way:
The address for a 1-dimensional array is base + x.
The address for a 2-dimensional array is base + x + y*x_size for row-major layout and base + y + x*y_size for column-major layout.
The address for a 3-dimensional array is base + x + (y + z*y_size)*x_size for row-column-major layout.
And so on.
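
The nested formulas above can be written out as small C helpers. This is only a sketch: the function names `index2`, `index3`, and `nested_walk_is_contiguous` are made up for illustration, offsets are counted in elements (as C pointer arithmetic does) and not in bytes, and `x` is assumed to be the fastest-varying coordinate, as in the formulas above.

```c
#include <assert.h>
#include <stddef.h>

/* 2-D offset with x varying fastest: x + y*x_size. */
size_t index2(size_t x, size_t y, size_t x_size) {
    assert(x < x_size);
    return x + y * x_size;
}

/* 3-D offset in the nested form: x + (y + z*y_size)*x_size. */
size_t index3(size_t x, size_t y, size_t z, size_t x_size, size_t y_size) {
    assert(x < x_size && y < y_size);
    return x + (y + z * y_size) * x_size;
}

/* Visits every (x, y, z) with x in the innermost loop and checks that the
 * nested formula produces the offsets 0, 1, 2, ... in order.
 * Returns 1 if it does, 0 otherwise. */
int nested_walk_is_contiguous(size_t x_size, size_t y_size, size_t z_size) {
    size_t expected = 0;
    for (size_t z = 0; z < z_size; z++) {
        for (size_t y = 0; y < y_size; y++) {
            for (size_t x = 0; x < x_size; x++) {
                if (index3(x, y, z, x_size, y_size) != expected) {
                    return 0;
                }
                expected++;
            }
        }
    }
    return expected == x_size * y_size * z_size;
}
```

With `x_size = 4` and `y_size = 3`, `index3(1, 2, 1, 4, 3)` evaluates to 1 + (2 + 1*3)*4 = 21, so the element would live at `base[21]` in a flat array. Each extra dimension adds one more level of nesting, which is why the pattern extends "and so on".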