Kaiyu Shi (Stonesjtu)

  • NIO
  • Shanghai, China
Stonesjtu / tmux-cheats.md
Created October 11, 2019 06:39 — forked from Starefossen/tmux-cheats.md
My personal tmux cheat sheet for working with sessions, windows, and panes. `NB` I have remapped the command prefix to `ctrl` + `a`.

Sessions

New Session

  • tmux new [-s name] [cmd] (:new) - new session

Switch Session

  • tmux ls (:ls) - list sessions
  • tmux switch [-t name] (:switch) - switches to an existing session
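Putting these together, a typical round trip uses only the commands listed above:

tmux new -s work      # create (and attach) a session named "work"
tmux ls               # list running sessions
tmux switch -t work   # from inside tmux, switch back to "work"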
Stonesjtu / latency.txt
Created June 15, 2019 22:05 — forked from understeer/latency.txt
HPC-oriented Latency Numbers Every Programmer Should Know
Latency Comparison Numbers
--------------------------
L1 cache reference/hit                       1.5 ns        4 cycles
Floating-point add/mult/FMA operation        1.5 ns        4 cycles
L2 cache reference/hit                         5 ns  12 ~ 17 cycles
Branch mispredict                              6 ns  15 ~ 20 cycles
L3 cache hit (unshared cache line)            16 ns       42 cycles
L3 cache hit (shared line in another core)    25 ns       65 cycles
Mutex lock/unlock                             25 ns
L3 cache hit (modified in another core)       29 ns       75 cycles
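These raw latencies are easiest to grasp as multiples of an L1 hit. A quick Python sketch, with the values hard-coded from the table above:

# Express each latency from the table as a multiple of an L1 hit (1.5 ns).
latency_ns = {
    "L1 cache reference/hit": 1.5,
    "Floating-point add/mult/FMA": 1.5,
    "L2 cache reference/hit": 5.0,
    "Branch mispredict": 6.0,
    "L3 hit (unshared cache line)": 16.0,
    "L3 hit (shared line, another core)": 25.0,
    "Mutex lock/unlock": 25.0,
    "L3 hit (modified, another core)": 29.0,
}
l1 = latency_ns["L1 cache reference/hit"]
for name, ns in sorted(latency_ns.items(), key=lambda kv: kv[1]):
    print(f"{name:40s} {ns:5.1f} ns  {ns / l1:5.1f}x L1")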
Stonesjtu / mpi4py_pycuda_demo.py
Created July 18, 2018 07:28 — forked from lebedov/mpi4py_pycuda_demo.py
Demo of how to pass GPU memory managed by pycuda to mpi4py.
#!/usr/bin/env python
"""
Demo of how to pass GPU memory managed by pycuda to mpi4py.
Notes
-----
This code can be used to perform peer-to-peer communication of data via
NVIDIA's GPUDirect technology if mpi4py has been built against a
CUDA-enabled MPI implementation.
"""
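The preview cuts the file off after the docstring. As a rough illustration of the idea it describes, here is a minimal sketch (not the gist's code): it assumes mpi4py >= 3.1 built against a CUDA-aware MPI, and a pycuda recent enough to expose __cuda_array_interface__ on GPUArray.

# Minimal sketch, run with: mpiexec -n 2 python demo.py
# Assumes mpi4py >= 3.1 (CUDA array interface support) and a CUDA-aware MPI.
import numpy as np
from mpi4py import MPI
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.gpuarray as gpuarray

comm = MPI.COMM_WORLD
if comm.Get_rank() == 0:
    x_gpu = gpuarray.to_gpu(np.arange(8, dtype=np.float64))
    comm.Send(x_gpu, dest=1, tag=0)   # device buffer handed directly to MPI
else:
    y_gpu = gpuarray.empty(8, np.float64)
    comm.Recv(y_gpu, source=0, tag=0)
    print('rank 1 received:', y_gpu.get())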
# Visualize PyTorch autograd graphs with graphviz (separate snippet).
from graphviz import Digraph
import re
import torch
import torch.nn.functional as F
from torch.autograd import Variable
import torchvision.models as models


def make_dot(var):
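The preview is truncated mid-function. For orientation, a hypothetical usage sketch, assuming the full gist body in which make_dot returns a graphviz Digraph:

# Hypothetical usage; make_dot's body is cut off in the preview above.
model = models.alexnet()
x = Variable(torch.randn(1, 3, 224, 224), requires_grad=True)
dot = make_dot(model(x))     # build a Digraph of the autograd graph
dot.render('alexnet_graph')  # writes alexnet_graph.pdf (assumed Digraph API)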