Srimukh Sripada postmalloc

## What happens when you allocate a JAX tensor on a TPU.md

      
shawwn / What happens when you allocate a JAX tensor on a TPU.md
1 file, 2 forks, 0 comments, 22 stars
Last active April 15, 2023 04:11
JAX C++ stack trace walkthrough for TpuExecutor_Allocate

Twitter thread: https://twitter.com/theshawwn/status/1456925974919004165

Hacker News thread: https://news.ycombinator.com/item?id=29128998
November 6, 2021
How does JAX allocate memory on a TPU?

jax.device_put(1) is deceptively simple to write in JAX. But on a TPU, what actually happens? How does a tensor containing the value 1 actually get onto a TPU?
Turns out, the answer is "C++", and a lot of it.
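
For reference, here is a minimal sketch of the Python side of that call, assuming the public entry point is `jax.device_put` and that JAX is installed. On a machine without a TPU, `jax.devices()` returns whatever backend is available (often CPU), so the same lines run anywhere. It says nothing about the C++ path the walkthrough traces.

```python
import jax

# Pick the first device JAX can see. On a TPU VM this would be a TPU core;
# elsewhere it is typically a CPU or GPU device.
device = jax.devices()[0]

# Copy the Python scalar 1 into a buffer that lives on that device.
x = jax.device_put(1, device)

print(device.platform)   # backend name, e.g. "tpu", "gpu", or "cpu"
print(x.shape, x.dtype)  # a scalar array, so the shape is ()
```

Passing the device explicitly is a choice made here for clarity; without it, JAX places the value on its default device.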

  
## alltheendpoints.txt
record_action_trails
start_phone_number_auth
call_phone_number_auth
resend_phone_number_auth
complete_phone_number_auth
check_waitlist_status
get_release_notes
get_all_topics
get_topic
get_clubs_for_topic

## git-commit-template.md

      
lisawolderiksen / git-commit-template.md
1 file, 83 forks, 28 comments, 618 stars
Last active July 15, 2024 19:38
Use a Git commit message template to write better commit messages

Using Git Commit Message Templates to Write Better Commit Messages

The always enthusiastic and knowledgeable Mr. @jasaltvik shared with our team
an article on writing (good) Git commit messages:
How to Write a Git Commit Message.
This excellent article explains why good Git commit messages are important,
and explains what constitutes a good commit message. I wholeheartedly agree
with what @cbeams writes in his article. (Have you read it yet? If not, go
read it now. I'll wait.)
It's sensible stuff. So I decided to start following the

  
## jupyter_animation.py
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.display import display, HTML
import numpy as np

def plot_sequence_images(image_array):
    ''' Display images sequence as an animation in jupyter notebook

    Args:
        image_array(numpy.ndarray): image_array.shape equal to (num_images, height, width, num_channels)

## DFT_ANN.py
"""
Train a neural network to implement the discrete Fourier transform
"""
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
import matplotlib.pyplot as plt

N = 32
batch = 10000

## gist:be2d6bb242d2fa497b5d93dcafe85f0c
(Dijkstra and plain A* are generally not included here as there are thousands of
implementations, though I've made an exception for rare Ruby and Crystal versions,
and for Thor, Mapzen's enhanced A*.)

A*                                      Ruby    https://github.com/georgian-se/shortest-path
A*                                      Crystal https://github.com/petoem/a-star.cr
A* (bidirectional with shortcuts)       C++     https://github.com/valhalla/valhalla
NBA*                                    JS      https://github.com/anvaka/ngraph.path
NBA*                                    Java    https://github.com/coderodde/GraphSearchPal
NBA*                                    Java    https://github.com/coderodde/FunkyPathfinding

## Batch Normalization.md

      
shagunsodhani / Batch Normalization.md
1 file, 11 forks, 10 comments, 84 stars
Last active July 25, 2023 18:07
Notes for "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" paper

The Batch Normalization paper describes a method to address the various issues related to training of Deep Neural Networks. It makes normalization a part of the architecture itself and reports significant improvements in terms of the number of iterations required to train the network.
Issues With Training Deep Neural Networks

Internal Covariate shift

Covariate shift refers to a change in the input distribution to a learning system. In deep networks, the input to each layer is affected by the parameters of all preceding layers, so even small changes to those parameters get amplified as they propagate down the network. The resulting change in the input distribution of internal layers is known as internal covariate shift.
It is well established that networks converge faster when their inputs have been whitened (i.e., zero mean, unit variance, and uncorrelated); internal covariate shift pushes layer inputs in the opposite direction.
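
To make "zero mean, unit variance" concrete, here is a minimal sketch of per-feature standardization over a mini-batch. The function name `normalize_batch`, the `eps` value, and the toy data are illustrative assumptions rather than anything from the notes. The sketch standardizes each feature independently, does not decorrelate features, and omits the learned scale and shift parameters of full batch normalization.

```python
from statistics import fmean

def normalize_batch(batch, eps=1e-5):
    """Standardize each feature (column) of a mini-batch to ~zero mean, ~unit variance."""
    columns = list(zip(*batch))  # one tuple of values per feature
    means = [fmean(col) for col in columns]
    # Biased (population) variance of each feature over the mini-batch.
    variances = [
        fmean([(v - m) ** 2 for v in col]) for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / (var + eps) ** 0.5 for v, m, var in zip(row, means, variances)]
        for row in batch
    ]

# Hypothetical mini-batch: 3 examples, 2 features on very different scales.
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = normalize_batch(batch)
for row in normalized:
    print(row)
```

After this step both features sit on the same scale, whatever their original ranges were.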

  
## moving_mnist.py
from PIL import Image
import sys
import os
import math
import numpy as np

###########################################################################################
# script to generate moving mnist video dataset (frame by frame) as described in
# [1] arXiv:1502.04681 - Unsupervised Learning of Video Representations Using LSTMs
#     Srivastava et al

## raspberry_pi_optimization.md

      
cybear / raspberry_pi_optimization.md
1 file, 5 forks, 5 comments, 42 stars
Last active January 27, 2023 22:17
I read up a little on performance optimization for the Raspberry Pi, and gathered the links before they disappear from my short term memory.

Raspberry Pi general optimization


Use a class 10 SD card for best speed.
The USB bus can't go much higher than 30 MB/s, though, so you don't have to buy an extremely fast card. Not all cards are compatible; check the compatibility list: http://elinux.org/RPi_SD_cards
Use the HardFloat (HF) version of Raspbian instead of the SoftFloat (SF) version. HF has much faster floating-point operations, but SF is required for running Java. So it's either Java or performance, as usual.
The official Raspbian image gives low network speeds: http://elinux.org/RPi_Performance#NIC
A graphics driver by Simon / teh_orph uses hardware acceleration for some instructions:
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=63&t=28294 (installation instructions: http://elinux.org/RPi_Xorg_rpi_Driver)
The firmware can be upgraded, which gives, among other things, better GPU performance.


## latency.txt
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
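
The unit conversions and the "x L1 cache" ratios can be rechecked from the nanosecond column. Below is a small sketch of that, with the dictionary and variable names chosen here for illustration:

```python
# Latencies in nanoseconds, copied from the table above.
latency_ns = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}

base = latency_ns["L1 cache reference"]

# Print each entry in ns, in us, and as a multiple of an L1 cache reference.
for name, ns in latency_ns.items():
    print(f"{name:<36} {ns:>10,.1f} ns {ns / 1_000:>9.4f} us {ns / base:>10,.0f}x L1")
```

Expressing everything as a multiple of the L1 figure makes the orders of magnitude easier to compare than the raw nanosecond values.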