Yuan zhouyuan

## scale_mm_example.py
import torch
import torch.nn.functional as F

def to_float8(x, dtype=torch.float8_e4m3fn):
    finfo = torch.finfo(dtype)
    # Calculate the scale as dtype max divided by absmax
    scale = finfo.max / x.abs().max().clamp(min=1e-12)
    # scale and clamp the tensor to bring it to
    # the representative range of float8 data type
    # (as default cast is unsaturated)

## 1-pw_op_fusion.py
import torch
import torch._inductor.config
import time

torch._inductor.config.triton.cudagraphs = False
torch.set_float32_matmul_precision('high')

def bench(f, name=None, iters=100, warmup=5, display=True, profile=False):
    for _ in range(warmup):
        f()
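
The preview ends after the warmup loop. A rough sketch of how such a timing helper could be finished, assuming wall-clock timing with `time.perf_counter`. The `sync` callback is a hypothetical parameter (for GPU work one might pass `torch.cuda.synchronize`), and the `profile` flag from the original signature is left out because the preview does not show what it does.

```python
import time

def bench(f, name=None, iters=100, warmup=5, display=True, sync=None):
    # Warm up so one-time costs (compilation, caching) are not timed
    for _ in range(warmup):
        f()
    if sync is not None:
        sync()  # wait for queued asynchronous work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        f()
    if sync is not None:
        sync()  # make sure all timed work has actually finished
    us_per_iter = (time.perf_counter() - start) / iters * 1e6
    if display:
        print(f"{name or 'f'}: {us_per_iter:.2f} us/iter")
    return us_per_iter

# Usage: time a cheap pure-Python function
elapsed_us = bench(lambda: sum(range(1000)), name="sum", iters=50, display=False)
```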

## A_View_To_a_Thing.md

      
lefticus / A_View_To_a_Thing.md (last active August 16, 2023 18:48)

    A View to a Thing


Jason Turner


Host of C++ Weekly
Co-host of CppCast
Speaker / Contractor / Trainer


## pytorch-lbfgs-example.py
import torch
import torch.optim as optim
import matplotlib.pyplot as plt


# 2d Rosenbrock function
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
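
The preview shows only the objective. A minimal sketch of minimizing it with `torch.optim.LBFGS`, which requires a closure that re-evaluates the loss. The starting point, the step count, and the helper name `minimize_lbfgs` are assumptions made for illustration; plotting is omitted.

```python
import torch
import torch.optim as optim

# 2d Rosenbrock function, global minimum at (1, 1)
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def minimize_lbfgs(x0, steps=20):
    x = torch.tensor(x0, dtype=torch.float64, requires_grad=True)
    opt = optim.LBFGS([x], lr=1.0, line_search_fn="strong_wolfe")

    def closure():
        # LBFGS may call this several times per step
        opt.zero_grad()
        loss = f(x)
        loss.backward()
        return loss

    for _ in range(steps):
        opt.step(closure)
    return x.detach(), f(x).item()

x_min, final_loss = minimize_lbfgs([-1.5, 2.0])  # hypothetical starting point
```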


## jacobian_hessian.py
import torch

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
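
The preview cuts off inside the loop. A sketch of one way to finish the idea, assuming the rows are stacked into a tensor of shape `y.shape + x.shape`. This variation builds a fresh one-hot vector per row instead of reusing `grad_y` in place, and the `hessian` helper is an assumed companion that the preview does not show.

```python
import torch

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    for i in range(len(flat_y)):
        # Fresh one-hot selector for output element i
        grad_y = torch.zeros_like(flat_y)
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True,
                                      create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
    return torch.stack(jac).reshape(y.shape + x.shape)

def hessian(y, x):
    # Jacobian of the gradient; the inner call must keep its graph
    return jacobian(jacobian(y, x, create_graph=True), x)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
J = jacobian(x * x, x)          # diagonal matrix with entries 2 * x
H = hessian((x ** 3).sum(), x)  # diagonal matrix with entries 6 * x
```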

## avx_sigh.md

      
rygorous / avx_sigh.md (last active June 1, 2024 02:05)

why doesn't radfft support AVX on PC?

So there's two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.
Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, just for the overall build.
[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]
The other issue is to do with CPU power management.

  
## Rop.py
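import torch

def Rop(y, x, v):
  """Computes an Rop.

  Arguments:
    y (Variable): output of differentiated function
    x (Variable): differentiated input
    v (Variable): vector to be multiplied with Jacobian from the right
  """
  w = torch.ones_like(y, requires_grad=True)
  # create_graph=True keeps J^T w differentiable with respect to w
  g, = torch.autograd.grad(y, x, w, create_graph=True)
  # Differentiating J^T w with respect to w, weighted by v, gives J v
  return torch.autograd.grad(g, w, v)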
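
A double-backward Rop can be sanity-checked against `torch.autograd.functional.jvp`. The sketch below is an illustration under assumptions: `rop` is a local variant that passes `create_graph=True` to the inner call and returns a single tensor, and `func` is a hypothetical test function chosen so the Jacobian is easy to write by hand.

```python
import torch

def rop(y, x, v):
    w = torch.ones_like(y, requires_grad=True)
    # J^T w, kept differentiable with respect to w
    g, = torch.autograd.grad(y, x, w, create_graph=True)
    # d(J^T w)/dw contracted with v is J v
    jv, = torch.autograd.grad(g, w, v)
    return jv

def func(x):
    # Hypothetical test function with Jacobian [[x1, x0], [2*x0, 1]]
    return torch.stack([x[0] * x[1], x[0] ** 2 + x[1]])

x = torch.tensor([2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, -1.0])

jv_double_backward = rop(func(x), x, v)
_, jv_reference = torch.autograd.functional.jvp(func, x.detach(), v)
```

At `x = (2, 3)` the Jacobian is `[[3, 2], [4, 1]]`, so both results should equal `J v = (1, 3)`.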

## gen.cpp
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <assert.h>
#include <stdint.h>
#include <algorithm>
#include <vector>

//======================== Perfect hash generator =========================


## internals.md

      
killeent / internals.md (last active February 14, 2023 05:15)

    A Tour of PyTorch Internals (Part I)

The fundamental unit in PyTorch is the Tensor. This post will serve as an overview for how we implement Tensors in PyTorch, such that the user can interact with it from the Python shell. In particular, we want to answer four main questions:

1. How does PyTorch extend the Python interpreter to define a Tensor type that can be manipulated from Python code?
2. How does PyTorch wrap the C libraries that actually define the Tensor's properties and methods?
3. How does PyTorch cwrap work to generate code for Tensor methods?
4. How does PyTorch's build system take all of these components to compile and generate a workable application?

Extending the Python Interpreter

PyTorch defines a new package torch. In this post we will consider the ._C module. This module is known as an "extension module" - a Python module written in C. Such modules allow us to define new built-in object types (e.g. the Tensor) and to call C/C++ functions.

  
## listpack.md

      
antirez / listpack.md (last active April 10, 2023 18:45)

    Listpack specification

Version 1.0, 1 Feb 2017: Initial specification.

Version 1.1, 2 Feb 2017: Integer encoding simplified. Appendix A added.

Version 1.2, 3 Feb 2017: Better specify the meaning of the num-elements field with value of 65535. The two 12 bits positive/negative integers encodings were