Peter pszemraj

pszemraj /
Created July 23, 2024 04:19
print out a summary of a pytorch model
from typing import List, Tuple, Optional, Set
import torch.nn as nn
from transformers import PreTrainedModel
def model_summary(
    model: PreTrainedModel, max_depth: int = 4, show_input_size: bool = False
) -> None:
    """Prints an accurate summary of the model, avoiding double-counting of parameters."""
pszemraj /
Created July 13, 2024 22:07
simple fn using regex to check if a string url points to an image file
import re
# List of common image file extensions
image_extensions = [
    'jpg', 'jpeg', 'jpe', 'jif', 'jfif', 'jfi',  # JPEG
    'png',  # PNG
    'gif',  # GIF
    'webp',  # WebP
    'tiff', 'tif',  # TIFF
    'bmp',  # BMP
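The preview above stops partway through the extension list, so the matching logic is not visible. A minimal sketch of one way such a check could work is shown here. The names `IMAGE_EXTENSIONS`, `IMAGE_PATH_RE`, and `is_image_url` are hypothetical, the list contains only the extensions visible above, and parsing the URL with `urlparse` before matching is an assumption rather than the gist's confirmed approach.

```python
import re
from urllib.parse import urlparse

# Only the extensions visible in the preview; the original list may be longer.
IMAGE_EXTENSIONS = [
    "jpg", "jpeg", "jpe", "jif", "jfif", "jfi",
    "png", "gif", "webp", "tiff", "tif", "bmp",
]

# Match ".<ext>" at the very end of the URL path, ignoring case.
IMAGE_PATH_RE = re.compile(
    r"\.(?:" + "|".join(map(re.escape, IMAGE_EXTENSIONS)) + r")$",
    re.IGNORECASE,
)


def is_image_url(url: str) -> bool:
    """Return True if the URL's path ends in a known image extension."""
    path = urlparse(url).path  # excludes the query string and fragment
    return bool(IMAGE_PATH_RE.search(path))
```

Matching against the parsed path rather than the raw string means a URL such as `https://example.com/photo.JPG?size=large` is still recognized, since the query string is not part of the path.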
pszemraj /
Created June 6, 2024 03:35
Technical Overview and Explanation of "Scalable MatMul-free Language Modeling" by gpt-4o

Technical Overview and Explanation of "Scalable MatMul-free Language Modeling"


This paper presents a novel approach to large language models (LLMs) that eliminates matrix multiplication (MatMul) operations, which are typically the most computationally expensive part of such models. By doing so, the authors aim to significantly reduce memory usage and improve computational efficiency, enabling the models to scale up to billions of parameters while maintaining performance comparable to state-of-the-art Transformers.

Key Contributions

  1. MatMul-Free Dense Layers: The core innovation lies in replacing MatMul operations in dense layers with addition operations using ternary weights. These ternary weights take values from {-1, 0, +1}, which allows matrix multiplications to be transformed into simple additions and subtractions.
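To make the ternary-weight idea concrete, here is a small pure-Python sketch of a matrix-vector product in which every weight is -1, 0, or +1, so each output element is built from additions and subtractions only. The function name `ternary_matvec` is hypothetical, and this illustrates the arithmetic only; it is not the paper's implementation.

```python
from typing import List


def ternary_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Compute W @ x for W with entries in {-1, 0, +1} without multiplying."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the input
            elif w == -1:
                acc -= xi      # -1 weight: subtract the input
            elif w != 0:
                raise ValueError(f"weight {w} is not ternary")
            # 0 weight: the input is skipped entirely
        out.append(acc)
    return out


W = [[1, 0, -1], [-1, 1, 1]]
x = [2.0, 3.0, 5.0]
result = ternary_matvec(W, x)
```

The result matches the ordinary dense product `sum(w * xi)` for each row, which is why the multiplications can be dropped when the weights are constrained to these three values.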

We shall also address some key issues related to space optimization, such as the use of microelectronic technologies (e.g., electronic sensors) and computational modeling. Finally, we'll consider another challenge facing the next edition of our series--the sea of underwater buildings.

2 What Is Ocean Architecture?

Ocean architecture refers to how objects move through time. While many ideas have emerged since antiquity, archaeologists believe that the structure of the earth was constructed using mechanical and chemical processes. At one level, there were two basic types of structures - mechanical systems and gravity systems. Metamorphoses may be thought of as building blocks of modern society:

a) The physical organization of each object. An organism moves through its own frame when it moves, but does not move or carry out any actions; or b) The interaction between different parts of the body depends on the environment surrounding it. The process of motion determines how far away the animal lives and wha

pszemraj /
Last active May 19, 2024 18:40
how to use unsloth grad checkpointing


Credit/source: here

how to use unsloth grad checkpointing


To integrate the provided monkey patch for offloading gradient checkpointing into the Hugging Face transformers library, you need to follow these steps:

pszemraj /
Last active May 18, 2024 03:39
this took 7 mins and 2 gb vram. yep 2 gb. the generated text has not been edited in any way, just saved as .md


At the heart of every meme generation lies the concept of what it means to be human. While we do so by providing us with powerful examples and narratives, today's consumers need to understand the nature of their behavior, how they behave, and what happens when these people are harmed. Our goal in creating mesmerizing and exciting stories for our audiences is to share our stories without distractions, and with no limits on how much effort may go into making them. To achieve this goal, we aim to build an interactive narrative of our experience. In addition, we encourage visitors to use their stories to explore ways that the future might look like and apply the lessons learned. With the introduction of Android, there has been significant growth in mobile apps with unprecedented popularity and success. For example, Apple has created a series of popular smartphones with impressive user interface features. These include the Google Play Store, Facebook Live, Twitter, YouTube, and Spotify. From browsin

pszemraj /
Last active July 24, 2024 05:07
bash script for basic testing with pile-t5-large. note that this uses a sequence length of 1024 for input and 512 for output
# Set environment variables
export WANDB_PROJECT="pileT5-summ"
export WANDB_WATCH="gradients"
export WANDB_ENTITY="pszemraj"
NUM_WORKERS=$(lscpu -p | grep -Ev '^#' | sort -u -t, -k 2,4 | wc -l)
echo "Number of CPU cores: $NUM_WORKERS"
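The `NUM_WORKERS` pipeline counts physical cores by dropping comment lines from `lscpu -p` and keeping the unique combinations of fields 2 to 4 (core, socket, node). As a sketch of the same logic in Python, the hypothetical function `count_physical_cores` below takes the text output of `lscpu -p` and returns that count. The sample text is made up for illustration and is not real machine output.

```python
def count_physical_cores(lscpu_output: str) -> int:
    """Count unique (core, socket, node) triples in `lscpu -p` output."""
    cores = set()
    for line in lscpu_output.splitlines():
        # Skip blank lines and the '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        # Fields 2..4 (1-indexed) are Core, Socket, Node.
        cores.add(tuple(fields[1:4]))
    return len(cores)


# Illustrative sample: 4 logical CPUs on 2 physical cores.
SAMPLE = """\
# CPU,Core,Socket,Node,,L1d,L1i,L2,L3
0,0,0,0,,0,0,0,0
1,1,0,0,,1,1,1,0
2,0,0,0,,0,0,0,0
3,1,0,0,,1,1,1,0
"""

num_workers = count_physical_cores(SAMPLE)
```

On a Linux machine the input could come from `subprocess.run(["lscpu", "-p"], capture_output=True, text=True).stdout`. Note that `os.cpu_count()` reports logical CPUs, so it would give a larger number on machines with hyper-threading.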
pszemraj /
Created March 30, 2024 03:22
modern relocation research & adjustment courtesy of claude3 opus

Messages Overview - 2024-03-30 04:20:45 - Total Messages: 6

User - Msg No. 1/6

Can you give me an up to date overview of Atlanta and the different areas of the city

Assistant - Msg No. 2/6

Sure, I can provide you with an overview of Atlanta and its different areas. Atlanta is the capital and most populous city in the state of Georgia, with a diverse population and a thriving economy. Here's a breakdown of some of the main areas:

pszemraj /
Created March 27, 2024 12:00
simple CLI for builtin-python archive creation
"""
Creates an archive of a directory

pip install fire
"""
import os
import shutil
from pathlib import Path
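The preview shows only the imports, so the body of the CLI is not visible. As a rough sketch of how a directory archive can be built with the standard library alone, the hypothetical function `create_archive` below wraps `shutil.make_archive`. The parameter names and defaults are assumptions, and the `fire` CLI wiring from the original is left out.

```python
import shutil
from pathlib import Path
from typing import Optional


def create_archive(
    src_dir: str, out_base: Optional[str] = None, fmt: str = "zip"
) -> Path:
    """Archive src_dir and return the path of the archive that was written."""
    src = Path(src_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    # Default: place the archive next to the source directory, same name.
    base = Path(out_base) if out_base else src.parent / src.name
    archive = shutil.make_archive(
        str(base),                 # output path without the extension
        fmt,                       # e.g. "zip", "tar", "gztar"
        root_dir=str(src.parent),  # paths inside the archive are relative to this
        base_dir=src.name,         # keep the top-level folder in the archive
    )
    return Path(archive)
```

Passing `base_dir=src.name` keeps the directory itself as the top-level entry, so extracting the archive recreates the folder rather than spilling its files into the current directory.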
pszemraj /
Created March 23, 2024 02:26
find local package meta dependencies
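import pkg_resources


def list_dependencies(package_name, level=0, explored=None):
    # A mutable default (set()) would be shared across calls, so create it here
    if explored is None:
        explored = set()
    # Define indent outside of try-except to ensure it's always assigned
    indent = " " * level
    if package_name in explored: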
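The preview ends before the recursion is visible. The sketch below shows one way a dependency walk like this could be written using `importlib.metadata` from the standard library instead of `pkg_resources`. This is a swapped-in technique, not the gist's code: the helper `direct_requirements`, the returned list of indented lines, and the choice to skip optional extras are all assumptions made for illustration.

```python
import re
from importlib import metadata
from typing import List, Optional, Set


def direct_requirements(package_name: str) -> List[str]:
    """Return the names of a package's declared requirements (extras skipped)."""
    try:
        reqs = metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return []  # not installed locally, so nothing to report
    names = []
    for req in reqs:
        if re.search(r"extra\s*==", req):
            continue  # requirement only applies to an optional extra
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
        if match:
            names.append(match.group(0))
    return names


def list_dependencies(
    package_name: str, level: int = 0, explored: Optional[Set[str]] = None
) -> List[str]:
    """Return an indented dependency tree, visiting each package once."""
    if explored is None:
        explored = set()  # fresh set per top-level call
    key = package_name.lower()
    if key in explored:
        return []
    explored.add(key)
    lines = ["  " * level + package_name]
    for dep in direct_requirements(package_name):
        lines.extend(list_dependencies(dep, level + 1, explored))
    return lines
```

Calling `print("\n".join(list_dependencies("some-package")))` would print the tree for an installed package, where `some-package` is a placeholder name. Tracking visited names in `explored` keeps circular dependencies from recursing forever.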