Bruno Sánchez-Andrade Nuño (@brunosan)

brunosan / animated.sh
Created February 18, 2024 01:30
Create a properly ordered animated video from frames
#!/bin/bash
rm -rf a_temp_images
mkdir -p a_temp_images
i=1
for f in $(ls Images_*.png | sort -V ); do
ln -s "$PWD/$f" "a_temp_images/$(printf "%05d.png" $i)"
i=$((i + 1))
done
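The preview stops after the renaming loop, before the video is actually encoded. A minimal sketch of how the script might finish, assuming ffmpeg is available (the frame rate, codec, and output name here are illustrative assumptions, not from the gist):

```shell
# Hypothetical final step: encode the zero-padded symlinks into a video.
# Guarded so the sketch degrades gracefully when ffmpeg or the frames are absent.
if command -v ffmpeg >/dev/null 2>&1 && [ -d a_temp_images ]; then
  ffmpeg -y -framerate 30 -i a_temp_images/%05d.png \
    -c:v libx264 -pix_fmt yuv420p animated.mp4
fi
# The %05d padding used by the loop guarantees that lexical order
# matches numeric order:
printf '%05d.png\n' 1 2 10
```

The zero-padding is the point of the gist: without it, `10.png` would sort before `2.png` and the frames would play out of order.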
brunosan / README.md
Last active December 26, 2023 12:21
All files used to train Clay model `v0`, using the sampling strategy `v2`.


The full .txt file lists 1,045,224 files (with a summary at the end) and is 88 MB, so I've split it into 10 MB chunks to fit within gist limits.

Generated using:

aws s3 ls s3://clay-tiles-02/ --recursive --human-readable --no-paginate --summarize > s3_listing_data_v2_model_v0.txt
split -b 10m -d --additional-suffix=.txt s3_listing_data_v2_model_v0.txt s3_listing_data_v2_model_v0-part-
gh gist create s3_listing_data_v2_model_v0-part-0* -p
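The chunks produced by `split` concatenate back into the original file losslessly. A small self-contained demo with dummy data (the real input is the `s3_listing_data_v2_model_v0.txt` generated above; the `listing.txt` and `chunk-` names here are made up for the example):

```shell
# Build a dummy listing, split it the same way, and verify reassembly.
printf 'line-%d\n' $(seq 1 500) > listing.txt
split -b 2k -d --additional-suffix=.txt listing.txt chunk-
# Numeric suffixes (-d) sort lexically, so the glob restores the order.
cat chunk-*.txt > rejoined.txt
cmp listing.txt rejoined.txt && echo "chunks reassemble cleanly"
```

Anyone downloading the gist parts can rejoin them the same way with `cat s3_listing_data_v2_model_v0-part-*.txt`.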
brunosan / summarize_transcript.txt
Created September 12, 2023 12:43
Summarize a long transcript using the OpenAI API
import openai

def recursive_summarization(text, chunk_size=32000, summary_length=20000, final_length=2000):
    openai.api_key = "ADD YOURS"
    print("Starting summarization...")
    if len(text) <= final_length:
        print("Text is short enough. No summarization needed.")
        return text
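The preview cuts off before the recursive step. A self-contained sketch of the chunk-then-recurse pattern it implies, with a truncation stub standing in for the OpenAI call (the stub and the guard clause are mine, not from the gist) so the control flow is runnable without an API key:

```python
def recursive_summarization(text, chunk_size=32000, summary_length=20000,
                            final_length=2000, summarize_chunk=None):
    # summarize_chunk is a stand-in for the OpenAI API call in the gist;
    # the default stub simply truncates, enough to exercise the control flow.
    if summarize_chunk is None:
        summarize_chunk = lambda chunk, n: chunk[:n]
    if len(text) <= final_length:
        return text  # already short enough
    # Summarize each chunk, concatenate, and recurse on the shorter text.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    per_chunk = max(1, summary_length // len(chunks))
    combined = "".join(summarize_chunk(c, per_chunk) for c in chunks)
    if len(combined) >= len(text):  # guard: summaries failed to shrink the text
        return combined[:final_length]
    return recursive_summarization(combined, chunk_size, summary_length,
                                   final_length, summarize_chunk)
```

Each pass shrinks the text by roughly `chunk_size / per_chunk`, so even a very long transcript converges to at most `final_length` characters in a few recursions.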
brunosan / gist.md
Created September 8, 2023 12:35
What is a Transformer in AI Deep Learning

Conceptual explanation of the vanilla Transformer.

Overall, it's a pipeline that takes a sequence of items represented by learnable vectors, then creates parallel attempts ("heads") to understand how each item in the sequence relates to the others ("self-attention"), producing an output sequence. There can be many layers of heads, making a transformer very deep. Sometimes the task is purely to predict the next item, and sometimes it is to produce a transformed version of the sequence, e.g. to translate between languages, summarize, ...

There are extensions like [[Vision Transformer ViT]] but these are mostly wrappers to convert image patches into digestible tokens for the same architecture.
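The "how each item relates to the others" step can be made concrete with a minimal single-head scaled dot-product self-attention, sketched in NumPy (the random `Wq`/`Wk`/`Wv` matrices are stand-ins for the learned projections; everything here is illustrative, not a faithful implementation of any particular model):

```python
import numpy as np

def self_attention(X, seed=0):
    """One attention head over a sequence X of shape (L, d)."""
    L, d = X.shape
    rng = np.random.default_rng(seed)
    # Random stand-ins for the learned query/key/value projections.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(d)           # (L, L): item-to-item similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return w @ V, w                           # output sequence and weights

out, weights = self_attention(np.random.default_rng(1).standard_normal((4, 8)))
```

A real transformer runs several such heads in parallel per layer, concatenates their outputs, and stacks many layers.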

  1. The input (e.g. a text snippet) of length L is tokenized, i.e. every token (word or subword) is assigned its index in the vocabulary (e.g. {sun}=0, {moon}=1, ...). There are special tokens to indicate things like "start of sequence", "new line", ... Additionally, all uncommon tokens are replaced with the same extra "unknown" token.
  2. Each
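The tokenization step can be sketched with a toy vocabulary (the vocabulary and special-token names below are made up for the example; real models use learned subword vocabularies such as BPE or WordPiece):

```python
# Toy vocabulary with a start-of-sequence and an unknown token.
vocab = {"<bos>": 0, "<unk>": 1, "the": 2, "sun": 3, "moon": 4, "rises": 5}

def tokenize(text):
    ids = [vocab["<bos>"]]                           # "start of sequence"
    for word in text.lower().split():
        ids.append(vocab.get(word, vocab["<unk>"]))  # uncommon -> <unk>
    return ids

print(tokenize("The sun rises"))   # -> [0, 2, 3, 5]
```

A word outside the vocabulary, e.g. "quasar", would map to the `<unk>` index, exactly the replacement of uncommon tokens described above.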
import os
import openai
import requests
import tiktoken
from typing import List
import numpy as np
import matplotlib.pyplot as plt
# Set your OpenAI API key from the environment variable
brunosan / README.md
Last active January 3, 2023 10:19
World Bank Official Boundaries