Bruno Sánchez-Andrade Nuño (@brunosan)

brunosan / animated.sh
Created February 18, 2024 01:30
Create a properly ordered animated video from frames
#!/bin/bash
rm -rf a_temp_images
mkdir -p a_temp_images
i=1
for f in $(ls Images_*.png | sort -V ); do
ln -s "$PWD/$f" "a_temp_images/$(printf "%05d.png" $i)"
i=$((i + 1))
done
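The preview stops after the renaming loop, before the video is actually encoded. A minimal sketch of how the script might finish, assuming ffmpeg is available (the frame rate, codec, and output name here are illustrative assumptions, not from the gist):

```shell
# Hypothetical final step: encode the zero-padded symlinks into a video.
# Guarded so the sketch degrades gracefully when ffmpeg or the frames are absent.
if command -v ffmpeg >/dev/null 2>&1 && [ -d a_temp_images ]; then
  ffmpeg -y -framerate 30 -i a_temp_images/%05d.png \
    -c:v libx264 -pix_fmt yuv420p animated.mp4
fi
# The %05d padding used by the loop guarantees that lexical order
# matches numeric order:
printf '%05d.png\n' 1 2 10
```

The zero-padding is the point of the gist: without it, `10.png` would sort before `2.png` and the frames would play out of order.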
brunosan / README.md
Last active December 26, 2023 12:21
All files used to train Clay model `v0`, using the sampling strategy `v2`.


The full .txt file lists 1,045,224 files (with a summary at the end) and is 88 MB, so I've split it into 10 MB chunks to fit within gist limits.

Generated using:

aws s3 ls s3://clay-tiles-02/ --recursive --human-readable --no-paginate --summarize > s3_listing_data_v2_model_v0.txt
split -b 10m -d --additional-suffix=.txt s3_listing_data_v2_model_v0.txt s3_listing_data_v2_model_v0-part-
gh gist create s3_listing_data_v2_model_v0-part-0* -p
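The chunks produced by `split` concatenate back into the original file losslessly. A small self-contained demo with dummy data (the real input is the `s3_listing_data_v2_model_v0.txt` generated above; the `listing.txt` and `chunk-` names here are made up for the example):

```shell
# Build a dummy listing, split it the same way, and verify reassembly.
printf 'line-%d\n' $(seq 1 500) > listing.txt
split -b 2k -d --additional-suffix=.txt listing.txt chunk-
# Numeric suffixes (-d) sort lexically, so the glob restores the order.
cat chunk-*.txt > rejoined.txt
cmp listing.txt rejoined.txt && echo "chunks reassemble cleanly"
```

Anyone downloading the gist parts can rejoin them the same way with `cat s3_listing_data_v2_model_v0-part-*.txt`.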
brunosan / summarize_transcript.txt
Created September 12, 2023 12:43
Summarize a long transcript using the OpenAI API
import openai

def recursive_summarization(text, chunk_size=32000, summary_length=20000, final_length=2000):
    openai.api_key = "ADD YOURS"
    print("Starting summarization...")
    if len(text) <= final_length:
        print("Text is short enough. No summarization needed.")
        return text
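The preview cuts off before the recursive step. A self-contained sketch of the chunk-then-recurse pattern it implies, with a truncation stub standing in for the OpenAI call (the stub and the guard clause are mine, not from the gist) so the control flow is runnable without an API key:

```python
def recursive_summarization(text, chunk_size=32000, summary_length=20000,
                            final_length=2000, summarize_chunk=None):
    # summarize_chunk is a stand-in for the OpenAI API call in the gist;
    # the default stub simply truncates, enough to exercise the control flow.
    if summarize_chunk is None:
        summarize_chunk = lambda chunk, n: chunk[:n]
    if len(text) <= final_length:
        return text  # already short enough
    # Summarize each chunk, concatenate, and recurse on the shorter text.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    per_chunk = max(1, summary_length // len(chunks))
    combined = "".join(summarize_chunk(c, per_chunk) for c in chunks)
    if len(combined) >= len(text):  # guard: summaries failed to shrink the text
        return combined[:final_length]
    return recursive_summarization(combined, chunk_size, summary_length,
                                   final_length, summarize_chunk)
```

Each pass shrinks the text by roughly `chunk_size / per_chunk`, so even a very long transcript converges to at most `final_length` characters in a few recursions.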
brunosan / gist.md
Created September 8, 2023 12:35
What is a Transformer in AI Deep Learning

Conceptual explanation of the vanilla Transformer.

Overall, it's a pipeline that takes a sequence of items represented by learnable vectors, then creates parallel attempts ("heads") to understand how each item in the sequence relates to the others ("self-attention"), producing an output sequence. There can be many layers of heads, making a transformer very deep. Sometimes the task is purely to predict the next item, and sometimes it is to produce a transformed version of the sequence, e.g. to translate between languages, summarize, ...

There are extensions like [[Vision Transformer ViT]] but these are mostly wrappers to convert image patches into digestible tokens for the same architecture.
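The "how each item relates to the others" step can be made concrete with a minimal single-head scaled dot-product self-attention, sketched in NumPy (the random `Wq`/`Wk`/`Wv` matrices are stand-ins for the learned projections; everything here is illustrative, not a faithful implementation of any particular model):

```python
import numpy as np

def self_attention(X, seed=0):
    """One attention head over a sequence X of shape (L, d)."""
    L, d = X.shape
    rng = np.random.default_rng(seed)
    # Random stand-ins for the learned query/key/value projections.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(d)           # (L, L): item-to-item similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return w @ V, w                           # output sequence and weights

out, weights = self_attention(np.random.default_rng(1).standard_normal((4, 8)))
```

A real transformer runs several such heads in parallel per layer, concatenates their outputs, and stacks many layers.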

  1. The input (e.g. a text snippet) of length L is tokenized, i.e. every token (word or subword) is assigned its index in the vocabulary (e.g. {sun}=0, {moon}=1, ...). There are special tokens to indicate things like "start of sequence", "new line", ... Additionally, all uncommon tokens are replaced with the same extra "unknown" token.
  2. Each
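The tokenization step can be sketched with a toy vocabulary (the vocabulary and special-token names below are made up for the example; real models use learned subword vocabularies such as BPE or WordPiece):

```python
# Toy vocabulary with a start-of-sequence and an unknown token.
vocab = {"<bos>": 0, "<unk>": 1, "the": 2, "sun": 3, "moon": 4, "rises": 5}

def tokenize(text):
    ids = [vocab["<bos>"]]                           # "start of sequence"
    for word in text.lower().split():
        ids.append(vocab.get(word, vocab["<unk>"]))  # uncommon -> <unk>
    return ids

print(tokenize("The sun rises"))   # -> [0, 2, 3, 5]
```

A word outside the vocabulary, e.g. "quasar", would map to the `<unk>` index, exactly the replacement of uncommon tokens described above.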
import os
import openai
import requests
import tiktoken
from typing import List
import numpy as np
import matplotlib.pyplot as plt
# Set your OpenAI API key from the environment variable
brunosan / README.md
Last active January 3, 2023 10:19
World Bank Official Boundaries