Robin rdoume

## script.js

/*
the twitter api is stupid. it is stupid and bad and expensive. hence, this.

Literally just paste this in the JS console on the bookmarks tab and the script will automatically scroll to the bottom of your bookmarks, keeping track of them as it goes.

When finished, it downloads a JSON file containing the raw text content of every bookmark.

for now it stores just the text inside the tweet itself, but if you're reading this why don't you go ahead and try to also store other information (author, tweetLink, pictures, everything). come on. do it. please?
*/

## normcore-llm.md

      
veekaybee / normcore-llm.md (1 file, 216 forks, 38 comments, 2765 stars; last active June 29, 2024 03:29)

Normcore LLM Reads

    Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
Foundational Concepts


Pre-Transformer Models


## bq_job_editions_cost_comparison_with_autoscaler.sql
/*
 *  This query will look at the past 30 days of job history to analyze it for costs under
 *  BigQuery Editions while utilizing the new autoscaling feature that was introduced.
 *  It does this for those using both PAYG (Pay As You Go) and commitment models.
 *  It will also compare this versus running the query with the on-demand model.
 *
 *  Note that this query utilizes some math modeling behaviors that the BigQuery
 *  autoscaler uses. Namely these are the up to 10 seconds "slot scale up time,"
 *  the minimum of 60 seconds "slot scale down time," and the behavior that the
 *  autoscaler scales up and down in multiples of 100 slots for each job.
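
The billing behavior described in the comment above can be sketched in a few lines of Python. This is only an illustration of the two rules the comment names (slots rounded up to a multiple of 100, and a 60-second minimum billing window), not the query's actual logic. The function names and the `price_per_slot_hour` parameter are hypothetical, and the "up to 10 seconds" scale-up time is ignored for simplicity.

```python
import math

# Both constants come from the comment above; everything else is assumed.
SLOT_INCREMENT = 100      # autoscaler scales in multiples of 100 slots
MIN_BILLED_SECONDS = 60   # minimum "slot scale down time"


def billed_slot_seconds(avg_slots: float, duration_seconds: float) -> int:
    """Estimate the slot-seconds billed for one job (hypothetical helper)."""
    if avg_slots <= 0 or duration_seconds <= 0:
        return 0
    # Round the slot count up to the next multiple of the increment.
    scaled_slots = math.ceil(avg_slots / SLOT_INCREMENT) * SLOT_INCREMENT
    # A job shorter than the minimum window is still billed for the full window.
    billed_seconds = max(math.ceil(duration_seconds), MIN_BILLED_SECONDS)
    return scaled_slots * billed_seconds


def estimated_cost(avg_slots: float, duration_seconds: float,
                   price_per_slot_hour: float) -> float:
    """Convert billed slot-seconds to a cost; the price is supplied by the caller."""
    return billed_slot_seconds(avg_slots, duration_seconds) * price_per_slot_hour / 3600
```

Under these assumptions, a 30-second job averaging 250 slots is billed as 300 slots for 60 seconds, which is why short, bursty jobs can cost more under autoscaling than their raw slot usage suggests.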

## bq_storage_across_org.sql
/*
 *  This query will run across an entire organization, looking at tables across every project,
 *  and show how they compare on compressed and uncompressed storage.
 *
 *  Region Notes:
 *  This query will only read from a single region or multi-region at a time. It's
 *  currently not possible to read this data from across all
 *
 *  By default this reads from the US multi-region, so this might need to be changed if
 *  your data lives elsewhere.

## rl-for-llms.md

      
yoavg / rl-for-llms.md (1 file, 23 forks, 11 comments, 538 stars; last active June 28, 2024 08:06)

    Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.
Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback".
I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

  
## multiple_changepoint_marginalization.ipynb

      
ricardoV94 / multiple_changepoint_marginalization.ipynb (1 file, 2 forks, 0 comments, 10 stars; last active March 19, 2023 10:39)

## kaplan_meier_for_revenue.py
from matplotlib import pyplot
import random
import time

pyplot.style.use("ggplot")
now = time.time()

def generate_user(censor=now):
    # Pick some point in time the user was created
    t_created = t = now - random.random() * 1e7

## softie.py
from sklearn.base import BaseEstimator, ClassifierMixin
from scipy.special import expit, logit


class SoftLabelClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, regressor, eps=0.001):
        self.regressor = regressor
        self.eps = eps

    def fit(self, X, y=None):

## commit_jupyter_notebooks_code_to_git_and_keep_output_locally.md

      
33eyes / commit_jupyter_notebooks_code_to_git_and_keep_output_locally.md (1 file, 2 forks, 19 comments, 134 stars; last active June 29, 2024 08:08)

How to commit jupyter notebooks without output to git while keeping the notebooks outputs intact locally

    Commit jupyter notebooks code to git and keep output locally


Add a filter to git config by running the following command in bash inside the repo:

git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'  
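
To make it clearer what this clean filter does to a notebook before git stores it, here is a rough Python sketch of the same idea using only the standard library. It is an approximation for illustration, not nbconvert's implementation: the `strip_outputs` function is hypothetical, and it assumes the usual notebook JSON layout with a top-level `cells` list whose code cells carry `outputs` and `execution_count`.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return notebook JSON with code-cell outputs and execution counts cleared."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        # Only code cells have outputs; markdown and raw cells pass through untouched.
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1)
```

Because the filter only runs on the way into git (stdin to stdout), the file in your working directory keeps its outputs; only the committed version is stripped.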


Create a .gitattributes file inside the directory with the notebooks


Add the following to that file:


## main.py
import uuid
import json
import random

import keras
import numpy as np
import tensorflow as tf
import click
