Dimitris Spathis sdimi

## rl-for-llms.md

      
yoavg / rl-for-llms.md (1 file, 23 forks, 11 comments, 534 stars). Last active May 29, 2024 16:42.

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.
Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback".
I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

  
## color_edit.py
#!/usr/bin/env python

import math
import sys
from moviepy.editor import AudioClip, VideoFileClip, concatenate_videoclips


# Get average RGB of part of a frame. Frame is H * W * 3 (rgb)
# Assumes x1 < x2, y1 < y2
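
The preview ends at the comment, before the function it describes. As a rough illustration of what such a helper could look like, here is a minimal NumPy sketch. The name `avg_rgb`, the argument order, and the convention that `x` indexes columns and `y` indexes rows are all assumptions for illustration, not the gist's actual code.

```python
import numpy as np


def avg_rgb(frame, x1, y1, x2, y2):
    """Hypothetical helper: mean R, G, B over frame[y1:y2, x1:x2].

    Assumes frame has shape (H, W, 3) and that x1 < x2, y1 < y2.
    """
    # Rows are indexed by y and columns by x (assumed convention)
    region = frame[y1:y2, x1:x2]
    # Flatten the region to a list of pixels and average each channel
    return region.reshape(-1, 3).mean(axis=0)


# Toy frame: left half pure red, right half pure blue
frame = np.zeros((4, 6, 3))
frame[:, :3] = [255, 0, 0]
frame[:, 3:] = [0, 0, 255]

left = avg_rgb(frame, 0, 0, 3, 4)
whole = avg_rgb(frame, 0, 0, 6, 4)
```

Averaging only the left half gives pure red, while averaging the whole frame gives an even mix of red and blue.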

## convolution.py
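import numpy as np


def convolve2D(image, kernel, padding=0, strides=1):
    # Flip the kernel along both axes: sliding the flipped kernel is a true
    # convolution (sliding the unflipped kernel would be cross-correlation)
    kernel = np.flipud(np.fliplr(kernel))

    # Gather Shapes of Kernel + Image + Padding
    xKernShape = kernel.shape[0]
    yKernShape = kernel.shape[1]
    xImgShape = image.shape[0]
    yImgShape = image.shape[1]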
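
The preview stops after the shape lookups. To show where a function like this typically goes next, here is a minimal, self-contained sketch of a zero-padded, strided 2D convolution in NumPy. The name `convolve2d_sketch`, the choice of zero padding, and the output-size formula are assumptions for illustration and may differ from the rest of the gist.

```python
import numpy as np


def convolve2d_sketch(image, kernel, padding=0, strides=1):
    """Hypothetical sketch: 2D convolution of a single-channel image."""
    # Flip the kernel so the sliding dot product is a convolution
    kernel = np.flipud(np.fliplr(kernel))
    k_rows, k_cols = kernel.shape

    # Zero-pad the image by `padding` pixels on every side
    padded = np.pad(image, padding, mode="constant")
    p_rows, p_cols = padded.shape

    # Output size per axis: floor((n + 2p - k) / s) + 1
    out_rows = (p_rows - k_rows) // strides + 1
    out_cols = (p_cols - k_cols) // strides + 1
    output = np.zeros((out_rows, out_cols))

    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * strides, j * strides
            window = padded[r:r + k_rows, c:c + k_cols]
            output[i, j] = np.sum(window * kernel)
    return output


image = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)

same = convolve2d_sketch(image, identity, padding=1)
strided = convolve2d_sketch(image, identity, strides=2)
```

With a 3x3 identity kernel and `padding=1` the output matches the input, which is a quick sanity check on the padding and indexing. With `strides=2` and no padding, only one window fits in a 4x4 image.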

## common-domain-prefix-suffix-list.tsv

          
Rank	Type	Prefix/Suffix	Length
1	Prefix	my+	2
2	Suffix	+online	6
3	Prefix	the+	3
4	Suffix	+web	3
5	Suffix	+media	5
6	Prefix	web+	3
7	Suffix	+world	5
8	Suffix	+net	3
9	Prefix	go+	2
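
A table like this is easy to consume with the standard `csv` module. The sketch below assumes that `+` marks where the rest of a domain name attaches and that Length is the character count of the affix without the `+`. The helper names `load_affixes` and `apply_affix` are hypothetical, and only the first three rows are embedded to keep it short.

```python
import csv
import io

# First rows of the table, embedded so the sketch is self-contained
TSV = (
    "Rank\tType\tPrefix/Suffix\tLength\n"
    "1\tPrefix\tmy+\t2\n"
    "2\tSuffix\t+online\t6\n"
    "3\tPrefix\tthe+\t3\n"
)


def load_affixes(text):
    """Hypothetical helper: parse rows and strip the '+' placeholder."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "rank": int(row["Rank"]),
            "type": row["Type"],
            "affix": row["Prefix/Suffix"].strip("+"),
            "length": int(row["Length"]),
        })
    return rows


def apply_affix(word, entry):
    """Attach the affix before or after `word`, depending on its type."""
    if entry["type"] == "Prefix":
        return entry["affix"] + word
    return word + entry["affix"]


affixes = load_affixes(TSV)
names = [apply_affix("shop", entry) for entry in affixes]
```

Here `names` comes out as `["myshop", "shoponline", "theshop"]`, one candidate per table row.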

## Genomics_A_Programmers_Guide.md

      
andy-thomason / Genomics_A_Programmers_Guide.md (1 file, 25 forks, 13 comments, 416 stars). Created May 14, 2019 13:32.

Genomics a programmers introduction

Genomics - A programmer's guide.

Andy Thomason is a Senior Programmer at Genomics PLC.
He has been writing graphics systems, games and compilers since
the '70s and specialises in code performance.
https://www.genomicsplc.com


## hook_activations.py
import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision.models as tmodels
from functools import partial
import collections

# dummy data: 10 batches of images with batch size 16
dataset = [torch.rand(16,3,224,224).cuda() for _ in range(10)]

## gmail_mbox_parser.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import mailbox
import bs4

def get_html_text(html):
    try:
        return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
    except AttributeError: # message contents empty
        return None  # assumption: treat an empty message body as "no text"

## freeze_example.py
import torch
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F
import torch.optim as optim


# toy feed-forward net
class Net(nn.Module):
    def __init__(self):

## dim_reduction_notebook.ipynb

      
fedden / dim_reduction_notebook.ipynb (1 file, 7 forks, 1 comment, 25 stars). Created November 20, 2017 14:10.
## how-to-make-a-racist-ai-without-really-trying.ipynb

      
rspeer / how-to-make-a-racist-ai-without-really-trying.ipynb (1 file, 38 forks, 9 comments, 228 stars). Last active December 23, 2023 22:54.