@JoaoLages
JoaoLages / RLHF.md
Last active July 26, 2024 01:10
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

You may have heard of this technique without fully understanding it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) was successfully applied in ChatGPT, which explains its surge in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
@jason5ng32
jason5ng32 / surge.conf
Last active April 7, 2024 13:04
Surge Configs ( for 2.x )
[General]
loglevel = notify
skip-proxy = 127.0.0.1, 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12, 100.64.0.0/10, localhost, *.local, ::ffff:0:0:0:0/1, ::ffff:128:0:0:0/1
bypass-tun = 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12
# dns-server = 119.29.29.29,223.5.5.5,114.114.115.115
# external-controller-access = PASSWORD@0.0.0.0:6155
# ipv6 = true
// REMEMBER TO CHANGE THE external-controller-access' PASSWORD
@zabirauf
zabirauf / ROP.ex
Created March 26, 2015 07:48
Railway Oriented Programming macros in Elixir
defmodule ROP do
  defmacro try_catch(args, func) do
    quote do
      (fn ->
        try do
          unquote(args) |> unquote(func)
        rescue
          e -> {:error, e}
        end
import hashlib, zlib
HASH_BLOCKSIZE = 65536
def filebaidumd5(f, size=262144):
    h1 = hashlib.md5()
    h2 = 0
    pos = 0
    buf = f.read(min(HASH_BLOCKSIZE, size))
    md5_at_size = None
    while buf:
        pos += len(buf)
@erikreagan
erikreagan / mac-apps.md
Created August 4, 2012 19:18
Mac developer must-haves

Mac web developer apps

This gist's comment stream is a collection of webdev apps for OS X. Feel free to add links to apps you like, just make sure you add some context to what it does — either from the creator's website or your own thoughts.

— Erik

@wendal
wendal / MethodParamNamesScaner.java
Created March 10, 2012 15:15
Get the list of method parameter names (Java)
package org.nutz.lang.util;
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
@rednaxelafx
rednaxelafx / Utils.java
Created November 23, 2011 06:50
Get heap histogram from within a Java program itself
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;
import sun.tools.attach.HotSpotVirtualMachine;
@stephenwilley
stephenwilley / async_process.py
Created November 16, 2011 16:24 — forked from pplante/tornado-async-process-mixin.py
Module type version of original with gen example
# Adapted from here: https://gist.github.com/489093
# Original credit goes to pplante and copyright notice pasted below
# Copyright (c) 2010, Philip Plante of EndlessPaths.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
@rednaxelafx
rednaxelafx / PrintThreadIds.java
Created February 25, 2011 10:31
find out the correspondence between the tid/nid of Java threads as shown from jstack/JMX, on HotSpot/Linux
package fx.jvm.hotspot.tools;
import java.util.List;
import sun.jvm.hotspot.tools.Tool;
public class PrintThreadIds extends Tool {
    public static void main(String[] args) {
        PrintThreadIds tool = new PrintThreadIds();
        tool.start(args);
#!/usr/bin/env python
# Usage:
# This script generates two files (vpnup and vpndown) when executed.
# Run chmod a+x on the two newly created files, then move them to the
# openvpn config folder. Then add the following two lines to the VPN config file:
# up vpnup
# down vpndown
# You might also need 'redirect-gateway' in the config file if you don't use the VPN
# as the default gateway.