Skip to content

Instantly share code, notes, and snippets.

Training open-source LLMs on ChatGPT output is a really bad idea.

Everyone is now racing to create open-source alternatives to compete with GPT3.5/GPT4. A common shortcut used by some teams to bootstrap their effort is to fine-tune their model on ChatGPT output. I used to think it was a good idea and totally fair play to do this. Actually, I still think it’s fair play. OpenAI effectively distilled the entire web into its models. They are saying themself that they are using publicly accessible information (mostly). So distilling their model is, in effect, distilling the public open web, so small Term of Service details aside, I don’t see major ethical problems with that. Right? Well, it’s not entirely true and I realized now that, even when ignoring the ethical considerations, using their output is a really bad idea.

First of all, from a purely technical point of view, as @yoavgo is explaining it beautifully in his recent post, there is no way to align LLMs correctly without the RLHF component. I encourag

@CGArtPython
CGArtPython / BP_triangles_p3_i2.py
Last active January 19, 2023 06:27
Blender 3D Python Tutorial "Blender+Python: Make some triangles, Part 3 Idea #2"
import math
import random
import bpy
import bmesh
def clean_scene():
"""
Remove all objects from the scene
@sleepyfox
sleepyfox / 2019-07-25-users-hate-change.md
Last active October 25, 2025 18:39
'Users hate change'

'Users hate change'

This week NN Group released a video by Jakob Nielsen in which he attempts to help designers deal with the problem of customers being resistant to their new site/product redesign. The argument goes thusly:

  1. Humans naturally resist change
  2. Your change is for the better
  3. Customers should just get used to it and stop complaining

There's slightly more to it than that, he caveats his argument with requiring you to have of course followed their best practices on product design, and allows for a period of customers being able to elect to continue to use the old site, although he says this is obviously only a temporary solution as you don't want to support both.

@bengotow
bengotow / linkify-plugin.js
Last active December 23, 2022 15:33
A Linkify plugin for DraftJS that creates / syncs entities on the fly
import React from 'react';
import { RichUtils, Modifier, EditorState, SelectionState } from 'draft-js';
function isURL(text) {
return text.startsWith('http://'); // insert your favorite library here
}
/*
Function you can call from your toolbar or "link button" to manually linkify
the selected text with an "explicit" flag that prevents autolinking from
changing the URL if the user changes the link text.
@primaryobjects
primaryobjects / m3u8.md
Last active October 29, 2025 22:03
How to download m3u8 and ts video movie streams.

m3u8 Downloading

  1. Open Chrome Developer tools and click the Network tab.
  2. Navigate to the page with the video and get it to start playing.
  3. Filter the list of files to "m3u8".
  4. Find master.m3u8 or index.m3u8 and click on it.
  5. Save the file to disk and look inside it.
  6. If the file contains a single m3u8 master url, copy that one instead.
  7. Run the program m3u8x.
  8. Paste the same m3u8 url in both textboxes (URL and Quality URL) and click "Headers" and set the referral url and user-agent from the request as found in Chrome.
@golanlevin
golanlevin / noiseloop.pde
Last active October 19, 2020 13:12
Processing code to demonstrate seamless loop of 1D noise
// Processing 3.0x code to demonstrate seamless loop of 1D noise
// Inspired by, and created in support of:
// "Drawing from noise, and then making animated loopy GIFs from there" by Etienne Jacob (@n_disorder)
// https://necessarydisorder.wordpress.com/2017/11/15/drawing-from-noise-and-then-making-animated-loopy-gifs-from-there/
// Note: this program has no dependencies, and does not require SimplexNoise.
// Demo GIF: https://media.giphy.com/media/xUOxeU2ELSPeTbevle/giphy.gif or http://gph.is/2Ah5kqG
PGraphics offscreenImg;
float myScale = 0.01;
float radius = 100.0;
@mizchi
mizchi / webpack.config.js
Last active December 6, 2019 21:20
minimum webpack with babel
/*
yarn init -y
yarn add webpack webpack-cli webpack-serve html-webpack-plugin -D
yarn add babel-loader@^8.0.0-beta @babel/core @babel/preset-env -D
echo '{ "presets": ["@babel/preset-env"] }' > .babelrc
*/
const HtmlPlugin = require("html-webpack-plugin");
module.exports = {
mode: process.env.NODE_ENV || "development",
@jonataswalker
jonataswalker / electron-sqlite3.md
Created January 20, 2017 11:53 — forked from craigvantonder/electron-sqlite3.md
Electron SQLite3 Integration

Electron SQLite3 Integration

When trying to use the node-sqlite3 module in Electron I got the error:

Error: Cannot find module '/path/to/my/application/node_modules/sqlite3/lib/binding/electron-v1.4-linux-x64/node_sqlite3.node'

Using Ubuntu 16.04 with Node 7.1.0 and Electron 1.4.12.

I read the following:

@yossorion
yossorion / what-i-wish-id-known-about-equity-before-joining-a-unicorn.md
Last active September 4, 2025 01:33
What I Wish I'd Known About Equity Before Joining A Unicorn

What I Wish I'd Known About Equity Before Joining A Unicorn

Disclaimer: This piece is written anonymously. The names of a few particular companies are mentioned, but as common examples only.

This is a short write-up on things that I wish I'd known and considered before joining a private company (aka startup, aka unicorn in some cases). I'm not trying to make the case that you should never join a private company, but the power imbalance between founder and employee is extreme, and that potential candidates would

@josephmisiti
josephmisiti / helloevolve.py
Created November 11, 2016 18:19
helloevolve.py - a simple genetic algorithm in Python
"""
helloevolve.py implements a genetic algorithm that starts with a base
population of randomly generated strings, iterates over a certain number of
generations while implementing 'natural selection', and prints out the most fit
string.
The parameters of the simulation can be changed by modifying one of the many
global variables. To change the "most fit" string, modify OPTIMAL. POP_SIZE
controls the size of each generation, and GENERATIONS is the amount of
generations that the simulation will loop through before returning the fittest