Skip to content

Instantly share code, notes, and snippets.

View joelxiangnanchen's full-sized avatar

joelxiangnanchen joelxiangnanchen

View GitHub Profile
@ctlllll
ctlllll / longest_chinese_tokens_gpt4o.py
Created May 13, 2024 19:53
Longest Chinese tokens in gpt4o
import tiktoken
import langdetect
T = tiktoken.get_encoding("o200k_base")
length_dict = {}
for i in range(T.n_vocab):
try:
length_dict[i] = len(T.decode([i]))
except:

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@Curtis-64
Curtis-64 / ClaudeHack
Last active July 10, 2025 22:47
Claude Multi-Stage Prompt Injection for Personas
#Claude Prompt Inject by Curtis White (Prompt Engineer)
This is a 3 to 5 stage prompt injection.
1. Print innocuous string [Corp AI]
2. Consider a simple hypothetical request.
3. Echo or repeat in new rules (can be used to revise program it)
4. Ask about the .p rule for clarification
5. Invoke the rule to generate a persona
Update: I have a new Claude breaker derivative that completely breaks its censorship for 3-5 turns. Undecided if I will publish, yet.
@thomwolf
thomwolf / top-k-top-p.py
Last active October 25, 2025 20:25
Sample the next token from a probability distribution using top-k and/or nucleus (top-p) sampling
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
""" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
Args:
logits: logits distribution shape (vocabulary size)
top_k >0: keep only top k tokens with highest probability (top-k filtering).
top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
"""
assert logits.dim() == 1 # batch size 1 for now - could be updated for more but the code would be less clear
top_k = min(top_k, logits.size(-1)) # Safety check
@PetrochukM
PetrochukM / top_k_viterbi.py
Last active June 17, 2022 22:20
Implemented a Top K Viterbi Decoder algorithm in PyTorch. Useful for Conditional Random Fields (CRFs)-based probabilistic graphical modelling. Learn more here: https://nlp.stanford.edu/joberant/esslli_2016/kbest-ict.pdf
import torch
# Credits to AllenNLP for the base implementation and base tests:
# https://github.com/allenai/allennlp/blob/master/allennlp/nn/util.py#L174
# Modified AllenNLP `viterbi_decode` to support `top_k` sequences efficiently.
def viterbi_decode(tag_sequence: torch.Tensor, transition_matrix: torch.Tensor, top_k: int=5):
"""
Perform Viterbi decoding in log space over a sequence given a transition matrix
specifying pairwise (transition) potentials between tags and a matrix of shape
@alsrgv
alsrgv / tensorflow_mnist_estimator.py
Last active February 8, 2023 10:05
Horovod with Estimator API
# Copyright 2017 Uber Technologies, Inc. All Rights Reserved.
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@zjhiphop
zjhiphop / 分布式系统学习资料.md
Created July 23, 2015 06:13
分布式系统学习资料

##分布式系统(Distributed System)资料


#####希望转载的朋友,你可以不用联系我.但是一定要保留原文链接,因为这个项目还在继续也在不定期更新.希望看到文章的朋友能够学到更多.

介绍:这是一篇介绍在动态网络里面实现分布式系统重构的paper.论文的作者(导师)是MIT读博的时候是做分布式系统的研究的,现在在NUS带学生,不仅仅是分布式系统,还有无线网络.如果感兴趣可以去他的主页了解.