Skip to content

Instantly share code, notes, and snippets.

View dsdanielpark's full-sized avatar
🏄‍♂️
Believe in your potential. May the Force be with us.

MinWoo(Daniel) Park dsdanielpark

🏄‍♂️
Believe in your potential. May the Force be with us.
View GitHub Profile
@alielfilali01
alielfilali01 / Eval-Arabic-LLMs-using-lighteval.ipynb
Last active April 26, 2024 14:59
Copy of Test-lighteval.ipynb
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@gamingflexer
gamingflexer / main.py
Created July 19, 2023 09:02
Anthropic's tokenizer for Claude
from transformers import PreTrainedTokenizerFast
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="/home/ubuntu/LLM/module/claude-v1-tokenization.json")
text = "Hello, this is a test input."
tokens = fast_tokenizer.tokenize(text)
tokens
@rain-1
rain-1 / LLM.md
Last active May 12, 2024 22:10
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP. With a bias/focus to GPT.

Avoid being a link dump. Try to provide only valuable well tuned information.

Prelude

Neural network links before starting with transformers.

@JoaoLages
JoaoLages / RLHF.md
Last active March 26, 2024 18:51
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
@haven-jeon
haven-jeon / naver_review_classifications_gluon_bert.ipynb
Last active February 25, 2023 08:36
BERT with Naver Sentiment Movie Corpus
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@russss
russss / deskew.py
Created September 10, 2018 12:05
Automatic scanned image rotation/deskew with OpenCV
import cv2
import numpy as np
def deskew(im, max_skew=10):
height, width = im.shape
# Create a grayscale image and denoise it
im_gs = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
im_gs = cv2.fastNlMeansDenoising(im_gs, h=3)
@f0k
f0k / cuda_check.py
Last active May 2, 2024 06:14
Simple python script to obtain CUDA device information
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Outputs some information on CUDA-enabled devices on your computer,
including current memory usage.
It's a port of https://gist.github.com/f0k/0d6431e3faa60bffc788f8b4daa029b1
from C to Python with ctypes, so it can run without compiling anything. Note
that this is a direct translation with no attempt to make the code Pythonic.
@ihoneymon
ihoneymon / how-to-write-by-markdown.md
Last active May 10, 2024 11:18
마크다운(Markdown) 사용법

[공통] 마크다운 markdown 작성법

영어지만, 조금 더 상세하게 마크다운 사용법을 안내하고 있는
"Markdown Guide (https://www.markdownguide.org/)" 를 보시는 것을 추천합니다. ^^

아, 그리고 마크다운만으로 표현이 부족하다고 느끼신다면, HTML 태그를 활용하시는 것도 좋습니다.

1. 마크다운에 관하여