MinWoo(Daniel) Park dsdanielpark

## Eval-Arabic-LLMs-using-lighteval.ipynb

      
alielfilali01 / Eval-Arabic-LLMs-using-lighteval.ipynb
1 file, 0 forks, 0 comments, 3 stars
Last active May 19, 2024 22:15
Copy of Test-lighteval.ipynb
    
## main.py
from transformers import PreTrainedTokenizerFast

# Load a fast tokenizer directly from a serialized tokenizer JSON file
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="/home/ubuntu/LLM/module/claude-v1-tokenization.json")
text = "Hello, this is a test input."
# Split the text into token strings
tokens = fast_tokenizer.tokenize(text)
print(tokens)

## LLM.md

      
rain-1 / LLM.md
2 files, 161 forks, 13 comments, 1615 stars
Last active July 18, 2024 22:37
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs as quickly as possible, with a bias toward GPT.
Avoid being a link dump; provide only valuable, well-tuned information.

Prelude

Neural network links to read before starting with transformers.
  
## RLHF.md

      
JoaoLages / RLHF.md
1 file, 8 forks, 37 comments, 113 stars
Last active July 18, 2024 22:10
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

    Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.
We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.
Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈
RLHF is especially useful in two scenarios 🌟:

1. You can’t create a good loss function
   Example: how do you calculate a metric to measure if the model’s output was funny?
2. You want to train with production data, but you can’t easily label your production data
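
To make the first scenario concrete: when no formula can score "funny", a common workaround is to ask a person which of two outputs is better and turn that comparison into a training signal. The sketch below is not taken from this explanation. It assumes a hypothetical reward model has already assigned a scalar score to each output, and it uses the standard pairwise (Bradley-Terry style) preference loss; the function name and the example scores are illustrative only.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    reward_chosen is the (hypothetical) reward model's score for the output
    the human preferred; reward_rejected is its score for the other output.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # A human judged joke A funnier than joke B (scores are made up).
    print(preference_loss(reward_chosen=1.5, reward_rejected=0.2))
    # The loss grows when the scores disagree with the human's choice.
    print(preference_loss(reward_chosen=0.2, reward_rejected=1.5))
```

The loss is small when the scores agree with the human's ranking and large when they contradict it, so minimizing it pushes a learned scorer toward human judgment without anyone having to define "funny" mathematically.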


## naver_review_classifications_gluon_bert.ipynb

      
haven-jeon / naver_review_classifications_gluon_bert.ipynb
1 file, 2 forks, 1 comment, 12 stars
Last active February 25, 2023 08:36
BERT with Naver Sentiment Movie Corpus
    
## deskew.py
import cv2
import numpy as np

def deskew(im, max_skew=10):
    # im is a BGR color image, so its shape is (height, width, channels)
    height, width = im.shape[:2]

    # Create a grayscale image and denoise it
    im_gs = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    im_gs = cv2.fastNlMeansDenoising(im_gs, h=3)

## default nginx configuration file
# Author: Zameer Ansari
# You should look at the following URLs in order to gain a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# http://wiki.nginx.org/Pitfalls
# http://wiki.nginx.org/QuickStart
# http://wiki.nginx.org/Configuration
#
# Generally, you will want to move this file somewhere, and start with a clean
# file but keep this around for reference. Or just disable in sites-enabled.
#

## cuda_check.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Outputs some information on CUDA-enabled devices on your computer,
including current memory usage.

It's a port of https://gist.github.com/f0k/0d6431e3faa60bffc788f8b4daa029b1
from C to Python with ctypes, so it can run without compiling anything. Note
that this is a direct translation with no attempt to make the code Pythonic.
"""
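
The ctypes approach described in the docstring can be sketched roughly as follows. This is not the gist's code: it is a minimal illustration that only counts devices, using the CUDA driver API functions `cuInit` and `cuDeviceGetCount`. The library file names tried here and both helper function names are assumptions made for the example.

```python
import ctypes

CUDA_SUCCESS = 0  # CUresult value that the driver API returns on success


def load_cuda_driver(names=("libcuda.so", "libcuda.so.1", "libcuda.dylib", "nvcuda.dll")):
    """Return a handle to the first driver library that loads, else None.

    The candidate file names are assumptions; adjust them for your system.
    """
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def cuda_device_count():
    """Return the number of CUDA devices, or None if the driver is unusable."""
    cuda = load_cuda_driver()
    if cuda is None:
        return None
    # cuInit must be called before any other driver API function.
    if cuda.cuInit(0) != CUDA_SUCCESS:
        return None
    count = ctypes.c_int()
    # cuDeviceGetCount writes the device count through the pointer argument.
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != CUDA_SUCCESS:
        return None
    return count.value


if __name__ == "__main__":
    n = cuda_device_count()
    print("CUDA driver not available" if n is None else f"{n} CUDA device(s) found")
```

Because the driver library is loaded at run time, nothing needs to be compiled, and the script degrades gracefully on machines without CUDA.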

## how-to-write-by-markdown.md

      
ihoneymon / how-to-write-by-markdown.md
1 file, 1021 forks, 327 comments, 2339 stars
Last active July 19, 2024 12:06
How to Use Markdown

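[Common] How to write Markdown

Although it is in English, I recommend the "Markdown Guide (https://www.markdownguide.org/)", which explains how to use Markdown in more detail.

Also, if Markdown alone feels too limited for what you want to express, using HTML tags is a good option.

1. About Markdown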