leavyli / longest_chinese_tokens_gpt4o.py
Created May 15, 2024 15:00 — forked from ctlllll/longest_chinese_tokens_gpt4o.py
Longest Chinese tokens in GPT-4o
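```python
import tiktoken
import langdetect

# o200k_base is the tokenizer used by GPT-4o.
T = tiktoken.get_encoding("o200k_base")

# Map each token id to the length (in characters) of its decoded text.
length_dict = {}
for i in range(T.n_vocab):
    try:
        length_dict[i] = len(T.decode([i]))
    except:
        # Not every id below n_vocab maps to a token; skip ids that fail to decode.
        continue
```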
leavyli / README_hfd.md
Created December 29, 2023 01:07 — forked from padeoe/README_hfd.md
CLI Tool for Downloading Huggingface Models and Datasets

🤗Huggingface Model Downloader

Because the official huggingface-cli lacks multi-threaded downloads and hf_transfer has inadequate error handling, this command-line tool uses wget or aria2 to download LFS files and git clone for everything else.
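To make the LFS half of this split concrete, here is a sketch in Python (the language of the other snippet on this page, not of the tool itself) that turns LFS file paths into download URLs and `aria2c` command lines. It assumes Hugging Face's `{endpoint}/{repo_id}/resolve/{revision}/{path}` URL layout, with a `datasets/` prefix for datasets; the function names, the example repo id, the file list, and the default of 4 connections are hypothetical, and the tool's actual internals may differ.

```python
from __future__ import annotations

import posixpath
import shlex
from urllib.parse import quote


def resolve_url(endpoint: str, repo_id: str, path: str,
                revision: str = "main", is_dataset: bool = False) -> str:
    """Build a Hugging Face file-download ("resolve") URL for one repo file."""
    prefix = "datasets/" if is_dataset else ""
    # quote() keeps "/" by default, so nested paths stay nested.
    return f"{endpoint.rstrip('/')}/{prefix}{repo_id}/resolve/{revision}/{quote(path)}"


def aria2c_command(url: str, path: str, connections: int = 4) -> list[str]:
    """aria2c argv for one file: -c continues a partial download,
    -x/-s open several connections, -d/-o place the file at `path`."""
    directory, filename = posixpath.split(path)
    return ["aria2c", "-c", "-x", str(connections), "-s", str(connections),
            "-d", directory or ".", "-o", filename, url]


if __name__ == "__main__":
    lfs_files = ["model.safetensors", "onnx/model.onnx"]  # hypothetical LFS list
    for f in lfs_files:
        url = resolve_url("https://huggingface.co", "gpt2", f)
        print(shlex.join(aria2c_command(url, f)))
```

Non-LFS files (configs, tokenizer JSON) are small, so fetching them through git clone and reserving parallel connections for the large LFS objects is where the speedup comes from.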

Features

  • ⏯️ Resume from breakpoint: Interrupt with Ctrl+C at any time, then re-run the command to resume where it stopped.
  • 🚀 Multi-threaded Download: Utilize multiple threads to speed up the download process.
  • 🚫 File Exclusion: Use --exclude or --include to skip or select files, saving time on models that ship duplicate formats (e.g., .bin and .safetensors).
  • 🔐 Auth Support: For gated models that require Huggingface login, use --hf_username and --hf_token to authenticate.
  • 🪞 Mirror Site Support: Set the HF_ENDPOINT environment variable to download from a mirror.
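Three of these features reduce to a few lines of logic. The Python sketch below filters a repository's file list with glob patterns, picks the download endpoint from `HF_ENDPOINT` (falling back to `https://huggingface.co`), and builds the header for gated repos, since Hugging Face accepts an access token as an `Authorization: Bearer` header. It assumes `--include`/`--exclude` take shell-style globs matched against the full path and that exclusion wins over inclusion; the function names are hypothetical and the real tool's matching rules may differ.

```python
from __future__ import annotations

import os
from fnmatch import fnmatchcase

DEFAULT_ENDPOINT = "https://huggingface.co"


def select_files(paths: list[str], include: list[str] | None = None,
                 exclude: list[str] | None = None) -> list[str]:
    """Keep paths that match any include pattern (all paths, if none are given)
    and no exclude pattern. Matching is case-sensitive; "*" also matches "/"."""
    include = include or []
    exclude = exclude or []
    selected = []
    for path in paths:
        if include and not any(fnmatchcase(path, pat) for pat in include):
            continue
        if any(fnmatchcase(path, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected


def hub_endpoint() -> str:
    """Use the mirror in HF_ENDPOINT if set, else the official hub."""
    return os.environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")


def auth_headers(token: str | None) -> dict[str, str]:
    """Headers for gated repos; empty when no token is supplied."""
    return {"Authorization": f"Bearer {token}"} if token else {}
```

With `exclude=["*.bin"]`, a repository shipping both `model.bin` and `model.safetensors` keeps only the latter, which is the time saving the File Exclusion bullet describes.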