@FesianXu
FesianXu / scan_trash.py
Created May 22, 2025 07:27
Clean up files with invalid filenames
import os
import shutil
import argparse


def is_valid_filename(entry_bytes):
    try:
        decoded = entry_bytes.decode('utf-8')
    except UnicodeDecodeError:
        return False
    # Check for control characters (ASCII 0-31, 127)
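The preview of scan_trash.py stops at the control-character comment, so the rest of the check is not visible here. As a minimal sketch of what that comment describes, assuming a name is rejected when it fails to decode as UTF-8 or contains any ASCII control character (the helper `has_control_chars` is a hypothetical name, not taken from the gist):

```python
def has_control_chars(name: str) -> bool:
    """Return True if name contains an ASCII control character (0-31 or 127)."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in name)


def is_valid_filename(entry_bytes: bytes) -> bool:
    """Accept names that decode as UTF-8 and contain no control characters.

    Assumption: the truncated original may apply further rules that are
    not shown in the preview; only the two visible checks are sketched.
    """
    try:
        decoded = entry_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not has_control_chars(decoded)
```

Taking bytes rather than str matters here: a directory listing obtained with a bytes path (for example `os.listdir(b".")`) returns raw names, so undecodable entries can be detected instead of being silently escaped.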
@FesianXu
FesianXu / wrapper.sh
Created December 29, 2024 04:01
Easy parallel processing in shell
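INPUT_FILE=${1}
OUTPUT_PATH=${2}
NUM_PROC=${3}
tmp_files=".tmp"
# Clear leftover chunks from a previous run, then make sure both directories exist
rm -f "${tmp_files}"/part-*
mkdir -p "${tmp_files}"
mkdir -p "${OUTPUT_PATH}"
# Read from stdin so wc prints only the count, with no filename to strip
total_lines=$(wc -l < "${INPUT_FILE}")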
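The preview of wrapper.sh ends once the line count is known, so how the input is divided among `NUM_PROC` workers is not shown. One plausible way to use the count, sketched in Python under the assumption that the file is cut into contiguous chunks of roughly equal size (the function `split_ranges` is hypothetical and does not appear in the gist):

```python
import math


def split_ranges(total_lines: int, num_proc: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) line ranges, one per non-empty part."""
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    # Round up so that num_proc parts always cover every line.
    lines_per_part = math.ceil(total_lines / num_proc) if total_lines else 0
    ranges = []
    start = 0
    while start < total_lines:
        end = min(start + lines_per_part, total_lines)
        ranges.append((start, end))
        start = end
    return ranges
```

Rounding up means the last part may be shorter than the others, and fewer than `num_proc` parts are produced when the input has fewer lines than workers.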
@FesianXu
FesianXu / arrangement_1226_conf.json
Created December 29, 2024 03:50
To orchestrate the GPU tasks
{
  "date": "20241226",
  "task_config": [
    "dummy_task_1",
    "dummy_task_2",
    "dummy_task_3"
  ],
  "wake_up_time": 600
}
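A config like this is easiest to consume once it has been parsed and checked. The following is a rough sketch of a loader, assuming the three keys shown are all required and that `date` uses the `YYYYMMDD` layout seen above. The meaning and unit of `wake_up_time` are not stated in the gist, so the sketch only checks that it is a number. The name `load_arrangement` is hypothetical.

```python
import json
from datetime import datetime

CONFIG_TEXT = """
{
  "date": "20241226",
  "task_config": ["dummy_task_1", "dummy_task_2", "dummy_task_3"],
  "wake_up_time": 600
}
"""


def load_arrangement(text: str) -> dict:
    """Parse the arrangement config and validate the fields shown in the gist."""
    conf = json.loads(text)
    missing = {"date", "task_config", "wake_up_time"} - conf.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    # Assumption: dates follow the YYYYMMDD layout of the sample value.
    datetime.strptime(conf["date"], "%Y%m%d")
    if not isinstance(conf["task_config"], list):
        raise TypeError("task_config must be a list of task names")
    if not isinstance(conf["wake_up_time"], (int, float)):
        raise TypeError("wake_up_time must be numeric")
    return conf


conf = load_arrangement(CONFIG_TEXT)
for task in conf["task_config"]:
    print(task)
```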
@FesianXu
FesianXu / README_hfd.md
Created September 26, 2024 04:00 — forked from padeoe/README_hfd.md
CLI tool for downloading Huggingface models and datasets with aria2/wget+git

🤗Huggingface Model Downloader

Considering the lack of multi-threaded download support in the official huggingface-cli, and the inadequate error handling in hf_transfer, this command-line tool smartly utilizes wget or aria2 for LFS files and git clone for the rest.

Features

  • ⏯️ Resume from breakpoint: Press Ctrl+C at any time and re-run the command to continue where it stopped.
  • 🚀 Multi-threaded Download: Utilize multiple threads to speed up the download process.
  • 🚫 File Exclusion: Use --exclude or --include to skip or select files, saving time on models published in duplicate formats (e.g., *.bin or *.safetensors).
  • 🔐 Auth Support: For gated models that require Huggingface login, use --hf_username and --hf_token to authenticate.
  • 🪞 Mirror Site Support: Set the HF_ENDPOINT environment variable to point at a mirror.
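To make the file exclusion feature concrete, here is a small Python illustration of how glob-style include and exclude patterns narrow a file list. This is not the tool's own code, and the precedence rule (include applied first, then exclude) is an assumption chosen for the example. The function `select_files` and the sample filenames are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(filenames, include=(), exclude=()):
    """Keep names matching any include pattern (if given) and no exclude pattern."""
    selected = []
    for name in filenames:
        # With no include patterns, every file is a candidate.
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        if any(fnmatchcase(name, p) for p in exclude):
            continue
        selected.append(name)
    return selected


files = ["model.bin", "model.safetensors", "config.json"]
print(select_files(files, exclude=["*.bin"]))
print(select_files(files, include=["*.safetensors"]))
```

Skipping `*.bin` when a `*.safetensors` copy of the same weights exists avoids downloading the same weights twice.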