Bing Han BrightXiaoHan

  • Ifun Game
  • China.
@padeoe
padeoe / README_hfd.md
Last active July 4, 2024 07:58
CLI tool for downloading Huggingface models and datasets with aria2/wget+git

🤗Huggingface Model Downloader

The official huggingface-cli lacks multi-threaded download support, and hf_transfer has inadequate error handling. This command-line tool works around both by using wget or aria2 for LFS files and git clone for the rest.

Features

  • ⏯️ Resume from breakpoint: Press Ctrl+C at any time and re-run the command to continue.
  • 🚀 Multi-threaded Download: Use multiple threads to speed up the download.
  • 🚫 File Exclusion: Use --exclude or --include to skip or select files, which saves time for models published in duplicate formats (e.g., *.bin or *.safetensors).
  • 🔐 Auth Support: For gated models that require a Huggingface login, use --hf_username and --hf_token to authenticate.
  • 🪞 Mirror Site Support: Set the HF_ENDPOINT environment variable.
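
The --include and --exclude options above take glob-style patterns such as *.bin or *.safetensors. The following is only a minimal Python sketch of that kind of filtering, not the tool's actual code. The helper name `select_files`, the sample file names, and the rule that an include pattern is checked before an exclude pattern are all assumptions made for illustration.

```python
from fnmatch import fnmatch


def select_files(names, include=None, exclude=None):
    """Return the file names that would be downloaded.

    Hypothetical helper: if `include` is given, keep only matching names;
    otherwise, if `exclude` is given, drop the matching names.
    """
    if include:
        return [n for n in names if fnmatch(n, include)]
    if exclude:
        return [n for n in names if not fnmatch(n, exclude)]
    return list(names)


# Illustrative file list for a model published in two weight formats.
files = ["config.json", "model.safetensors", "pytorch_model.bin"]

print(select_files(files, exclude="*.bin"))
print(select_files(files, include="*.safetensors"))
```

Skipping one of the two weight formats this way is where the time saving mentioned above comes from, since the weight files are usually by far the largest downloads.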
@geekodour
geekodour / comparision_whispercpp_faster_whisper.org
Last active August 23, 2023 13:16
whisper.cpp vs faster-whisper using ctranslate2

This is a comparison between whisper.cpp and faster-whisper. The faster-whisper README already includes some benchmarks, but I wanted to test it myself. For whisper.cpp, I just ran the command manually. For faster-whisper, I wrote this small script.

whisper.cpp

  • ./main -bs 5 -p 2 -f steve2.wav -m models/ggml-small.en.bin
    • A total of 8 CPU threads on my 12-core machine
    • -bs 2: actually performs better, about 10s faster.
@mattiasarro
mattiasarro / rwkv.py
Last active May 27, 2024 09:17
RWKV MVP
# Taken from https://johanwind.github.io/2023/03/23/rwkv_details.html.
# I've added comments and restructured it a tiny bit, which makes it clearer to me.
import numpy as np
from torch import load as torch_load # Only for loading the model weights
from tokenizers import Tokenizer
exp = np.exp
layer_norm = lambda x, w, b : (x - np.mean(x)) / np.std(x) * w + b
sigmoid = lambda x : 1/(1 + exp(-x))
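
To make the two helpers above concrete, here is a small dependency-free sketch that mirrors the same formulas with the standard library instead of NumPy. The sample input, the unit weights, and the zero biases are assumptions chosen for illustration. It uses the population standard deviation, which matches the default behaviour of `np.std`.

```python
import math
import statistics


def layer_norm(x, w, b):
    # Same formula as the NumPy lambda: (x - mean) / std * w + b,
    # with the population standard deviation (np.std's default).
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)
    return [(xi - mean) / std * wi + bi for xi, wi, bi in zip(x, w, b)]


def sigmoid(x):
    # Logistic function, squashing any real number into (0, 1).
    return 1 / (1 + math.exp(-x))


x = [1.0, 2.0, 3.0, 4.0]
y = layer_norm(x, w=[1.0] * 4, b=[0.0] * 4)

# With unit weights and zero biases the output has mean ~0 and std ~1.
print(statistics.fmean(y), statistics.pstdev(y))
print(sigmoid(0.0))  # 0.5
```

With non-trivial `w` and `b`, the normalized values are then rescaled and shifted element by element.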
@ymoslem
ymoslem / M2M-100-example.py
Last active May 17, 2024 09:49
Example of translating a file with M2M-100 using CTranslate2
# This example uses M2M-100 models converted to the CTranslate2 format.
# Download CTranslate2 models:
# • M2M-100 418M-parameter model: https://bit.ly/33fM1AO
# • M2M-100 1.2B-parameter model: https://bit.ly/3GYiaed
import ctranslate2
import sentencepiece as spm
@tommyip
tommyip / venv.fish
Last active June 2, 2024 20:22
venv.fish - Automatically activate/deactivate virtualenv in fish shell
# Based on https://gist.github.com/bastibe/c0950e463ffdfdfada7adf149ae77c6f
# Changes:
# * Instead of overriding cd, we detect directory change. This allows the script to work
# for other means of cd, such as z.
# * Update syntax to work with new versions of fish.
# * Handle virtualenvs that are not located in the root of a git directory.
function __auto_source_venv --on-variable PWD --description "Activate/Deactivate virtualenv on directory change"
status --is-command-substitution; and return
@SteveHere
SteveHere / proxy_socks2http.py
Last active October 30, 2023 03:25 — forked from zengxs/proxy_socks2http.py
Prettification + update to current asyncio syntax (Python 3.11+)
import logging
import socks # use pysocks
import asyncio
from datetime import datetime
from itertools import cycle
logging.basicConfig(level=logging.INFO)
socks_router_loop = cycle(( # simple round-robin router to socks proxies
('127.2.0.0', 9050, None, None), # address, port, username, password
# ('127.3.0.0', 9050, "proxy", "passwordpassword"),
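
The round-robin behaviour of `itertools.cycle` used in the preview above can be seen in isolation with a short sketch. The two proxy entries below are hypothetical placeholders, not values from the gist.

```python
from itertools import cycle

# Hypothetical upstream proxies as (address, port, username, password) tuples.
proxies = cycle((
    ("127.0.0.1", 9050, None, None),
    ("127.0.0.1", 9051, "user", "secret"),
))

# Each call to next() hands out the following proxy, wrapping around at the end.
picked = [next(proxies)[1] for _ in range(5)]
print(picked)  # [9050, 9051, 9050, 9051, 9050]
```

Because `cycle` never raises `StopIteration`, each new connection can simply call `next()` on it to get the next proxy in turn.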
@cristianadam
cristianadam / bundle_static_library.cmake
Created January 17, 2020 00:30
CMake function which bundles multiple static libraries into one
# MIT License
#
# Copyright (c) 2019 Cristian Adam
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
@zengxs
zengxs / proxy_socks2http.py
Created November 5, 2019 13:53
Convert socks proxy to http proxy (use python with asyncio)
import asyncio
import logging
import re
from asyncio import StreamReader, StreamWriter, StreamReaderProtocol
from collections import namedtuple
from typing import Optional
import socks # use pysocks
logging.basicConfig(level=logging.INFO)
@dteoh
dteoh / .gitconfig
Created March 17, 2019 23:40
Use Neovim as git mergetool
[merge]
tool = vimdiff
[mergetool]
keepBackup = false
[mergetool "vimdiff"]
cmd = nvim -d $LOCAL $REMOTE $MERGED -c '$wincmd w' -c 'wincmd J'
@W4ngatang
W4ngatang / download_glue_data.py
Last active May 23, 2024 12:55
Script for downloading data of the GLUE benchmark (gluebenchmark.com)
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC