bofeng huang bofenghuang

🎯
Focusing
View GitHub Profile
@bofenghuang
bofenghuang / vigogne_chat_v3.preset.json
Created November 4, 2023 15:20
Preset file for LM Studio
{
  "name": "Vigogne Chat V3",
  "inference_params": {
    "input_prefix": "[INST]",
    "input_suffix": "[/INST]",
    "antiprompt": [
      "[INST]"
    ],
    "pre_prompt": "[INST]<<SYS>>\\nVous êtes Vigogne, un assistant IA créé par Zaion Lab. Vous suivez extrêmement bien les instructions. Aidez autant que vous le pouvez.\\n<</SYS>>\\n\\n[/INST]"
  }
}
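To make the role of each preset field concrete, here is a minimal Python sketch of how a client might combine them. The assembly order (pre_prompt, then input_prefix + message + input_suffix), the reading of the double-escaped "\\n" as a newline, and the helper names build_prompt and truncate_at_antiprompt are all assumptions for illustration, not LM Studio's documented behavior. The system prompt is shortened.

```python
import json

# Trimmed copy of the preset; the system prompt is shortened for illustration.
PRESET_JSON = r'''
{
  "name": "Vigogne Chat V3",
  "inference_params": {
    "input_prefix": "[INST]",
    "input_suffix": "[/INST]",
    "antiprompt": ["[INST]"],
    "pre_prompt": "[INST]<<SYS>>\\nVous êtes Vigogne.\\n<</SYS>>\\n\\n[/INST]"
  }
}
'''


def build_prompt(params, user_message):
    """Hypothetical assembly: pre_prompt, then prefix + message + suffix."""
    # Assumption: the literal backslash-n pairs in the file stand for newlines.
    pre_prompt = params["pre_prompt"].replace("\\n", "\n")
    return f'{pre_prompt}{params["input_prefix"]}{user_message}{params["input_suffix"]}'


def truncate_at_antiprompt(text, antiprompts):
    """Cut generated text at the earliest stop string, if one occurs."""
    cut = len(text)
    for stop in antiprompts:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


params = json.loads(PRESET_JSON)["inference_params"]
prompt = build_prompt(params, "Bonjour !")
reply = truncate_at_antiprompt(
    "Bonjour, comment puis-je aider ?[INST]encore", params["antiprompt"]
)
```

Under these assumptions, the antiprompt entry acts as a stop string: generation output is cut as soon as the model starts to emit a new "[INST]" turn on its own.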
bofenghuang / convert_whisper_to_openai.py
Created March 27, 2023 19:17
Convert HF's whisper checkpoint to OpenAI
#!/usr/bin/env python
# coding=utf-8
# Copyright 2022 Bofeng Huang
"""
Usage:
./scripts/convert_whisper_to_openai.py \
--hf_model_name_or_path outputs/general/whisper-large-v2-ft-french-lr4e6-bs256-augment \
--whisper_state_path outputs/general/whisper-large-v2-ft-french-lr4e6-bs256-augment/checkpoint_openai.pt
"""
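The body of the script is not shown in this preview. As a rough sketch of what such a conversion involves, the core step is renaming Hugging Face parameter names to the names used by the OpenAI whisper code. The rename table below is an illustrative subset written from memory of the two naming schemes, and the helper names rename_key and convert_state_dict are hypothetical. It is not the gist's actual implementation, and the result should be checked against the keys of a real checkpoint.

```python
# Illustrative subset of (HF substring, OpenAI substring) renames.
# Assumption: not the gist's actual table; embeddings and top-level
# layer norms are deliberately left out.
HF_TO_OPENAI = [
    ("layers", "blocks"),
    (".self_attn.q_proj", ".attn.query"),
    (".self_attn.k_proj", ".attn.key"),
    (".self_attn.v_proj", ".attn.value"),
    (".self_attn.out_proj", ".attn.out"),
    (".self_attn_layer_norm", ".attn_ln"),
    (".encoder_attn.q_proj", ".cross_attn.query"),
    (".encoder_attn.k_proj", ".cross_attn.key"),
    (".encoder_attn.v_proj", ".cross_attn.value"),
    (".encoder_attn.out_proj", ".cross_attn.out"),
    (".encoder_attn_layer_norm", ".cross_attn_ln"),
    (".final_layer_norm", ".mlp_ln"),
    ("fc1", "mlp.0"),
    ("fc2", "mlp.2"),
]


def rename_key(hf_key):
    """Map one HF parameter name to its assumed OpenAI-style name."""
    # HF wraps the encoder/decoder in a "model." prefix; drop it if present.
    key = hf_key[len("model."):] if hf_key.startswith("model.") else hf_key
    for old, new in HF_TO_OPENAI:
        key = key.replace(old, new)
    return key


def convert_state_dict(hf_state_dict):
    """Return a new dict with renamed keys; values are passed through untouched."""
    return {rename_key(k): v for k, v in hf_state_dict.items()}


example = convert_state_dict({
    "model.encoder.layers.0.self_attn.q_proj.weight": "w0",
    "model.decoder.layers.1.encoder_attn.q_proj.weight": "w1",
    "model.decoder.layers.3.fc1.bias": "b3",
})
```

A plain dict with string values stands in for a real state dict here, so the sketch runs without torch; with a real checkpoint the values would be tensors.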
#!/usr/bin/env python
import logging
import time
from typing import List, Optional
import numpy as np
import scipy.stats
import speechbrain
import torch
#!/usr/bin/env python
"""
Based on https://colab.research.google.com/drive/1mypqbHDrusZaIbqPoiEGY-WIbnpMHa2I?usp=sharing#scrollTo=bY0AwDiSxdVE
- Random sampling
- Sorted by length (minimal padding but no randomness)
- Original dynamic batching (the one in the current SpeechBrain version)
- **Modified dynamic batching with fitted log-normal** (bucket boundaries derived from a log-normal distribution fitted on the dataset)
- **Modified dynamic batching with fitted beta** (bucket boundaries derived from a beta distribution fitted on the dataset, as mentioned in the tutorial)
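The idea behind the fitted log-normal variant can be sketched as follows: fit a log-normal to the utterance durations, then place the bucket boundaries at equal-probability quantiles so that each bucket receives roughly the same number of examples. This is a standard-library-only illustration under that assumption, using synthetic durations and a hypothetical helper name lognorm_bucket_boundaries. It is not SpeechBrain's implementation and may differ from the gist's code, which imports scipy.stats.

```python
import bisect
import math
import random
import statistics


def lognorm_bucket_boundaries(durations, num_buckets):
    """Return num_buckets - 1 inner boundaries at equal-probability quantiles."""
    logs = [math.log(d) for d in durations]
    # Maximum-likelihood fit of a log-normal with location 0:
    # mean and population standard deviation of the log-durations.
    fitted = statistics.NormalDist(statistics.fmean(logs), statistics.pstdev(logs))
    # Quantiles 1/n, 2/n, ..., (n-1)/n, mapped back from log space.
    return [math.exp(fitted.inv_cdf(k / num_buckets)) for k in range(1, num_buckets)]


# Synthetic utterance lengths in seconds (assumption: real data would come
# from the dataset's duration column).
random.seed(0)
durations = [random.lognormvariate(1.5, 0.5) for _ in range(5000)]

boundaries = lognorm_bucket_boundaries(durations, num_buckets=10)

# Assign each utterance to a bucket and count the bucket sizes.
counts = [0] * 10
for d in durations:
    counts[bisect.bisect_right(boundaries, d)] += 1
```

Because the boundaries follow the fitted distribution, buckets are narrow where durations are dense and wide in the tails, which keeps padding low without sorting the whole dataset.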