Ondřej Plátek oplatek

@thomwolf
thomwolf / fast_speech_text_speech.py
Last active April 15, 2024 22:31
speech to text to speech
""" To use: install LM Studio (or Ollama), clone OpenVoice, run this script in the OpenVoice directory
git clone https://github.com/myshell-ai/OpenVoice
cd OpenVoice
git clone https://huggingface.co/myshell-ai/OpenVoice
cp -r OpenVoice/* .
pip install openai-whisper pynput pyaudio
"""
from openai import OpenAI
import time
@hollance
hollance / alignment-heads.md
Last active December 5, 2023 13:25
Alignment heads for Whisper word-level timestamps with Hugging Face Transformers

To allow the Hugging Face version of Whisper to predict word-level timestamps, a new property alignment_heads must be added to the GenerationConfig object. This is a list of [layer, head] pairs that select the cross-attention heads that are highly correlated to word-level timing.

If your Whisper checkpoint does not have the alignment_heads property yet, it can be added in two possible ways.

Method 1. Change the model.generation_config property:

# load the model
model = WhisperForConditionalGeneration.from_pretrained("your_checkpoint")
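
The `[layer, head]` structure described above can be sanity-checked before it is assigned to the generation config. The following is only a minimal sketch: `validate_alignment_heads` is a hypothetical helper, not part of Transformers, it assumes zero-based layer and head indices, and the example pairs are illustrative rather than the real alignment heads of any checkpoint.

```python
from typing import List, Sequence


def validate_alignment_heads(
    alignment_heads: Sequence[Sequence[int]],
    num_decoder_layers: int,
    num_attention_heads: int,
) -> List[List[int]]:
    """Check that every entry is a [layer, head] pair inside the decoder's dimensions."""
    checked = []
    for pair in alignment_heads:
        if len(pair) != 2:
            raise ValueError(f"expected a [layer, head] pair, got {pair!r}")
        layer, head = pair
        # Assumption: indices are zero-based.
        if not 0 <= layer < num_decoder_layers:
            raise ValueError(f"layer index {layer} is out of range")
        if not 0 <= head < num_attention_heads:
            raise ValueError(f"head index {head} is out of range")
        checked.append([layer, head])
    return checked


# Illustrative values only; not the real alignment heads of any checkpoint.
example_heads = [[1, 0], [2, 3], [3, 5]]
print(validate_alignment_heads(example_heads, num_decoder_layers=4, num_attention_heads=6))
```

A list that passes this check has the shape the `alignment_heads` property expects; whether the chosen heads actually correlate with word timing still depends on the checkpoint.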

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@juanmc2005
juanmc2005 / model.py
Last active October 3, 2022 15:27
PLDA scoring using Pyannote (https://github.com/pyannote/pyannote-audio) and a customized version of PLDA (https://github.com/RaviSoji/plda) to include some specific features like length normalization and latent space dimension tuning
# Copyright 2017 Ravi Sojitra. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
@zcaceres
zcaceres / wav2letter-asg.md
Last active April 28, 2022 23:57
Rough Draft Faster, Better Speech Recognition with Wav2Letter's Auto Segmentation Criterion

Faster, Better Speech Recognition with Wav2Letter's Auto Segmentation Criterion

In 2016, Facebook AI Research (FAIR) broke new ground with Wav2Letter, a fully convolutional speech recognition system.

In Wav2Letter, FAIR showed that systems based on convolutional neural networks (CNNs) could perform as well as traditional recurrent neural network-based approaches.

In this article, we'll focus on an understudied module at the core of Wav2Letter: the Auto Segmentation (ASG) Criterion.

Architecture of the wav2letter model

@XinDongol
XinDongol / profile_pyt.md
Last active March 28, 2022 11:26
How to profile your PyTorch code

Inside profiler

import torch
import torchvision.models as models

model = models.densenet121(pretrained=True)
x = torch.randn((1, 3, 224, 224), requires_grad=True)

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    model(x)
@lattner
lattner / TaskConcurrencyManifesto.md
Last active April 25, 2024 18:22
Swift Concurrency Manifesto
@ianthetechie
ianthetechie / Twilio Asterisk Secure Trunking HOWTO.md
Last active November 1, 2023 16:01
A short guide on how to set up an encrypted VoIP system using Twilio and Asterisk.

Twilio Asterisk Secure Trunking HOWTO

This is a short guide on how to set up an encrypted VoIP system using Twilio and Asterisk. I was a little annoyed that just about everything these days still uses unencrypted RTP for media (though just about everyone supports SIP over TLS). So I spent a weekend looking at options, and settled on a totally overkill solution involving Twilio's secure trunking to an Asterisk PBX. While all bets are off once it hits the PSTN, at least you won't be blasting your conversations over the internet in clear text.

@pahud
pahud / check_spot_price_now.md
Created August 10, 2016 08:10
check current spot price with aws-cli
$ aws --region=ap-northeast-2 ec2 describe-spot-price-history --instance-types c4.large --start-time=$(date +%s) --product-descriptions="Linux/UNIX" --query 'SpotPriceHistory[*].{az:AvailabilityZone, price:SpotPrice}'
[
    {
        "price": "0.024900", 
        "az": "ap-northeast-2a"
    }, 
    {
@htp
htp / curl-websocket.sh
Last active April 25, 2024 14:57
Test a WebSocket using curl.
curl --include \
--no-buffer \
--header "Connection: Upgrade" \
--header "Upgrade: websocket" \
--header "Host: example.com:80" \
--header "Origin: http://example.com:80" \
--header "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
--header "Sec-WebSocket-Version: 13" \
http://example.com:80/
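
If the server accepts the upgrade, it should reply with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the `Sec-WebSocket-Key` sent above. The following is a minimal Python sketch of that derivation as specified in RFC 6455; the function name `expected_accept` is made up for illustration, and the result can be compared by eye against the header curl prints.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server should send for a given key."""
    # The server concatenates the key with the GUID, hashes it with SHA-1,
    # and base64-encodes the raw digest.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Key taken from the curl command above.
print(expected_accept("SGVsbG8sIHdvcmxkIQ=="))
```

A mismatch between this value and the response header usually means something between curl and the server rewrote the request, or that the endpoint is not a WebSocket server.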