Panagiotis Antoniadis PanosAntoniadis

PanosAntoniadis / gist:9a161820b57f87384d3e507ffb28ca41
Last active December 14, 2021 10:12
Generate synthetic data with GPT-J/GPT-Neo
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; swap in the commented lines to use GPT-Neo instead of GPT-J.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")  # for GPT-J
# tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")  # for GPT-Neo
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")  # for GPT-J
# model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")  # for GPT-Neo
n = 100  # number of synthetic samples to generate
new_texts = []  # generated texts
new_labels = []  # label assigned to each generated text
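The preview above stops before the generation loop. The following is a minimal sketch of how such a loop could fill `new_texts` and `new_labels`, assuming a few-shot prompt of `Label:`/`Text:` pairs. The prompt format and the names `build_prompt`, `parse_completion`, `generate_samples`, and `generate_fn` are hypothetical and are not taken from the gist.

```python
import random


def build_prompt(examples, label, k=3, seed=0):
    """Assemble a few-shot prompt that ends with an open slot for `label`.

    `examples` is a list of (text, label) pairs; the layout is an assumption.
    """
    rng = random.Random(seed)
    shots = rng.sample(examples, min(k, len(examples)))
    blocks = [f"Label: {lab}\nText: {text}\n" for text, lab in shots]
    blocks.append(f"Label: {label}\nText:")
    return "\n".join(blocks)


def parse_completion(completion):
    """Return the first non-empty line of the model output, or None."""
    for line in completion.splitlines():
        line = line.strip()
        if line:
            return line
    return None


def generate_samples(generate_fn, examples, labels, n):
    """Collect up to n synthetic (text, label) pairs.

    `generate_fn` is a hypothetical callable mapping a prompt string to the
    model's continuation. Attempts are capped so the loop always terminates.
    """
    new_texts, new_labels = [], []
    attempts = 0
    while len(new_texts) < n and attempts < 10 * n:
        label = labels[attempts % len(labels)]  # cycle through the labels
        prompt = build_prompt(examples, label, seed=attempts)
        text = parse_completion(generate_fn(prompt))
        attempts += 1
        if text:  # skip empty completions
            new_texts.append(text)
            new_labels.append(label)
    return new_texts, new_labels
```

In practice `generate_fn` could wrap the `tokenizer` and `model` loaded above (for example, by calling `model.generate` and decoding only the newly generated tokens). Passing it in as a parameter keeps the prompt and parsing logic testable without loading a multi-gigabyte model.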
PanosAntoniadis / gist:fc5c44842f347aa6a1ebe28a12c0122b
Last active December 13, 2021 13:09
Generate synthetic data with GPT-3
n = 100 # define the number of synthetic samples to generate
new_texts = []
new_labels = []
api_key = ""  # insert your API key for GPT-3
headers = {'Authorization': 'Bearer ' + api_key,
           'Content-type': 'application/json',
           'Accept': 'application/json'}
iter = 0
while iter < n:
PanosAntoniadis / gsoc2019-sphinx_Final_Report.md
Last active August 26, 2019 15:30
Final Report for GSoC 2019 for Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training

Final Report for Google Summer of Code 2019

This is the final report on the work done for the project "Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training". The project is hosted at https://github.com/eellak/gsoc2019-sphinx and https://snf-870149.vm.okeanos.grnet.gr.

Abstract

The aim of the project is to implement a personalized Greek mail dictation system. Personalization happens in two places: the language model is adapted using the user's emails, and the acoustic model is adapted using the user's previous recordings. The ASR output is then passed through a post-processing step that corrects likely errors based on the adapted language model. In this way, we improve on the accuracy of the default Greek model, which is low because open-source speech datasets are limited.

A more detailed explanation of the project can be found in the README and [the Wiki Home Page](https://git