Panagiotis Antoniadis PanosAntoniadis

## gist:9a161820b57f87384d3e507ffb28ca41
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; swap in the commented lines to use GPT-Neo instead of GPT-J
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")  # for GPT-J
# tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")  # for GPT-Neo

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")  # for GPT-J
# model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")  # for GPT-Neo

n = 100  # number of synthetic samples to generate
new_texts = []   # generated texts
new_labels = []  # label assigned to each generated text
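
The snippet above ends after the buffers are initialized. As a minimal sketch of how they might be filled, the code below builds a few-shot prompt from existing labeled examples and keeps the first line of each completion. The prompt format and the helper names (`build_prompt`, `augment`, `make_hf_generator`) are assumptions for illustration, not the gist's actual loop. The model call is passed in as a plain callable so the logic can be tried without loading a model.

```python
import random


def build_prompt(examples, label):
    """Format few-shot examples of one label, ending with an open 'Text:' slot.

    The 'Label:/Text:' layout is a hypothetical prompt format.
    """
    shots = "".join(f"Label: {label}\nText: {text}\n\n" for text in examples)
    return shots + f"Label: {label}\nText:"


def augment(texts, labels, generate_text, n, shots=2, seed=0, max_attempts=None):
    """Generate up to n synthetic (text, label) pairs.

    generate_text is any callable mapping a prompt string to a completion string.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    label_names = sorted(by_label)

    new_texts, new_labels = [], []
    attempts = 0
    # Cap the number of attempts so empty completions cannot loop forever.
    limit = max_attempts if max_attempts is not None else 10 * n
    while len(new_texts) < n and attempts < limit:
        attempts += 1
        label = rng.choice(label_names)
        pool = by_label[label]
        examples = rng.sample(pool, min(shots, len(pool)))
        completion = generate_text(build_prompt(examples, label)).strip()
        if completion:
            # Keep only the first line; the model may continue with more examples.
            new_texts.append(completion.splitlines()[0].strip())
            new_labels.append(label)
    return new_texts, new_labels


def make_hf_generator(tokenizer, model, max_new_tokens=40):
    """Wrap a Hugging Face causal LM as a prompt -> completion callable."""
    def generate_text(prompt):
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(
            **inputs, do_sample=True, max_new_tokens=max_new_tokens
        )
        # Drop the prompt tokens so only the newly generated text is returned.
        prompt_length = inputs["input_ids"].shape[1]
        return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
    return generate_text
```

With the `tokenizer` and `model` loaded above, a call could look like `new_texts, new_labels = augment(texts, labels, make_hf_generator(tokenizer, model), n)`, where `texts` and `labels` are the existing training examples.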

## gist:fc5c44842f347aa6a1ebe28a12c0122b
n = 100 # define the number of synthetic samples to generate
new_texts = []
new_labels = []
api_key = ''  # insert your API key for GPT-3
headers = {'Authorization': 'Bearer ' + api_key,
           'Content-type': 'application/json',
           'Accept': 'application/json'}

iter = 0
while iter < n:

## gsoc2019-sphinx_Final_Report.md

PanosAntoniadis / gsoc2019-sphinx_Final_Report.md
Last active August 26, 2019 15:30

Final Report for GSoC 2019 for Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training

### Final Report for Google Summer of Code 2019

This is the final report of the work done for the project "Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training", hosted at https://github.com/eellak/gsoc2019-sphinx and https://snf-870149.vm.okeanos.grnet.gr.

#### Abstract

The aim of the project is to implement a personalized Greek mail dictation system. Personalization is applied both to the language model, using the user's emails, and to the acoustic model, using the user's previous recordings. In addition, the ASR output is passed through a post-processing system, where possible errors are corrected based on the adapted language model. In this way, we improve on the accuracy of the default Greek model, which is low because few open-source speech datasets are available.
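
To make the post-processing idea concrete, here is a minimal sketch in which words in the ASR output that never appear in the user's emails are replaced by the closest word that does. This is a simplified stand-in based on string similarity from Python's `difflib`, not the project's actual correction method, which the abstract does not detail. The function names and the `cutoff` value are assumptions.

```python
import difflib
from collections import Counter


def build_vocabulary(emails):
    """Count lowercase word frequencies across the user's emails."""
    counts = Counter()
    for email in emails:
        counts.update(email.lower().split())
    return counts


def correct_transcript(transcript, vocab, cutoff=0.75):
    """Replace out-of-vocabulary words with a similar in-vocabulary word.

    cutoff is an assumed similarity threshold in [0, 1]; words with no
    sufficiently similar candidate are left unchanged.
    """
    corrected = []
    for word in transcript.lower().split():
        if word in vocab:
            corrected.append(word)
            continue
        candidates = difflib.get_close_matches(word, list(vocab), n=3, cutoff=cutoff)
        if candidates:
            # Among similar words, prefer the one the user writes most often.
            corrected.append(max(candidates, key=lambda w: vocab[w]))
        else:
            corrected.append(word)
    return " ".join(corrected)


emails = [
    "please send the report to antoniadis",
    "the meeting report is ready",
]
vocab = build_vocabulary(emails)
print(correct_transcript("please send the repot to antoniadi", vocab))
```

A language-model-based corrector would score whole word sequences rather than single words; the per-word lookup here only illustrates how the user's own text can steer corrections.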
A more detailed explanation of the project is located at the README and [the Wiki Home Page](https://git