@ThibaudLamothe
Created May 9, 2020 13:23
Creating a corpus from several sentences to minimize the number of translation calls made to DeepL
# Importing our previous translation function
from run_translation import translate_sentence
# Sentences input
sentence1 = 'I want to translate a first sentence without any link to the second one.'
sentence2 = 'The starlings ate all the cherries in one afternoon, there won\'t be any more for us.'
# Creation of the corpus as a list of strings
corpus = [sentence1, sentence2]
# Defining a joiner to separate the sentences
joiner = '\n____\n'
# Create a single "extended" string so that DeepL is called only once
# (str.join is a method of the separator, not of the list)
full_text = joiner.join(corpus)
# Running the translation process
translated_text = translate_sentence(full_text)
# Splitting the translation
corpus_translated = translated_text.split(joiner)
# Checking that we have the correct number of sentences after translation and split
assert len(corpus) == len(corpus_translated), 'Sentence count changed after translation'
print("Translation of the corpus completed.")
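
The join/translate/split steps above can also be wrapped in a reusable function. The following is only a minimal sketch: `translate_corpus` and `fake_translate` are hypothetical names that are not part of the original script, and `fake_translate` is a stand-in for `translate_sentence` so the example runs without calling DeepL. The fallback to one call per sentence is an assumption about how one might handle a separator that the translator has altered, not the author's method.

```python
from typing import Callable, List

JOINER = '\n____\n'


def translate_corpus(corpus: List[str],
                     translate: Callable[[str], str],
                     joiner: str = JOINER) -> List[str]:
    """Translate every sentence with a single call when possible."""
    # The separator must not already appear in the input,
    # otherwise the split would produce too many pieces.
    for sentence in corpus:
        if joiner in sentence:
            raise ValueError('A sentence already contains the separator.')

    # One call for the whole corpus
    translated_text = translate(joiner.join(corpus))
    parts = translated_text.split(joiner)

    # If the separator survived the translation, we are done
    if len(parts) == len(corpus):
        return parts

    # Assumed fallback: translate each sentence separately
    return [translate(sentence) for sentence in corpus]


def fake_translate(text: str) -> str:
    # Hypothetical stand-in for translate_sentence: upper-cases the text
    return text.upper()


result = translate_corpus(['first sentence', 'second sentence'], fake_translate)
print(result)
```

In the real script, `fake_translate` would be replaced by `translate_sentence`. The fallback costs one call per sentence, so it only makes sense as an occasional safety net.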