@kristjan-eljand
Last active April 19, 2021 06:34
Summarize Estonian text using pre-trained English models
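# Requires the Hugging Face transformers library. `translate`, EST_TO_ENG and
# ENG_TO_EST are not defined in this snippet and must be provided separately.
from transformers import pipeline

# 1. Initialize the text summarization pipeline
summarizer = pipeline("summarization", model="t5-base")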
# 2. Input sentence in Estonian
sentence_est = r"""
E-Lab on Eesti Energia IT osakonda kuuluv uurimis- ja arendusüksus.
Üksuse eesmärk on kiirendada innovatsiooni ja aidata kaasa uute ideede
esimeste arendusetappide (kontseptsiooni tõestus ja prototüüpimine) läbimisele.
Tiimis on täna 12 liiget, kelle seas seitse tarkvarainseneri, kaks andmeteadurit,
tarkvaraarhitekt, tooteomanik ning tehnoloogiaskaut.
Lisaks tehakse koostööd tudengitiimidega nii Taltech’st kui ka Tartu Ülikoolist.
"""
# 3. Translate the input from Est to Eng
sentence_eng = translate(sentence_est, EST_TO_ENG)[0]['translation_text']
print("Sentence that is translated to Eng:\n", sentence_eng)
# 4. Summarize and limit the output to a maximum of 30 tokens
result_eng = summarizer(sentence_eng, max_length=30)[0]['summary_text']
print(f"Summary in English: {result_eng}")
# 5. Translate the summary back to Estonian
result_est = translate(result_eng, ENG_TO_EST)[0]['translation_text']
print(f"Summary in Estonian: {result_est}")
# Output:
# Sentence that is translated to Eng:
# E-Lab is a research and development unit belonging to the
# IT department of Eesti Energia. The aim of the unit is to
# accelerate innovation and contribute to the first stages of
# development of new ideas (concept verification and prototyping).
# The team has 12 members today, including seven software engineers,
# two data researchers, software architect, product owner and technology scout.
#
# Summary in English:
# The aim of the unit is to accelerate innovation and contribute to the first
# stages of development of new ideas . the team has 12 members today,
#
# Summary in Estonian:
# üksuse eesmärk on kiirendada innovatsiooni ja aidata kaasa uute ideede
# arendamise esimestele etappidele . meeskonnal on täna 12 liiget;