Last active
April 19, 2021 06:34
Summarize Estonian text using pre-trained English models
from transformers import pipeline

# 1. Initialize the pipeline for text summarization
summarizer = pipeline("summarization", model="t5-base")
# 2. Input text in Estonian
sentence_est = r"""
E-Lab on Eesti Energia IT osakonda kuuluv uurimis- ja arendusüksus.
Üksuse eesmärk on kiirendada innovatsiooni ja aidata kaasa uute ideede
esimeste arendusetappide (kontseptsiooni tõestus ja prototüüpimine) läbimisele.
Tiimis on täna 12 liiget, kelle seas seitse tarkvarainseneri, kaks andmeteadurit,
tarkvaraarhitekt, tooteomanik ning tehnoloogiaskaut.
Lisaks tehakse koostööd tudengitiimidega nii Taltech’st kui ka Tartu Ülikoolist.
"""
# 3. Translate the input from Estonian to English
# (translate, EST_TO_ENG and ENG_TO_EST are not defined in this snippet;
# they are assumed to be defined elsewhere before this code runs)
sentence_eng = translate(sentence_est, EST_TO_ENG)[0]['translation_text']
print("Sentence that is translated to Eng:\n", sentence_eng)
# 4. Summarize and limit the output to a maximum of 30 tokens
result_eng = summarizer(sentence_eng, max_length=30)[0]['summary_text']
print(f"Summary in English: {result_eng}")
# 5. Translate the summary back to Estonian
result_est = translate(result_eng, ENG_TO_EST)[0]['translation_text']
print(f"Summary in Estonian: {result_est}")
# Output:
# Sentence that is translated to Eng:
# E-Lab is a research and development unit belonging to the
# IT department of Eesti Energia. The aim of the unit is to
# accelerate innovation and contribute to the first stages of
# development of new ideas (concept verification and prototyping).
# The team has 12 members today, including seven software engineers,
# two data researchers, software architect, product owner and technology scout.
#
# Summary in English:
# The aim of the unit is to accelerate innovation and contribute to the first
# stages of development of new ideas . the team has 12 members today,
#
# Summary in Estonian:
# üksuse eesmärk on kiirendada innovatsiooni ja aidata kaasa uute ideede
# arendamise esimestele etappidele . meeskonnal on täna 12 liiget;
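The five steps above follow a translate, summarize, translate-back pattern. As a minimal sketch, the same flow could be wrapped in one reusable function that takes the three stages as plain callables. The name `summarize_via_pivot` and its parameters are hypothetical and do not come from the original snippet or from any library.

```python
from typing import Callable, Dict


def summarize_via_pivot(
    text: str,
    to_pivot: Callable[[str], str],
    summarize: Callable[[str], str],
    from_pivot: Callable[[str], str],
) -> Dict[str, str]:
    """Summarize text by routing it through a pivot language.

    Hypothetical helper: each stage is passed in as a callable that maps
    a string to a string, so any translation or summarization backend
    can be plugged in.
    """
    # Translate the source text into the pivot language (e.g. English).
    pivot_text = to_pivot(text.strip())
    # Summarize in the pivot language, where the pre-trained model works.
    pivot_summary = summarize(pivot_text)
    # Translate the summary back into the source language.
    return {
        "pivot_text": pivot_text,
        "pivot_summary": pivot_summary,
        "summary": from_pivot(pivot_summary),
    }
```

With the objects from the snippet above, the stages could be passed as lambdas, for example `to_pivot=lambda t: translate(t, EST_TO_ENG)[0]['translation_text']` and `summarize=lambda t: summarizer(t, max_length=30)[0]['summary_text']`. Keeping the intermediate English text in the result makes it easier to tell whether an odd final summary comes from the translation or from the summarization step.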