Antlia, Apus, Aquarius, Aquila, Ara, Aries, Auriga, Bootes, Caelum, Camelopardalis,
Cancer, Canes Venatici, Canis Major, Canis Minor, Capricornus, Carina, Cassiopeia, Centaurus, Cepheus,
Cetus, Chamaeleon, Circinus, Columba, Coma Berenices, Corona Australis, Corona Borealis, Corvus, Crater,
Crux, Cygnus, Delphinus, Dorado, Draco, Equuleus, Eridanus, Fornax, Gemini,
Grus, Hercules, Horologium, Hydra, Hydrus, Indus, Lacerta, Leo, Leo Minor,
Lepus, Libra, Lupus, Lynx, Lyra, Mensa, Microscopium, Monoceros, Musca,
Norma, Octans, Ophiuchus, Orion, Pavo, Pegasus, Perseus, Phoenix, Pictor,
Pisces, Piscis Austrinus, Puppis, Pyxis, Reticulum, Sagitta, Sagittarius, Scorpius, Sculptor,
Scutum, Serpens, Sextans, Taurus, Telescopium, Triangulum, Triangulum Australe, Tucana, Ursa Major,
Ursa Minor, Vela, Virgo, Volans, Vulpecula
from docx import Document
from docx.shared import Cm, Pt
article_1 = """Bayern Munich came out on top in a thrilling German Cup final, beating Bayer Leverkusen 4-2 to secure its 20th title and remain on course for an historic treble.
David Alaba's stunning free kick and Serge Gnabry's clinical finish gave Bayern a commanding lead heading into half time and Hans-Dieter Flick's side seemingly already had one hand on the trophy.
However, Leverkusen responded well early in the second half and had a golden opportunity to halve the deficit through substitute Kevin Volland."""
article_2 = """(CNN)Liverpool got its Premier League title-winning celebrations back on track with a 2-0 win over Aston Villa, just days after being on the receiving end of a record-equaling defeat.
Many had suggested Jurgen Klopp's side was suffering from something of a hangover during Thursday's 4-0 demolition at the hands of Manchester City -- the joint-heaviest defeat by a team already crowned Premier League champion -- but Liverpool re
The Godfather, The Lord of the Rings: The Return of the King, Inception, The Lord of the Rings: The Fellowship of the Ring, The Matrix, The Lord of the Rings: The Two Towers, Star Wars: Episode V - The Empire Strikes Back, Léon: The Professional, Gladiator, Terminator 2: Judgment Day, Avengers: Endgame,
Braveheart, Toy Story, Aliens, Batman Begins, Indiana Jones and the Last Crusade, The Avengers, Beauty and the Beast, The Sound of Music, Iron Man, Star Trek,
Avatar, Titanic, Lost in Translation, Home Alone, Batman, King Kong, The Shawshank Redemption, The Godfather: Part II, Pulp Fiction, Schindler's List,
Forrest Gump, One Flew Over the Cuckoo's Nest, Star Wars: Episode IV - A New Hope, It's a Wonderful Life, The Prestige, Back to the Future, The Lion King, Avengers: Infinity War, Psycho, Casablanca,
City Lights, The Dark Knight Rises, Django Unchained, WALL·E, Raiders of the Lost Ark, Rear Window, Coco, Star Wars: Episode VI - Return of the Jedi, Toy Story 3, Full Metal Jacket,
North by Northwest, Sin
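import pandas as pd

def highlight_columns(df, rows=20, color='lightgreen', columns_to_shadow=None, columns_to_show=None):
    # None defaults avoid sharing one mutable list between calls
    columns_to_shadow = columns_to_shadow or []
    highlight = lambda cell_value: 'background-color: %s' % color  # called once per cell
    sample_df = df.head(rows)
    if columns_to_show:
        sample_df = sample_df[columns_to_show]
    # Styler.applymap was renamed Styler.map in pandas 2.1; use whichever this version provides
    styler = sample_df.style
    apply_elementwise = getattr(styler, 'map', None) or styler.applymap
    highlighted_df = apply_elementwise(highlight, subset=pd.IndexSlice[:, columns_to_shadow])
    return highlighted_df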
def get_numbers_from_text(text):
    import re
    # Optional sign, optional leading '.', digits with optional thousands separators,
    # optional decimal part and optional exponent (raw string avoids invalid-escape warnings)
    pattern = r'[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?'
    list_of_numbers = re.findall(pattern, text)
    return list_of_numbers
# Test
text = """A rise in cases was reported across a staggering 36 US states last week. In Florida, officials recorded 9,585 new cases on Saturday."""
get_numbers_from_text(text)  # ['36', '9,585']
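`get_numbers_from_text` returns each match as a string, such as `'9,585'`. When numeric values are needed, a minimal sketch can convert the matches, assuming (as the pattern implies) that commas only ever act as thousands separators. `NUMBER_PATTERN` and `parse_numbers_from_text` are hypothetical names, not part of the original gist:

```python
import re

# Same pattern as get_numbers_from_text, compiled once
NUMBER_PATTERN = re.compile(r'[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?')

def parse_numbers_from_text(text):
    """Hypothetical helper: like get_numbers_from_text, but returns int/float values."""
    numbers = []
    for match in NUMBER_PATTERN.findall(text):
        # Assumption: commas are thousands separators, so they can simply be dropped
        cleaned = match.replace(',', '')
        # A decimal point or exponent means float; otherwise the value is an int
        if any(ch in cleaned for ch in '.eE'):
            numbers.append(float(cleaned))
        else:
            numbers.append(int(cleaned))
    return numbers
```

For example, `parse_numbers_from_text("officials recorded 9,585 new cases")` gives `[9585]` instead of `['9,585']`.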
def get_consequent_title_words(text):
    import re
    pattern_compiled = re.compile(r'([A-Z][^\.!?]*[\.!?])', re.M)
    list_of_sentences = re.findall(pattern_compiled, text)
    list_of_sentence_tokens = [sentence.split(' ') for sentence in list_of_sentences]
    list_of_consequent_tokens = list()
    for tokens in list_of_sentence_tokens:
        temp_list_of_title_tokens = list()
        for index, t in enumerate(tokens):
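The body of `get_consequent_title_words` is cut off at the inner loop, so its exact rules are unknown. Judging from the name and variables, it appears to collect runs of consecutive capitalized tokens within each sentence. A minimal standalone sketch under that assumption; the name `get_consecutive_title_words`, the two-word minimum and the punctuation stripping are all hypothetical choices:

```python
import re

def get_consecutive_title_words(text):
    """Hypothetical sketch: return runs of two or more consecutive capitalized words."""
    # Same sentence pattern as the original: a capital letter up to the next . ! or ?
    sentences = re.findall(r'[A-Z][^.!?]*[.!?]', text)
    runs = []
    for sentence in sentences:
        current_run = []
        # An empty sentinel token flushes the last run of each sentence
        for token in sentence.split() + ['']:
            word = token.strip('.,!?;:"()')
            if word[:1].isupper():
                current_run.append(word)
                continue
            if len(current_run) >= 2:
                runs.append(' '.join(current_run))
            current_run = []
    return runs
```

Checking only the first character (rather than `str.istitle()`) keeps possessives such as `Alaba's` in a run; a single capitalized word, like a sentence-initial `It`, is not reported.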
def describe_text(text):
    import re, string
    description = dict()
    # remove punctuation marks
    text_wo_punctuation_marks = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    # tokens of the text without punctuation marks (split() also drops empty strings from repeated spaces)
    tokens_of_text_wo_punctuation_marks = text_wo_punctuation_marks.split()
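`describe_text` is truncated right after tokenization, and only the empty `description` dict hints at what it returns. A sketch of the kind of summary it could build, assuming simple counts are what is wanted; `describe_text_sketch` and all of its key names are hypothetical:

```python
import re
import string

def describe_text_sketch(text):
    """Hypothetical sketch of basic text statistics (the original describe_text is truncated)."""
    # Same punctuation removal and tokenization as describe_text
    text_wo_punctuation_marks = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    tokens = text_wo_punctuation_marks.split()
    # Sentences: non-empty pieces between runs of . ! or ?
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    return {
        'n_characters': len(text),
        'n_words': len(tokens),
        'n_unique_words': len({t.lower() for t in tokens}),
        'n_sentences': len(sentences),
        'avg_word_length': sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
    }
```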
def remove_punctuation_marks(text):
    import string
    import re
    # Character class containing every ASCII punctuation character, escaped for regex use
    pattern = '[%s]' % re.escape(string.punctuation)
    text_wo_punctuation_marks = re.sub(pattern, '', text)
    return text_wo_punctuation_marks
# Test
text = """Hello, World!"""
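The regex works, but for deleting a fixed set of characters `str.translate` with a deletion table built by `str.maketrans` is a common standard-library alternative that avoids building a pattern at all. A minimal sketch (the helper name is hypothetical); like the original, it removes only the ASCII characters in `string.punctuation`, so non-ASCII punctuation such as curly quotes survives:

```python
import string

# Translation table that maps every ASCII punctuation character to None (deletion)
_PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation_with_translate(text):
    """Hypothetical alternative to remove_punctuation_marks using str.translate."""
    return text.translate(_PUNCTUATION_TABLE)
```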
def get_listed_items_with_colon(text):
    import re
    list_of_items = []
    # Split into sentences on '.', '?' or '!' (raw string avoids invalid-escape warnings)
    list_of_sentences = re.split(r'[.?!]', text)
    for sentence in list_of_sentences:
        if ':' in sentence:
            # Everything after the first colon is treated as a comma-separated list
            start_index = sentence.find(':')
            sub_sentence = sentence[start_index+1:]
            list_of_items.append([word.strip() for word in sub_sentence.split(',')])
    return list_of_items
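With `get_listed_items_with_colon`, a list written as `eggs, milk, and bread` keeps the conjunction in its last item (`'and bread'`), and a trailing comma yields an empty string. A hypothetical variant that also splits on a final ` and ` / ` or ` and drops empty items is sketched below; the name `get_listed_items_with_colon_v2` is an assumption, and it would wrongly split an item that itself contains "and", such as "salt and pepper":

```python
import re

def get_listed_items_with_colon_v2(text):
    """Hypothetical variant: also splits on a final 'and'/'or' and drops empty items."""
    list_of_items = []
    for sentence in re.split(r'[.?!]', text):
        if ':' not in sentence:
            continue
        # Keep only what follows the first colon
        _, _, sub_sentence = sentence.partition(':')
        # Split on commas and on whitespace-delimited 'and' / 'or'
        parts = re.split(r',|\s+(?:and|or)\s+', sub_sentence)
        list_of_items.append([p.strip() for p in parts if p.strip()])
    return list_of_items
```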
def get_text_within_quotes(text):
    import re
    # Non-greedy match between pairs of straight double quotes
    pattern = r'"(.*?)"'
    list_of_findings = re.findall(pattern, text)
    return list_of_findings
# Test
text = """The sign said, "Walk". Then it said, "Don't Walk" then, "Walk" all within thirty seconds"""
get_text_within_quotes(text)  # ['Walk', "Don't Walk", 'Walk']
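`get_text_within_quotes` only recognizes straight quotes (`"`), so text copied from word processors or news sites that uses typographic quotes (`“…”`) returns nothing. A hypothetical variant that accepts both forms; the pattern has two groups, so `findall` returns pairs in which exactly one side is filled:

```python
import re

# Straight double quotes OR typographic (curly) double quotes, each non-greedy
QUOTED_TEXT_PATTERN = re.compile(r'"(.*?)"|\u201c(.*?)\u201d')

def get_text_within_any_quotes(text):
    """Hypothetical variant of get_text_within_quotes that also accepts curly quotes."""
    # Each match is a (straight, curly) tuple; keep whichever group matched
    return [straight or curly for straight, curly in QUOTED_TEXT_PATTERN.findall(text)]
```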