name: Modi Speech Scrapper
on:
  schedule:
    # Runs every day at 8 AM UTC (GitHub Actions evaluates cron schedules in UTC)
    - cron: '0 8 * * *'
  # Lets the workflow be triggered manually from the GitHub Actions workflow page, without waiting for the cron schedule
  workflow_dispatch:
jobs:
  modi-speech-scrapper:
{
  "title": "Narendra Modi - Text Speeches",
  "id": "adiamaan/modi-speeches",
  "subtitle": "Speeches of the Prime Minister of India",
  "description": "### Context\n**Narendra Damodardas Modi** is an Indian politician serving as the 14th and current prime minister of India since 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is a Member of Parliament from Varanasi.\n\n![Modi](https://images.hindustantimes.com/img/2021/09/04/550x309/PTI09-03-2021-000085B-0_1630691680953_1630739078395.jpg)\n\nAfter a long political career, Modi rose quickly within his party, from Chief Minister of Gujarat (2001 - 2014) to its prime ministerial candidate in the 2014 election. He is known for his excellent oratorical skills and his ability to connect with the common man; **this dataset gives access to all his text speeches starting from 2018**.\n\n**This dataset is updated every day**, adding new speeches as they become available.\n\n### Content\nThe contents are scraped from [Narendra Modi](https://www.narend
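import datetime

import pandas as pd

# Prepend the newly scraped speeches so the newest ones come first in the CSV
updated_speeches = pd.concat([delta_speeches, speeches])
updated_speeches.to_csv("./data/modi_speeches.csv", index=False)
# Publish the refreshed ./data/ folder as a new version of the Kaggle dataset
api.dataset_create_version(
    "./data/",
    version_notes=f"Updated on {datetime.datetime.now().strftime('%Y-%m-%d')}",
)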
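If a scraped page overlaps with speeches already stored, the concatenation above can produce duplicate rows. A minimal sketch of a deduplicating merge, assuming (hypothetically) that a `title` column identifies a speech; `merge_speeches` is an illustrative helper, not part of the original script:

```python
import pandas as pd


def merge_speeches(delta_speeches: pd.DataFrame, speeches: pd.DataFrame) -> pd.DataFrame:
    """Prepend new speeches and drop rows that were already stored.

    Hypothetical helper: assumes a 'title' column identifies a speech.
    keep="first" keeps the freshly scraped copy, because delta_speeches
    comes first in the concatenation.
    """
    updated = pd.concat([delta_speeches, speeches], ignore_index=True)
    return updated.drop_duplicates(subset="title", keep="first").reset_index(drop=True)


if __name__ == "__main__":
    old = pd.DataFrame({"title": ["Speech B", "Speech A"]})
    new = pd.DataFrame({"title": ["Speech C", "Speech B"]})
    print(merge_speeches(new, old)["title"].tolist())  # ['Speech C', 'Speech B', 'Speech A']
```

Deduplicating on a single key column keeps the check cheap; if titles can repeat across different speeches, a composite key (for example title plus date) would be safer.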
from kaggle.api.kaggle_api_extended import KaggleApi


def get_speeches(api: KaggleApi) -> str:
    """Download the Kaggle dataset and get the latest speech from it.

    Args:
        api (KaggleApi): Authenticated Kaggle API client

    Returns:
        Title of the latest speech saved by the previous run
    """
    api.dataset_download_files("adiamaan/modi-speeches", path="./data", unzip=True)
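The docstring says the function returns the latest speech title from the previous run. One way to read that title back after the download, assuming the CSV keeps the newest speech in its first row and has a `title` column (both assumptions), is sketched below; `latest_title` is a hypothetical helper:

```python
import csv
from typing import Iterable, Optional


def latest_title(lines: Iterable[str]) -> Optional[str]:
    """Return the 'title' of the first data row, or None if there are no rows.

    Hypothetical helper: assumes newest-first row order and a 'title' column.
    `lines` can be an open file object or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    first_row = next(reader, None)
    return None if first_row is None else first_row["title"]


# Usage after dataset_download_files(..., path="./data", unzip=True):
# with open("./data/modi_speeches.csv", newline="", encoding="utf-8") as f:
#     previous_latest = latest_title(f)
```

Reading only the first row avoids loading the whole CSV just to find the stopping point for the next scrape.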
import requests

i = 1
while True:
    # This GET request returns the i-th page of speeches. Pages are sorted in
    # descending order of publication datetime (newest first).
    r = requests.get(
        f"https://www.narendramodi.in/speech/loadspeeche?page={i}&language=en",
        headers=headers,
    )
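Because pages come back newest-first, the loop can stop as soon as it reaches the title saved by the previous run. A minimal sketch of that stopping rule, with the HTTP call abstracted behind a hypothetical `fetch_page(i)` callable that returns the speech titles on page `i` (an empty list past the last page):

```python
from typing import Callable, List, Optional


def collect_new_titles(
    fetch_page: Callable[[int], List[str]], last_seen: Optional[str]
) -> List[str]:
    """Walk pages newest-first until `last_seen` or an empty page is reached.

    `fetch_page` is a hypothetical stand-in for the requests.get call and is
    assumed to return titles in descending publication order.
    """
    new_titles: List[str] = []
    i = 1
    while True:
        titles = fetch_page(i)
        if not titles:  # ran past the last page
            return new_titles
        for title in titles:
            if title == last_seen:  # everything from here on was scraped before
                return new_titles
            new_titles.append(title)
        i += 1
```

Passing `last_seen=None` (for example on the very first run) walks every page until an empty one is returned.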
import numpy as np


def generate_buddhabrot(non_mandelbrot: np.ndarray, size: int) -> np.ndarray:
    """Generate the Buddhabrot image array.

    Args:
        non_mandelbrot (np.ndarray): Array of points on the complex plane that are not in the Mandelbrot set
        size (int): Size of the image array
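A Buddhabrot is built by iterating z → z² + c for each point c outside the Mandelbrot set and counting how often each pixel is visited by the orbit. A minimal NumPy sketch of that accumulation, assuming the image spans [-2, 2] on both axes (rows follow the imaginary part, columns the real part) and a hypothetical `max_iterations` parameter; it is an illustration, not necessarily how `generate_buddhabrot` is implemented:

```python
import numpy as np


def buddhabrot_histogram(
    non_mandelbrot: np.ndarray, size: int, max_iterations: int = 100
) -> np.ndarray:
    """Count orbit visits per pixel for escaping points (illustrative sketch).

    Assumes the image covers [-2, 2] x [-2, 2]: rows follow the imaginary
    part and columns the real part.
    """
    image = np.zeros((size, size), dtype=np.int64)
    c = np.asarray(non_mandelbrot, dtype=np.complex128)
    z = np.zeros_like(c)
    active = np.ones(c.shape, dtype=bool)  # orbits that are still bounded
    for _ in range(max_iterations):
        z[active] = z[active] ** 2 + c[active]
        active &= np.abs(z) <= 2.0  # escaped orbits stay frozen and inactive
        if not active.any():
            break
        # Map the bounded orbit points (|z| <= 2) to pixel indices and count them.
        zs = z[active]
        cols = ((zs.real + 2.0) / 4.0 * (size - 1)).round().astype(int)
        rows = ((zs.imag + 2.0) / 4.0 * (size - 1)).round().astype(int)
        np.add.at(image, (rows, cols), 1)  # unbuffered, so repeated pixels add up
    return image
```

`np.add.at` is used instead of `image[rows, cols] += 1` because plain fancy-index assignment counts a pixel only once per iteration even if several orbits land on it.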
import numpy as np


def make_non_mandelbrot_set(nsamples: int, max_iterations: int) -> np.ndarray:
    """Generate a set of complex numbers that are not in the Mandelbrot set.

    This employs some of the optimizations from this page:
    http://en.wikipedia.org/wiki/Mandelbrot_set#Optimizations

    In order to minimize run time, we are trying to reduce the number of points
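One of the optimizations on the linked Wikipedia page is a closed-form test for the main cardioid and the period-2 bulb: with q = (x − 1/4)² + y², a point is in the cardioid if q(q + (x − 1/4)) ≤ y²/4, and in the bulb if (x + 1)² + y² ≤ 1/16. Such points never escape, so they can be dropped before iterating. A minimal sketch, assuming samples are drawn uniformly from [-2, 2] × [-2, 2] and a hypothetical `seed` argument:

```python
import numpy as np


def sample_non_mandelbrot(nsamples: int, max_iterations: int, seed: int = 0) -> np.ndarray:
    """Return sampled points c that escape within max_iterations (sketch)."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(-2, 2, nsamples) + 1j * rng.uniform(-2, 2, nsamples)
    x, y = c.real, c.imag

    # Drop points in the main cardioid or the period-2 bulb: they never escape.
    q = (x - 0.25) ** 2 + y**2
    in_cardioid = q * (q + (x - 0.25)) <= 0.25 * y**2
    in_bulb = (x + 1) ** 2 + y**2 <= 1 / 16
    c = c[~(in_cardioid | in_bulb)]

    # Escape-time filter: keep points whose orbit leaves the disc |z| <= 2.
    z = np.zeros_like(c)
    escaped = np.zeros(c.shape, dtype=bool)
    for _ in range(max_iterations):
        live = ~escaped
        z[live] = z[live] ** 2 + c[live]  # escaped orbits are frozen (no overflow)
        escaped |= np.abs(z) > 2.0
    return c[escaped]
```

The closed-form checks cost a few arithmetic operations per point, while each interior point would otherwise consume all `max_iterations` iterations before being rejected.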
import numpy as np

x_dim, y_dim = 500, 500
x = np.linspace(-2, 1, x_dim)      # real axis
y = np.linspace(-1.5, 1.5, y_dim)  # imaginary axis
max_iterations = 100
array = np.zeros((x_dim, y_dim))
for i in range(x_dim):
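The nested loop over the grid can also be written as a single vectorized escape-time pass. A sketch under the same grid and `max_iterations` as the snippet above, storing in `array[i, j]` the iteration at which `x[i] + 1j*y[j]` escapes (`max_iterations` if it never does); `escape_counts` is a hypothetical name:

```python
import numpy as np


def escape_counts(x: np.ndarray, y: np.ndarray, max_iterations: int) -> np.ndarray:
    """Escape-time counts for c = x[i] + 1j*y[j], indexed as [i, j]."""
    c = x[:, None] + 1j * y[None, :]  # broadcast to shape (len(x), len(y))
    z = np.zeros_like(c)
    counts = np.full(c.shape, max_iterations, dtype=np.int64)
    active = np.ones(c.shape, dtype=bool)  # points that have not escaped yet
    for n in range(max_iterations):
        z[active] = z[active] ** 2 + c[active]
        newly_escaped = active & (np.abs(z) > 2.0)
        counts[newly_escaped] = n
        active &= ~newly_escaped
    return counts


x = np.linspace(-2, 1, 500)
y = np.linspace(-1.5, 1.5, 500)
array = escape_counts(x, y, max_iterations=100)
```

For c = 1 the orbit is 1, 2, 5, so it escapes at n = 2; for c = 0 it stays at 0 and keeps the value `max_iterations`.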
{
  "editor.fontLigatures": true,
  "editor.fontFamily": "'Victor Mono','Fira Code', Consolas, 'Courier New', monospace",
  "editor.fontSize": 15,
  "editor.tokenColorCustomizations": {
    "textMateRules": [
      {
        "scope": [
          "invalid",
          "keyword.operator",
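def setcols(df, fn=lambda x: x.columns.map('_'.join), cols=None):
    """Set the columns of the data frame and return it.

    If cols is given, it becomes the new column list. Otherwise the columns
    are set to fn(df); the default fn flattens MultiIndex columns by joining
    each level tuple with '_'. The frame is modified in place and returned,
    so the function can be used with DataFrame.pipe.
    """
    if cols:
        df.columns = cols
    else:
        df.columns = fn(df)
    return df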
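`setcols` fits at the end of a method chain, for example to flatten the two-level columns that `groupby(...).agg(...)` produces when several aggregations are requested. A small usage sketch with made-up data (the function is repeated so the block runs on its own):

```python
import pandas as pd


def setcols(df, fn=lambda x: x.columns.map('_'.join), cols=None):
    # Use the explicit column list if given, otherwise derive names with fn.
    if cols:
        df.columns = cols
    else:
        df.columns = fn(df)
    return df


sales = pd.DataFrame({"store": ["a", "a", "b"], "amount": [1, 2, 5]})
# agg with a list of functions yields MultiIndex columns such as ('amount', 'sum').
summary = sales.groupby("store").agg({"amount": ["sum", "max"]}).pipe(setcols)
print(list(summary.columns))  # ['amount_sum', 'amount_max']
```

The default `fn` assumes tuple-valued (MultiIndex) columns; on plain string columns `'_'.join` would insert underscores between characters, so pass `cols=` or a different `fn` in that case.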