@aniruddha27
Created June 28, 2020 16:34
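import glob
import os

import pandas as pd

# Session folders of the UN General Debate corpus (sorted for a stable order)
folders = sorted(glob.glob('./UNGD/UNGDC 1970-2018/Converted sessions/Session*'))

# Collect one record per session, then build the dataframe once at the end
rows = []
for folder in folders:
    # Speeches by India; the parsing below implies names like IND_<session>_<year>.txt
    speech = glob.glob(os.path.join(folder, 'IND*.txt'))
    if not speech:
        continue  # skip sessions that have no speech by India
    # File name without directory or extension (works on any OS, not only Windows)
    parts = os.path.splitext(os.path.basename(speech[0]))[0].split('_')
    with open(speech[0], encoding='utf8') as f:
        rows.append({
            'Country': parts[0],
            'Speech': f.read(),
            'Session': parts[-2],
            'Year': parts[-1],
        })

# A list (not a set) keeps the column order fixed
df = pd.DataFrame(rows, columns=['Country', 'Speech', 'Session', 'Year'])
df.head()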