@BinarySpoon
Last active October 16, 2020 09:59
import re
from calendar import month_name

import pandas as pd

# import medical notes -->
doc = []
with open('dates.txt') as file:
    for line in file:
        doc.append(line)
# create dataframe -->
df = pd.DataFrame(doc, columns=['text'])
# strip at \n -->
df['text'] = df['text'].apply(lambda x: x.strip('\n'))
# capturing all date variants -->
pattern_dates = r'\d{1,2}\/\d{1,2}\/\d{2,4}|\d{1,2}\-\d{1,2}\-\d{2,4}|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\-\d{1,2}\-\d{4}|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[,.]? \d{2}[a-z]*,? \d{4}|\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z,.]* \d{4}|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]{2}[,.]* \d{4}|'+r"[,.]? \d{4}|".join(month_name[1:])+r"[,.]? \d{4}"+r'|\d{1,2}\/\d{4}|\d{4}'
df['date'] = df['text'].apply(lambda x: re.findall(pattern_dates, x))
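# fixing outlier 271: keep the second match instead of the first -->
# (.at avoids chained assignment, which may silently write to a copy)
df.at[271, 'date'] = [df.at[271, 'date'][1]]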
# extract dates from dataframe -->
df['date'] = df['date'].apply(lambda x: x[0])
date_list = list(df['date'])
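
A small sanity check can make the matching behaviour easier to see. The sketch below is an illustration only: it uses a reduced pattern (not the full `pattern_dates` above) and made-up sample strings, both of which are assumptions. It shows that `re.findall` returns every match in left-to-right order, that the more specific alternatives are listed before the bare four-digit year so they win at a given position, and that taking element `[0]` keeps only the first match. A line with two candidate dates, as in the last sample, is presumably the kind of case the manual fix for row 271 handles.

```python
import re

# Hypothetical reduced pattern: most specific alternatives first,
# bare four-digit year last.
MONTHS = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'
pattern = "|".join([
    r'\d{1,2}/\d{1,2}/\d{2,4}',   # e.g. 03/25/93
    MONTHS + r'[a-z]* \d{4}',     # e.g. March 1974
    r'\d{1,2}/\d{4}',             # e.g. 6/1998
    r'\d{4}',                     # e.g. 2009
])

# Made-up sample lines, not taken from dates.txt.
samples = [
    "03/25/93 Total time of visit (in minutes):",
    "Seen in clinic March 1974 for follow-up",
    "Last dose 6/1998",
    "Born 1984, first seen 2009",
]

# findall returns all non-overlapping matches, left to right.
matches = [re.findall(pattern, s) for s in samples]

# Taking [0] keeps only the first match on each line.
first = [m[0] for m in matches]

for s, m in zip(samples, matches):
    print(f"{s!r} -> {m}")
```

In the last sample the first match is the birth year rather than the visit year, so a blanket `x[0]` would pick the wrong value unless that row is corrected by hand.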