@timothycarambat
Created September 19, 2018 04:09
u/BeginningAlternative Data Parser in Python
bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and 
more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2)bla bla bla bla and more bla bla - last name, first name day time (activity 1; activity 2; activity 3)
# Note: Run this file in the same directory as a data.txt file that contains the unformatted data.
# It exports export.csv to the same directory. Tested in Python 2.7.
# If you do not want to export a field, delete it from both the headers list and the data_item dictionary.
import re
import csv

headers = ['Data', 'Last Name', 'First Name', 'Day', 'Time', 'A1', 'A2', 'A3']
data_collection = []

with open('data.txt', 'r') as input_file:
    raw_data = input_file.read()

# Every record ends with ')', so splitting on it yields one chunk per record.
first_pass = raw_data.split(')')
for data_item_whole in first_pass:
    # Skip empty chunks, such as a trailing newline after the last ')'.
    if not data_item_whole.strip():
        continue
    # Everything before the first ' - ' is the free-text data chunk.
    data_chunk_pass = data_item_whole.split(' - ', 1)
    data_chunk = data_chunk_pass[0].strip()
    last_name_pass = data_chunk_pass[1].split(', ', 1)
    last_name = last_name_pass[0]
    remainder = last_name_pass[1]
    # The first name is whatever precedes the day; the literal 'day' matches the placeholder in the sample data.
    first_name = re.split('day|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday', remainder)[0].strip()
    day_pass = remainder.replace(first_name + ' ', '', 1)  # remove the name we found
    day = day_pass.split(' ', 1)[0]
    time_pass = day_pass.replace(day + ' ', '', 1)
    time = time_pass.split(' ', 1)[0]
    # Drop the opening parenthesis, split on ';', and trim the space after each separator.
    activities = [a.strip() for a in time_pass.replace(time + ' ', '', 1).replace('(', '').split(';')]
    data_item = {
        'Data': data_chunk,
        'Last Name': last_name,
        'First Name': first_name,
        'Day': day,
        'Time': time,
        'A1': activities[0] if 0 < len(activities) else '',
        'A2': activities[1] if 1 < len(activities) else '',
        'A3': activities[2] if 2 < len(activities) else '',
    }
    data_collection.append(data_item)

# 'wb' is the Python 2 convention for csv files; on Python 3 use open('export.csv', 'w', newline='').
with open('export.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, headers)
    dict_writer.writeheader()
    dict_writer.writerows(data_collection)
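
The script above peels each field off with successive split and replace calls. As an alternative, here is a minimal sketch that captures all fields of one record with a single regular expression. It is not part of the original gist: the names `DAY_NAMES`, `RECORD_RE`, `parse_record`, and `parse_all` are hypothetical, and the pattern assumes the record layout shown in the sample data (`data - last name, first name day time (activity 1; activity 2)`), with a time that contains no spaces.

```python
import re

# Same day alternatives as the script above; 'day' matches the sample-data placeholder.
DAY_NAMES = 'day|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday'

# One record with its closing ')' already removed by the split.
RECORD_RE = re.compile(
    r'^(?P<data>.*?) - (?P<last>[^,]+), (?P<first>.+?) '
    r'(?P<day>' + DAY_NAMES + r') (?P<time>\S+) \((?P<activities>.*)$',
    re.DOTALL,
)


def parse_record(chunk):
    """Return a row dict for one record, or None if the chunk does not match."""
    match = RECORD_RE.match(chunk.strip())
    if match is None:
        return None
    activities = [a.strip() for a in match.group('activities').split(';')]
    # Pad to three entries so A1..A3 always exist.
    activities += [''] * (3 - len(activities))
    return {
        'Data': match.group('data'),
        'Last Name': match.group('last'),
        'First Name': match.group('first'),
        'Day': match.group('day'),
        'Time': match.group('time'),
        'A1': activities[0],
        'A2': activities[1],
        'A3': activities[2],
    }


def parse_all(raw_data):
    """Split the raw text on ')' and parse every non-empty chunk."""
    records = (parse_record(chunk) for chunk in raw_data.split(')') if chunk.strip())
    return [record for record in records if record is not None]
```

Chunks that do not match the assumed layout are skipped rather than raising an IndexError, which may or may not be what you want for messy input. The returned list can be passed to `csv.DictWriter.writerows` in the same way as `data_collection`.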