@apoorvalal
Last active December 14, 2017 18:08
Normalize spaces before importing as dataframes
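#%%
import os
import re

os.chdir('data_directory')
# Only regular files, and skip outputs left over from an earlier run.
raw_files = [x for x in os.listdir()
             if os.path.isfile(x) and not x.endswith('_clean.txt')]
for raw in raw_files:
    inputfile = raw
    # splitext keeps dots inside the base name (a.b.txt -> a.b_clean.txt).
    outputfile = os.path.splitext(raw)[0] + '_clean.txt'
    with open(inputfile, 'r') as f:
        lines = f.readlines()
    data_rows = []
    for line in lines:
        # Collapse each run of whitespace into one space and trim both ends.
        cleanline = re.sub(r'\s+', ' ', line).strip()
        data_rows.append(cleanline)
    # Mode 'w' overwrites, so re-running does not append duplicate rows.
    # The with block closes the file; no explicit close() is needed.
    with open(outputfile, 'w') as f:
        for obs in data_rows:
            f.write(obs + '\n')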
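
To illustrate why this normalization helps before import, here is a minimal sketch using only the standard library. The helper name `normalize_line` and the sample text are hypothetical and not part of the original snippet. Once every field is separated by exactly one space, a plain space-delimited reader can parse the rows.

```python
import csv
import io
import re


def normalize_line(line):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r'\s+', ' ', line).strip()


# Hypothetical sample of a ragged, space-aligned file (made up for illustration).
raw_text = "id   name     score\n1    alice    3.5\n2\tbob  \t 4.0  \n"

clean_text = "\n".join(normalize_line(l) for l in raw_text.splitlines()) + "\n"

# With a single space between fields, a space-delimited reader splits cleanly.
rows = list(csv.reader(io.StringIO(clean_text), delimiter=' '))
print(rows)
```

The same cleaned file could presumably be loaded with `pandas.read_csv(path, sep=' ')`. Note that this approach assumes no field contains a space of its own; such fields would be split into several columns.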