Skip to content

Instantly share code, notes, and snippets.

@tchiavegatti
Created February 11, 2020 21:08
Show Gist options
  • Save tchiavegatti/414df3b367e56ba274261f844f920a43 to your computer and use it in GitHub Desktop.
Save tchiavegatti/414df3b367e56ba274261f844f920a43 to your computer and use it in GitHub Desktop.
[Fix column names] Remove whitespace so you can use the df.ColumnName notation #pandas
# From https://medium.com/@chaimgluck1/working-with-pandas-fixing-messy-column-names-42a54a6659cd
#Sometimes you load in that DataFrame from a csv or excel file that some unlucky excel user created and you just wish everyone used Python. Why do they have to make the column names uppercase, with spaces, and whitespace all around? Do they like doing this to you? They probably hate you, that’s it. They did this:
#The nerve!
#Now you can’t reference the columns with the convenient .{column name here} notation. You’ll have to do the [''] thing. It’s not the most horrible thing that ever happened, you’ve been through a #lot. You can take it. But you’re used to the other way. Plus, what’s with those parentheses? You can’t have that.
#Luckily, pandas has a convenient .str method that you can use on text data. Since the column names are an ‘index’ type, you can use .str on them too. You can fix all these lapses of judgement by #chaining together a bunch of these .str functions. Like so:
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment