@joshlk
Last active September 21, 2017 09:09
Pandas util functions
import pandas as pd


def clean_str_cols(df, encoding='ascii'):
    """
    String columns are stored with the generic 'object' dtype, which can cause problems,
    especially when reading and writing CSVs. This function casts every object column to
    strings and round-trips the values through the specified encoding, silently dropping
    any characters that encoding cannot represent.
    Solves `UnicodeEncodeError` errors when using `to_csv`.
    """
    df = df.copy()  # Work on a copy so the caller's DataFrame is not modified
    for col, dtype in df.dtypes.items():
        if dtype.kind == 'O':  # Object dtype: strings or mixed types
            # An object column may hold mixed datatypes, so cast every value to str first
            df[col] = df[col].astype('str')
            # Encode with errors='ignore' to drop unencodable characters, then decode back to str
            df[col] = df[col].str.encode(encoding, errors='ignore').str.decode(encoding)
    return df
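
A short usage sketch may help show what the function does in practice. The sample DataFrame and its column names (`name`, `count`) are made up for illustration and are not part of the gist; the function body is repeated only so the snippet runs on its own.

```python
import pandas as pd


def clean_str_cols(df, encoding='ascii'):
    # Same logic as the gist's function, repeated so this snippet is self-contained.
    df = df.copy()
    for col, dtype in df.dtypes.items():
        if dtype.kind == 'O':
            df[col] = df[col].astype('str')
            df[col] = df[col].str.encode(encoding, errors='ignore').str.decode(encoding)
    return df


# Hypothetical sample data: an object column mixing non-ASCII text with an
# integer, plus a purely numeric column that should pass through untouched.
raw = pd.DataFrame({
    'name': ['café', 'naïve', 42],
    'count': [1, 2, 3],
})

cleaned = clean_str_cols(raw)

print(cleaned['name'].tolist())   # ['caf', 'nave', '42']
print(cleaned['count'].tolist())  # [1, 2, 3]
print(raw['name'].tolist()[0])    # café (the input DataFrame is not modified)
```

Note that with the default `encoding='ascii'` the accented characters are dropped rather than transliterated, because `errors='ignore'` discards anything the encoding cannot represent. Passing `encoding='utf-8'` would leave such text intact while still casting mixed values to strings.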