Pandas util functions
import pandas as pd


def clean_str_cols(df, encoding='ascii'):
    """
    String columns are stored with the generic 'object' dtype, which can cause problems,
    especially when reading and writing CSVs. This function forces such columns to be
    strings and re-encodes them in the specified encoding, dropping any characters that
    cannot be represented in it.

    Solves `UnicodeEncodeError` errors when using `to_csv`.
    """
    df = df.copy()  # Work on a copy so the caller's DataFrame is not modified
    for col, dtype in df.dtypes.items():
        if dtype.kind == 'O':  # Object dtype: strings or mixed types
            # A column is reported as object when it contains mixed datatypes, so cast every value to str first
            df[col] = df[col].astype('str')
            # Round-trip through the target encoding; errors='ignore' silently drops unencodable characters
            df[col] = df[col].str.encode(encoding, errors='ignore').str.decode(encoding)
    return df
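
The two steps inside the loop can be tried on their own. The snippet below is a minimal sketch, not part of the original gist: the sample values and the variable names `mixed`, `as_text`, `accented` and `cleaned` are made up for illustration. It shows the cast of a mixed-type column to strings, and the encode/decode round trip removing non-ASCII characters.

```python
import pandas as pd

# Hypothetical sample data: a column mixing strings and numbers
mixed = pd.Series(["a", 1, 2.5])
as_text = mixed.astype("str")  # every value becomes a str
print(as_text.tolist())

# Hypothetical sample data: strings containing non-ASCII characters
# (\u00e9 is e-acute, \u00ef is i-diaeresis)
accented = pd.Series(["caf\u00e9", "na\u00efve", "plain"])

# Same round trip as in clean_str_cols: encode with errors='ignore',
# which drops characters the encoding cannot represent, then decode back
cleaned = accented.str.encode("ascii", errors="ignore").str.decode("ascii")
print(cleaned.tolist())
```

Note that `errors='ignore'` removes characters rather than replacing them, so the cleaned strings can be shorter than the originals. If that loss is not acceptable, passing an encoding such as `'utf-8'` to `clean_str_cols` keeps the characters intact.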