Created April 20, 2016 21:33
For anyone who wants a shorter version of the above, here is one that does not use shelve. With shelve I get this complaint:
  File "---.py", line 104, in get_possible_values
    with shelve.open(shelf_name, writeback=True) as shelf:
AttributeError: DbfilenameShelf instance has no attribute '__exit__'
The shorter version:
import pandas as pd


def concat(dataframes, categorical_columns, ignore_index=False):
    """Concatenate dataframes with unordered categorical columns.

    Will mutate categorical columns of original dataframes.

    dataframes: list of dataframes.
    categorical_columns: list of names of unordered, categorical columns.
    ignore_index: same as from pd.concat.
    """
    # Get all possible values for all categorical_columns
    possible_values = {}
    for col in categorical_columns:
        possible_values[col] = set()
    for df in dataframes:
        for col in categorical_columns:
            # Skip missing values: NaN cannot be used as a category
            for val in df[col].dropna():
                possible_values[col].add(val)
    # Use pd.Categorical() to re-categorize the values in all columns
    for df in dataframes:
        for col in categorical_columns:
            # Pass a list so the categories have a definite order
            df[col] = pd.Categorical(
                df[col], categories=list(possible_values[col]), ordered=False)
    return pd.concat(dataframes, axis=0, ignore_index=ignore_index)
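A side note on the shelve error quoted above: as far as I know, shelf objects only became usable in a `with` statement in Python 3.4, so on older interpreters the `__exit__` attribute is simply missing. If you still want the shelve-based approach there, wrapping the call in `contextlib.closing` is one possible workaround. The sketch below is only an illustration; `store_values`, `load_values`, and the `"color"` key are hypothetical names, not part of the original gist.

```python
import contextlib
import os
import shelve
import tempfile


def store_values(shelf_name, key, values):
    """Merge `values` into the set stored under `key` (hypothetical helper)."""
    # closing() calls shelf.close() on exit, so this works even when the
    # shelf object itself does not implement __enter__/__exit__.
    with contextlib.closing(shelve.open(shelf_name, writeback=True)) as shelf:
        current = shelf.get(key, set())
        current.update(values)
        shelf[key] = current


def load_values(shelf_name, key):
    """Read back the set stored under `key` (hypothetical helper)."""
    with contextlib.closing(shelve.open(shelf_name)) as shelf:
        return shelf.get(key, set())


# Example: accumulate possible values for one column across two calls.
shelf_path = os.path.join(tempfile.mkdtemp(), "possible_values")
store_values(shelf_path, "color", ["red", "blue"])
store_values(shelf_path, "color", ["green"])
stored = load_values(shelf_path, "color")
```

The shelf is closed as soon as each `with` block ends, which is the same cleanup the original `with shelve.open(...)` line was relying on.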
PS: you won't need to do all this if you are running pandas 0.19 or later. In my case I gotta live with 0.18, and this saved my life today! Thank you @tdhopper!
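For readers on a newer pandas, here is a rough sketch of how the same result might be reached with `union_categoricals`. It assumes a pandas recent enough to expose that function under `pandas.api.types` (the import path was different in 0.19) and that every listed column already has category dtype. `concat_with_union` and the sample frames are made-up names for illustration. Unlike the function above, this version works on copies and leaves the inputs untouched.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def concat_with_union(dataframes, categorical_columns, ignore_index=False):
    """Concatenate frames after aligning their categories (hypothetical helper)."""
    # Work on copies so the caller's dataframes are not mutated.
    frames = [df.copy() for df in dataframes]
    for col in categorical_columns:
        # union_categoricals merges the categories of all inputs.
        combined = union_categoricals([df[col] for df in frames])
        for df in frames:
            # Give every frame the same full set of categories.
            df[col] = df[col].cat.set_categories(combined.categories)
    return pd.concat(frames, axis=0, ignore_index=ignore_index)


# Example with two frames whose categories only partly overlap.
a = pd.DataFrame({"c": pd.Categorical(["x", "y"])})
b = pd.DataFrame({"c": pd.Categorical(["y", "z"])})
out = concat_with_union([a, b], ["c"], ignore_index=True)
```

Because all frames share identical categories before `pd.concat` runs, the column should come out with category dtype rather than falling back to object.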
Unfortunately, this doesn't work anymore in my setup: