@tdhopper
Author

There is a ticket about this here: pandas-dev/pandas#12699

@nicoa

nicoa commented Sep 4, 2017

Unfortunately, this doesn't work anymore in my setup:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.6
pip: 9.0.1
setuptools: 34.2.0
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.19.0
xarray: 0.9.1
IPython: 5.2.2
sphinx: 1.5.2
patsy: 0.4.1
dateutil: 2.5.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.2
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.6.4
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: 1.1.4
pymysql: 0.7.10.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: 0.2.0
pandas_datareader: None

@lenguyenthedat

lenguyenthedat commented Jan 9, 2018

For anyone who wants a shorter version of the above that does not use shelve, which gives me the complaint below:

  File "---.py", line 104, in get_possible_values
    with shelve.open(shelf_name, writeback=True) as shelf:
AttributeError: DbfilenameShelf instance has no attribute '__exit__'
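
A note on the traceback above: the message is consistent with Python 2, where shelve objects cannot be used directly in a `with` statement (context manager support was added to shelve in Python 3.4). If you still want the shelve-based approach, a minimal sketch of a workaround is to wrap the shelf in `contextlib.closing`. The path and the `"color"` key below are made up for illustration.

```python
import contextlib
import os
import shelve
import tempfile

# Hypothetical shelf location, only for this illustration.
shelf_path = os.path.join(tempfile.mkdtemp(), "possible_values")

# contextlib.closing() calls shelf.close() on exit, so the shelf does not
# need to implement __enter__/__exit__ itself.
with contextlib.closing(shelve.open(shelf_path, writeback=True)) as shelf:
    shelf["color"] = {"red", "blue"}
    # With writeback=True, in-place mutations are written back on close.
    shelf["color"].add("green")

# Reopen the shelf to confirm the values were persisted.
with contextlib.closing(shelve.open(shelf_path)) as shelf:
    stored = shelf["color"]
```

The same `with contextlib.closing(...)` line also runs on Python 3, so it should be safe to keep if the code has to support both.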

Here is the shorter version:

    import pandas as pd

    def concat(dataframes, categorical_columns, ignore_index=False):
        """Concatenate dataframes with unordered categorical columns.

        Will mutate categorical columns of original dataframes.

        dataframes: list of dataframes.
        categorical_columns: list of names of unordered, categorical columns.
        ignore_index: same as from pd.concat.
        """

        # Get all possible values for all categorical_columns
        possible_values = {col: set() for col in categorical_columns}
        for df in dataframes:
            for col in categorical_columns:
                # Skip missing values: NaN is not a valid category.
                possible_values[col].update(df[col].dropna().unique())

        # Use pd.Categorical() to re-categorize the values in all columns
        for df in dataframes:
            for col in categorical_columns:
                # list() hands pandas a sequence of categories instead of a set.
                df[col] = pd.Categorical(
                    df[col], categories=list(possible_values[col]),
                    ordered=False)

        return pd.concat(dataframes, axis=0, ignore_index=ignore_index)

PS: You won't need to do all this if you are running pandas 0.19 or later. In my case I gotta live with 0.18, and this saved my life today! Thank you, @tdhopper!
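
For readers on a newer pandas, here is a rough sketch of the same idea using `pandas.api.types.union_categoricals` to compute the combined categories. As far as I know, a plain `pd.concat` on recent versions no longer raises when the categories differ, but it falls back to `object` dtype, so aligning the categories first is still what keeps the column categorical. The helper name `concat_categorical` and the sample frames are made up for illustration, and unlike the version above this one works on copies instead of mutating its inputs.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def concat_categorical(dataframes, categorical_columns, ignore_index=False):
    """Hypothetical helper: concatenate while keeping categorical dtype."""
    # Work on copies so the caller's dataframes are left untouched.
    frames = [df.copy() for df in dataframes]
    for col in categorical_columns:
        # Union of the categories seen in this column across all frames.
        combined = union_categoricals([df[col] for df in frames]).categories
        for df in frames:
            df[col] = pd.Categorical(df[col], categories=combined)
    return pd.concat(frames, axis=0, ignore_index=ignore_index)


# Made-up sample data with different categories per frame.
a = pd.DataFrame({"color": pd.Categorical(["red", "blue"])})
b = pd.DataFrame({"color": pd.Categorical(["green", "red"])})

result = concat_categorical([a, b], ["color"], ignore_index=True)
```

Because every frame ends up with identical categories before the concat, `result["color"]` stays categorical instead of being converted to `object`.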
