Skip to content

Instantly share code, notes, and snippets.

@idan
Created October 8, 2014 13:13
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save idan/c27b2dd45a78273af848 to your computer and use it in GitHub Desktop.
Save idan/c27b2dd45a78273af848 to your computer and use it in GitHub Desktop.
Exploding Postgres HSTORE columns in Pandas
def explode_hstore(df, column):
"""Explode a column of a dataframe containing PostgreSQL hstore k-v pairs in the format of:
"foo"=>"bar", "baz"=>"quux", ...
Every key becomes a column. If a given row's pairs doesn't have a key, then the resulting column's value
will contain NaN.
"""
# split the tags column out as a new series, and break up each k=>v pair
s = df[column].str.split(', ').apply(pd.Series, 1).stack()
# extract the keys and values into columns
extracted = s.str.extract(r'"(?P<key>[\w-]+)"=>"(?P<val>[\w-]+)"', re.IGNORECASE)
# toss the unnecessary multi index that is generated in the process
extracted.index = extracted.index.droplevel(1)
# pivot the table make the rows in keys become columns
pivoted = extracted.pivot_table(values='val', index=extracted.index, columns='key', aggfunc='first')
# join with the original table and return
return df.join(pivoted)
@cmcaine
Copy link

cmcaine commented Oct 22, 2017

Your version will break if the keys or values contain ', '. Here's a simpler, more robust version:

def expand_hstore(column):
    "Given a postgres hstore column, expand into a table of simple values"
    def tojsonarr(values):
         "Create a JSON array from values"
         acc = "["
         for v in values:
             acc += v + ","
         return acc[:-1] + "]"

    return pd.read_json(tojsonarr(column.values))


def replace_hstore(dataframe, index):
    """Replace the hstore column at index with its expanded columns

    If hstore shares keys with dataframe, output will have multiple columns
    with the same index, which is weird, but pandas is OK with that.

    """
    expanded = expand_hstore(dataframe[index])
    del dataframe[index]
    return pd.concat([dataframe, expanded], axis=1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment