Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
Exploding Postgres HSTORE columns in Pandas
def explode_hstore(df, column):
"""Explode a column of a dataframe containing PostgreSQL hstore k-v pairs in the format of:
"foo"=>"bar", "baz"=>"quux", ...
Every key becomes a column. If a given row's pairs doesn't have a key, then the resulting column's value
will contain NaN.
# split the tags column out as a new series, and break up each k=>v pair
s = df[column].str.split(', ').apply(pd.Series, 1).stack()
# extract the keys and values into columns
extracted = s.str.extract(r'"(?P<key>[\w-]+)"=>"(?P<val>[\w-]+)"', re.IGNORECASE)
# toss the unnecessary multi index that is generated in the process
extracted.index = extracted.index.droplevel(1)
# pivot the table make the rows in keys become columns
pivoted = extracted.pivot_table(values='val', index=extracted.index, columns='key', aggfunc='first')
# join with the original table and return
return df.join(pivoted)

This comment has been minimized.

Copy link

commented Oct 22, 2017

Your version will break if the keys or values contain ', '. Here's a simpler, more robust version:

def expand_hstore(column):
    "Given a postgres hstore column, expand into a table of simple values"
    def tojsonarr(values):
         "Create a JSON array from values"
         acc = "["
         for v in values:
             acc += v + ","
         return acc[:-1] + "]"

    return pd.read_json(tojsonarr(column.values))

def replace_hstore(dataframe, index):
    """Replace the hstore column at index with its expanded columns

    If hstore shares keys with dataframe, output will have multiple columns
    with the same index, which is weird, but pandas is OK with that.

    expanded = expand_hstore(dataframe[index])
    del dataframe[index]
    return pd.concat([dataframe, expanded], axis=1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.