@amirziai
Last active December 8, 2022 18:09
@random82

Good one, thanks for that @amirziai

@curly-lala

Thank you :)

@MatteyRitch

MatteyRitch commented Sep 19, 2017

Thank you for this, @amirziai. I tried applying this to a sample object with multiple objects in my json document but with no success.

I know what I'm trying to do is possible. I just don't know how to properly apply this loop to each object in my list of objects:

from pandas.io.json import json_normalize
sample_object = {'Name':'Chris', 'Location':{'City':'Los Angeles','State':'CA'}, 'Hobbies':['Ultimate Lawn Darts', 'Polymorphing']},{'Name':'Matthew', 'Location':{'City':'New York','State':'NY'},'Hobbies':['Ultimate Frisbee', 'Coding']}


def flatten_json(y):
    out = {}
    def obj_loop(y):
        for obj in y:
            def flatten(x, name=''):
                if type(x) is dict:
                    for a in x:
                        flatten(x[a], name + a + '_')
                elif type(x) is list:
                    i = 0
                    for a in x:
                        flatten(a, name + str(i) + '_')
                        i += 1
                else:
                    out[name[:-1]] = x
    obj_loop(y)
    return out;

This doesn't return anything, probably due to a sophomoric mistake, but I have already spent over 3 hours trying to figure out how to write this and would really appreciate some guidance.

Once I get this working, I want to try applying it to documents whose lists have different numbers of items. My initial interpretation is that this would not be possible. For instance, if I add an extra hobby to Matt, the first document will only have the column headers Hobbies_0 and Hobbies_1, while Matt would need an additional column header, Hobbies_2. Am I correct in assuming this would not be possible?

Thanks for your help.

Also sorry if the code does not paste properly, this is my first post.

@justinlulejian

@amirziai This is awesome, I was trying to flatten a multiple nested JSON object and this helped. Since it's recursive, the level of nesting is limited to how large the python stack is though?
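On the recursion question: a recursive flatten is bounded by the interpreter's recursion limit (`sys.getrecursionlimit()`, typically 1000 in CPython), not by the data itself. If very deep nesting is a concern, the same traversal can be written with an explicit stack. The following is only a minimal sketch, not the gist's code; the name `flatten_json_iterative` and the `sep` parameter are made up for illustration.

```python
import sys


def flatten_json_iterative(obj, sep="_"):
    """Flatten nested dicts/lists without recursion, using an explicit stack."""
    out = {}
    # Each stack entry is (path of keys seen so far, value still to visit).
    stack = [((), obj)]
    while stack:
        path, value = stack.pop()
        if isinstance(value, dict):
            children = value.items()
        elif isinstance(value, list):
            children = enumerate(value)
        else:
            # Leaf value: join the path into a single flat key.
            out[sep.join(path)] = value
            continue
        for key, child in children:
            stack.append((path + (str(key),), child))
    return out


# Build a structure nested deeper than the recursion limit allows.
depth = sys.getrecursionlimit() + 100
deep = "leaf"
for _ in range(depth):
    deep = {"a": deep}

flat = flatten_json_iterative(deep)
print(len(flat))  # one flat key, made of `depth` segments
```

As in the recursive version, empty dicts and lists produce no keys at all.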

@MatteyRitch
sample_object is interpreted as a tuple of two dictionaries since the two dicts are separated by a comma, but are not enclosed with anything. If you enclose with list brackets '[{dict_a} ,{ dict_b}]' then the flatten function should work and look like this:

{'1_Hobbies_0': 'Ultimate Frisbee', '1_Hobbies_1': 'Coding', '1_Location_City': 'New York', '0_Name': 'Chris', '0_Location_City': 'Los Angeles', '0_Hobbies_1': 'Polymorphing', '0_Hobbies_0': 'Ultimate Lawn Darts', '1_Location_State': 'NY', '0_Location_State': 'CA', '1_Name': 'Matthew'}

Not sure if that's the format you're looking for, but if you adjust the dict for example, add a row_0 and a row_1:
sample_object = {'row_0': {'Name':'Chris', 'Location':{'City':'Los Angeles','State':'CA'}, 'Hobbies':['Ultimate Lawn Darts', 'Polymorphing']}, 'row_1': {'Name':'Matthew', 'Location':{'City':'New York','State':'NY'},'Hobbies':['Ultimate Frisbee', 'Coding']}}

You then get something that looks a little more readable:
{'row_0_Hobbies_0': 'Ultimate Lawn Darts', 'row_0_Location_State': 'CA', 'row_0_Hobbies_1': 'Polymorphing', 'row_1_Location_City': 'New York', 'row_1_Name': 'Matthew', 'row_1_Hobbies_1': 'Coding', 'row_1_Hobbies_0': 'Ultimate Frisbee', 'row_1_Location_State': 'NY', 'row_0_Location_City': 'Los Angeles', 'row_0_Name': 'Chris'}
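
A second point about the code in the question: `flatten` is defined inside the loop but never called, so `out` stays empty regardless of the input. A minimal sketch of one way to fix it, assuming the goal is one flat dict per record: keep the flatten function for a single object (written here in the shape shown in the question) and apply it to each item of the list. The variable names `sample_objects` and `flat_rows` are made up for this example.

```python
def flatten_json(y):
    """Flatten one nested dict/list into a single-level dict."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            # Strip the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(y)  # this call is what the posted version was missing
    return out


# A list of records (note the enclosing brackets).
sample_objects = [
    {'Name': 'Chris', 'Location': {'City': 'Los Angeles', 'State': 'CA'},
     'Hobbies': ['Ultimate Lawn Darts', 'Polymorphing']},
    {'Name': 'Matthew', 'Location': {'City': 'New York', 'State': 'NY'},
     'Hobbies': ['Ultimate Frisbee', 'Coding']},
]

# Flatten each record on its own, so every record becomes one row.
flat_rows = [flatten_json(obj) for obj in sample_objects]
print(flat_rows[0])
```

Each record then has the keys Name, Location_City, Location_State, Hobbies_0 and Hobbies_1, without a row prefix.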

@XuKeqiang

Thank You Very Much!! @amirziai

@jamiebull1

Looks really useful. One note though - better to use if isinstance(x, dict): rather than if type(x) is dict: and the same for list in order to also handle any subclasses like OrderedDict.
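
To illustrate the difference, here is a small sketch of the flatten function with the `isinstance` checks swapped in. It assumes the function has the same shape as the version quoted earlier in this thread; the `nested` sample data is made up.

```python
from collections import OrderedDict


def flatten_json(y):
    """Flatten nested dicts/lists, accepting dict and list subclasses too."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + str(key) + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


nested = OrderedDict([
    ('Name', 'Chris'),
    ('Location', OrderedDict([('City', 'Los Angeles')])),
])

# An exact type check rejects the subclass, so it would be treated as a leaf.
print(type(nested) is dict)       # False
print(isinstance(nested, dict))   # True
print(flatten_json(nested))       # {'Name': 'Chris', 'Location_City': 'Los Angeles'}
```

With `type(x) is dict`, the whole `OrderedDict` would fall through to the `else` branch and be stored under an empty key instead of being flattened.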

@johnjbrusk

This is very helpful. Can you help me think about how to reverse this using the output of the function... so taking the column naming pattern and serializing back to JSON in the same structure it was flattened from? Thanks again!
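
One possible way to reverse the flattening is sketched below. It is not part of the gist, and the names `unflatten_json` and `_lists_from_index_dicts` are made up. It only works under some assumptions: the original keys never contain the separator `_`, every all-digit key segment was a list index, and no key is both a leaf and a prefix of another key. Where those assumptions fail, the flat form is ambiguous and cannot be reversed reliably.

```python
import json


def _lists_from_index_dicts(node):
    """Turn dicts whose keys are all digits ('0', '1', ...) back into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: _lists_from_index_dicts(v) for k, v in node.items()}
    if converted and all(k.isdecimal() for k in converted):
        return [converted[k] for k in sorted(converted, key=int)]
    return converted


def unflatten_json(flat, sep="_"):
    """Rebuild nested dicts/lists from keys such as 'Location_City'."""
    root = {}
    for flat_key, value in flat.items():
        parts = flat_key.split(sep)
        node = root
        # Walk (and create) the intermediate levels, then set the leaf.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _lists_from_index_dicts(root)


flat = {
    'Name': 'Chris',
    'Location_City': 'Los Angeles',
    'Location_State': 'CA',
    'Hobbies_0': 'Ultimate Lawn Darts',
    'Hobbies_1': 'Polymorphing',
}
nested = unflatten_json(flat)
print(json.dumps(nested, indent=2))
```

A key such as `first_name` in the original data would be split into `first` and `name`, so a separator that cannot occur in the keys is a safer choice when a round trip matters.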

@ahim92

ahim92 commented Jun 7, 2019

Thank you, have been struggling before I came across this piece of code!

@peopzen

peopzen commented Aug 4, 2019

This is a useful tool. Is there a sample for folding the flattened JSON back up? I mean reverting the flat dict to the original nested JSON.

@Deepscodes

@amirziai This is awesome, I was trying to flatten a multiple nested JSON object and this helped. Since it's recursive, the level of nesting is limited to how large the python stack is though?

@MatteyRitch
sample_object is interpreted as a tuple of two dictionaries since the two dicts are separated by a comma, but are not enclosed with anything. If you enclose with list brackets '[{dict_a} ,{ dict_b}]' then the flatten function should work and look like this:

{'1_Hobbies_0': 'Ultimate Frisbee', '1_Hobbies_1': 'Coding', '1_Location_City': 'New York', '0_Name': 'Chris', '0_Location_City': 'Los Angeles', '0_Hobbies_1': 'Polymorphing', '0_Hobbies_0': 'Ultimate Lawn Darts', '1_Location_State': 'NY', '0_Location_State': 'CA', '1_Name': 'Matthew'}

Not sure if that's the format you're looking for, but if you adjust the dict for example, add a row_0 and a row_1:
sample_object = {'row_0': {'Name':'Chris', 'Location':{'City':'Los Angeles','State':'CA'}, 'Hobbies':['Ultimate Lawn Darts', 'Polymorphing']}, 'row_1': {'Name':'Matthew', 'Location':{'City':'New York','State':'NY'},'Hobbies':['Ultimate Frisbee', 'Coding']}}

You then get something that looks a little more readable:
{'row_0_Hobbies_0': 'Ultimate Lawn Darts', 'row_0_Location_State': 'CA', 'row_0_Hobbies_1': 'Polymorphing', 'row_1_Location_City': 'New York', 'row_1_Name': 'Matthew', 'row_1_Hobbies_1': 'Coding', 'row_1_Hobbies_0': 'Ultimate Frisbee', 'row_1_Location_State': 'NY', 'row_0_Location_City': 'Los Angeles', 'row_0_Name': 'Chris'}

I am trying to create a table on top of this data set, but everything is coming out as one big row. Is there a way to create a table from this data set?
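
One way to get a table rather than a single row is to flatten each record separately and use the union of all keys as the column headers. The sketch below uses only the standard library and assumes the input is a list of records. The third hobby for Matthew ('Chess') is invented to show that records with different list lengths still fit in one table; the shorter record simply gets an empty Hobbies_2 cell.

```python
import csv
import io


def flatten_json(y):
    """Flatten one nested dict/list into a single-level dict."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + str(key) + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


records = [
    {'Name': 'Chris', 'Location': {'City': 'Los Angeles', 'State': 'CA'},
     'Hobbies': ['Ultimate Lawn Darts', 'Polymorphing']},
    {'Name': 'Matthew', 'Location': {'City': 'New York', 'State': 'NY'},
     'Hobbies': ['Ultimate Frisbee', 'Coding', 'Chess']},  # 'Chess' is made up
]

# One flat dict per record, i.e. one table row each.
rows = [flatten_json(record) for record in records]

# Column headers: union of all keys, in first-seen order.
columns = []
for row in rows:
    for key in row:
        if key not in columns:
            columns.append(key)

# Write the table as CSV; cells missing from a row are left empty.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval='')
writer.writeheader()
writer.writerows(rows)
table_csv = buffer.getvalue()
print(table_csv)
```

If pandas is available, passing `rows` to `pandas.DataFrame` should give the same table, with missing cells filled in as NaN.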
