{
  "page": 1,
  "pages": 2270,
  "limit": 10,
  "total": 22693,
  "items": [
    {
      "address": {
        "city": "cityname first dataset",
        "company_name": "companyname first dataset"
      },
      "amount": 998,
      "items": [
        {
          "description": "first part of first dataset",
          "number": "part number of first part of first dataset"
        }
      ],
      "number": "number of first dataset",
      "service_date": {
        "type": "DEFAULT",
        "date": "2015-11-18"
      },
      "vat_option": null
    },
    {
      "address": {
        "city": "cityname second dataset",
        "company_name": "companyname second dataset"
      },
      "amount": 998,
      "items": [
        {
          "description": "first part of second dataset",
          "number": "part number of first part of second dataset"
        },
        {
          "description": "second part of second dataset",
          "number": "part number of second part of second dataset"
        }
      ],
      "number": "number of second dataset",
      "service_date": {
        "type": "DEFAULT",
        "date": "2015-11-18"
      },
      "vat_option": null
    }
  ]
}
| | item_address_city | item_address_company_name | items_amount |
|---|---|---|---|
| 0 | cityname first dataset | companyname first dataset | 998 |
| 1 | cityname second dataset | companyname second dataset | 998 |
This is a good starting point when using pandas; it shows how to flatten arrays: https://medium.com/towards-data-science/flattening-json-objects-in-python-f5343c794b10
After normalization, the result shown there puts each hobby into its own column.
OK so far - got that.
I'd like to have a single "hobbies" column instead, with one row per hobby:
https://www.dropbox.com/s/nm24xqgvdrslkl0/john.jpg?dl=0
The thing is that information like "John" or "Los Angeles" then needs to be copied to each new row.
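One way to get that shape is sketched below. It assumes a recent pandas (0.25 or later, for `DataFrame.explode`), and the `people` records are hypothetical stand-ins for the article's example, not its actual data. `explode` turns each list element into its own row and repeats the other columns.

```python
import pandas as pd

# Hypothetical records, loosely modelled on the "John" example
people = [
    {"name": "John", "city": "Los Angeles", "hobbies": ["Music", "Running"]},
    {"name": "Jane", "city": "Berlin", "hobbies": ["Chess"]},
]

df = pd.DataFrame(people)

# One row per hobby; "name" and "city" are copied onto every new row
long_df = df.explode("hobbies").reset_index(drop=True)
print(long_df)
```

The `reset_index(drop=True)` is optional; without it the exploded rows keep the index of the record they came from.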
Whoooho... think I got it: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization
Now adapting it to my data set... I'll keep you posted 💃
Okay, so far so good. In general it is working, but currently I have a slight problem with a "distinguishing prefix". I think it is because I am in conflict with 'numbers', which seems to be used in pandas itself?
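A possible explanation, offered as an assumption rather than a diagnosis: in the gist's data the key `number` appears both on each dataset and on each part inside `items`, and `json_normalize` refuses to create two columns with the same name. The sketch below uses a trimmed copy of the gist's data and a recent pandas (1.0 or later, for `pd.json_normalize`); the prefix `part_` is an arbitrary choice.

```python
import pandas as pd

# Trimmed copy of the gist's JSON (only the fields used here)
data = {
    "items": [
        {
            "address": {"city": "cityname first dataset",
                        "company_name": "companyname first dataset"},
            "amount": 998,
            "items": [
                {"description": "first part of first dataset",
                 "number": "part number of first part of first dataset"},
            ],
            "number": "number of first dataset",
        },
        {
            "address": {"city": "cityname second dataset",
                        "company_name": "companyname second dataset"},
            "amount": 998,
            "items": [
                {"description": "first part of second dataset",
                 "number": "part number of first part of second dataset"},
                {"description": "second part of second dataset",
                 "number": "part number of second part of second dataset"},
            ],
            "number": "number of second dataset",
        },
    ]
}

parts = pd.json_normalize(
    data["items"],
    record_path="items",            # one output row per part
    meta=["number", "amount",       # dataset-level fields copied onto each row
          ["address", "city"],
          ["address", "company_name"]],
    record_prefix="part_",          # renames the part columns so "number" no longer clashes
)
print(parts)
```

Using `meta_prefix` instead of (or in addition to) `record_prefix` should resolve the clash as well; which side gets the prefix is a matter of taste.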
json_normalize is a great way to "massage" the JSON data - great tip using pandas!
Now also with working code example... :D
Thank you Andi! The loop helps a lot and collects the data the way I need it.
The dictionaries like address->city and address->company are now clear to me.
The problem now is the nested arrays with multiple items, like:
If I include that in the loop like this:
"items_items": element["items"]["description"]
Of course that doesn’t work, because the whole "items": [ ] set is a string.
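For what it's worth, after parsing, `element["items"]` is a Python list of dictionaries, so indexing it with the key `"description"` raises a `TypeError`. A minimal sketch of a nested loop follows, assuming the outer loop iterates over `data["items"]` as `element`. The first three column names are taken from the table above; the `items_items_*` names are hypothetical.

```python
# Trimmed copy of the gist's JSON (only the fields used here)
data = {
    "items": [
        {
            "address": {"city": "cityname first dataset",
                        "company_name": "companyname first dataset"},
            "amount": 998,
            "items": [
                {"description": "first part of first dataset",
                 "number": "part number of first part of first dataset"},
            ],
        },
        {
            "address": {"city": "cityname second dataset",
                        "company_name": "companyname second dataset"},
            "amount": 998,
            "items": [
                {"description": "first part of second dataset",
                 "number": "part number of first part of second dataset"},
                {"description": "second part of second dataset",
                 "number": "part number of second part of second dataset"},
            ],
        },
    ]
}

rows = []
for element in data["items"]:
    # The inner list needs its own loop: one output row per part
    for part in element["items"]:
        rows.append({
            "item_address_city": element["address"]["city"],
            "item_address_company_name": element["address"]["company_name"],
            "items_amount": element["amount"],
            "items_items_description": part["description"],  # hypothetical column name
            "items_items_number": part["number"],            # hypothetical column name
        })

for row in rows:
    print(row)
```

The resulting `rows` list can be passed straight to `pandas.DataFrame(rows)` if a table is wanted; the dataset-level values are repeated on every part's row.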