@unkwn1-repo
Last active August 3, 2023 18:19
Simple JSON to CSV Method for Twitter Archive Data
#!/usr/bin/env python3
'''
IMPORTANT: Please delete the following from the tweet.js file before using this:
---> window.YTD.tweet.part0 = <----
Whilst that remains, the file is not valid JSON, so json.load can't parse it.
'''
import json

import pandas as pd

# Load the raw JSON (the prefix above must already be removed)
with open('tweet.js', encoding='utf-8') as f:
    data = json.load(f)

# Flatten the nested tweet objects into columns. pd.json_normalize already
# returns a DataFrame; the old pandas.io.json.json_normalize import was
# deprecated in pandas 1.0 and is gone from later releases.
df = pd.json_normalize(data)

# Write to CSV
df.to_csv("tweets.csv")
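If you'd rather not hand-edit tweet.js as the IMPORTANT note asks, the prefix can also be stripped in code. This is a minimal sketch, not part of the original gist, assuming the file is a single JavaScript assignment of the form `window.YTD.<name>.partN = [...]`. The helper names `strip_ytd_prefix` and `load_archive_js` are hypothetical. The helper splits at the first `=` (which belongs to the prefix, since it comes before any JSON) and passes the remainder to `json.loads`.

```python
import json
from pathlib import Path


def strip_ytd_prefix(text: str) -> str:
    """Drop a leading 'window.YTD.<name>.partN = ' assignment, if present."""
    stripped = text.lstrip()
    if stripped.startswith(("[", "{")):
        # Already plain JSON, e.g. the prefix was deleted by hand.
        return stripped
    head, sep, body = stripped.partition("=")
    if not sep or not head.startswith("window.YTD."):
        raise ValueError("unexpected format: no 'window.YTD.... =' prefix found")
    # Tolerate a trailing semicolon in case the assignment ends with one.
    return body.strip().rstrip(";")


def load_archive_js(path: str) -> list:
    """Read an archive .js file (hypothetical helper) and parse its JSON payload."""
    text = Path(path).read_text(encoding="utf-8")
    return json.loads(strip_ytd_prefix(text))
```

With this in place, `data = load_archive_js("tweet.js")` can replace the `json.load` step, and the rest of the script stays the same.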
@unkwn1-repo
Copy link
Author

Rewrote this because the DataFrame-append program flow I previously used was inefficient, to say the least: it ate both memory and CPU and took roughly 13 minutes to build the CSV.

The new method is near-instant for just over 20k rows! :)
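The speed-up above comes from building the table in one go instead of appending to a DataFrame row by row, where each append produces a new frame and the total copying grows roughly quadratically with the row count. Below is a pandas-free sketch of the same one-pass idea. It is an illustration, not the gist's code, and it assumes each tweet is a nested dict. Every record is flattened into dot-separated column names, mirroring `json_normalize`'s default `sep="."`. All rows are then written with a single `csv.DictWriter`. The helpers `flatten` and `write_rows_once` are hypothetical. Lists are stored as JSON text in one cell rather than expanded.

```python
import csv
import json
from typing import Any, Dict, Iterable, List


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"tweet": {"id": 1}} -> {"tweet.id": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Lists (hashtags, media, ...) are kept as JSON text in a single cell.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def write_rows_once(records: Iterable[Dict[str, Any]], out) -> int:
    """Flatten every record first, then write the whole table in one pass."""
    rows: List[Dict[str, Any]] = [flatten(r) for r in records]
    # Union of all column names in first-seen order, so sparse fields still get a column.
    columns: List[str] = []
    seen = set()
    for row in rows:
        for col in row:
            if col not in seen:
                seen.add(col)
                columns.append(col)
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty strings
    return len(rows)
```

For a file, open it with `newline=""` (as the `csv` module recommends) and pass the handle as `out`. The key design point is that rows accumulate in a plain list and the output is produced once, which is the same reason the DataFrame-based rewrite is fast.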

@bjpcjp
Copy link

bjpcjp commented Feb 21, 2022

Worked for me out of the box. TY!

@unkwn1-repo
Copy link
Author

Worked for me out of the box. TY!

Awesome! Thanks for the reply, it's great to see someone else found it useful.

@ejfox
Copy link

ejfox commented Aug 3, 2023

Very handy! Had to tweak it a bit for Python 3 on macOS 13.4.1:

#!/usr/bin/env python3
'''
IMPORTANT: Please delete the following from the tweet.js file before using this:
---> window.YTD.tweet.part0 = <----
Whilst that remains, the file is not valid JSON, so json.load can't parse it.
'''
import json

import pandas as pd

# Load the raw JSON (the prefix above must already be removed)
with open('tweets.js', encoding='utf-8') as f:
    data = json.load(f)

# Flatten into a DataFrame; pd.json_normalize replaces the old
# pandas.io.json import and already returns a DataFrame
df = pd.json_normalize(data)

# Write to CSV
df.to_csv("tweets.csv")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment