#!/bin/python3
# Largely copied from http://www.mathewinkson.com/2015/03/delete-old-tweets-selectively-using-python-and-tweepy
# However, Mathew's script cannot delete tweets older than something like a year (these tweets are not available from the twitter API)
# This script is a complement on first use, to delete old tweets. It uses your twitter archive to find tweets' ids to delete
# How to use it :
# - download and extract your twitter archive (tweet.js will contain all your tweets with dates and ids)
# - put this script in the extracted directory
# - complete the secrets to access twitter's API on your behalf and, possibly, modify days_to_keep
# - delete the few junk characters at the beginning of tweet.js, until the first '[' (it crashed my json parser)
# - review the script !!!! It has not been thoroughly tested, it may have some unexpected behaviors...
# - run this script
# - forget this script, you can now use Mathew's script for your future deletions
#
# License : Unlicense http://unlicense.org/

import tweepy
import json
from datetime import datetime, timedelta, timezone

# Twitter API credentials (fill these in before running)
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

# Tweets older than this many days are deleted
days_to_keep = 365

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

cutoff_date = datetime.now(timezone.utc) - timedelta(days=days_to_keep)
print(cutoff_date)

# Load the archive (junk before the first '[' must already be removed)
fp = open("tweet.js","r")
myjson = json.load(fp)

for tweet in myjson:
    d = datetime.strptime(tweet['created_at'], "%a %b %d %H:%M:%S %z %Y")
    if d < cutoff_date:
        print(tweet['created_at'] + " " + tweet['id_str'])
        try:
            api.destroy_status(tweet['id_str'])
        except Exception as e:
            # Keep going if one tweet cannot be deleted, but report it
            print("Could not delete " + tweet['id_str'] + ": " + str(e))
I downloaded my Twitter archive to try this out. After extracting it I could not find a "tweet.js" file, but under data/js there is a tweet_index.js file that maps to data/js/tweets, which contains many files named in year_month.js format.
I tried dumping the contents of each file, leaving out everything in the top line up to the [, just like the original instructions, but it just throws errors.
    myjson = json.load(fp)
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 23 column 2 (char 642)
Has anyone else run into this?
I've got the same problem... I don't know how to fix this.
Ok, I removed this part of the tweet.js file: "window.YTD.tweet.part0 =" and it worked.
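If you would rather not edit tweet.js by hand, the same fix can be done in code. This is only a minimal sketch: `load_archive_json` is a name I made up, and it assumes the file is a single JavaScript assignment of a JSON array (as with the `window.YTD.tweet.part0 =` prefix) and that the prefix itself contains no `[`.

```python
import json


def load_archive_json(text):
    """Parse archive text shaped like '<js prefix> = [ ... ]'.

    Hypothetical helper, not part of the original script.
    """
    start = text.find("[")
    if start == -1:
        raise ValueError("no '[' found; is this a tweet archive file?")
    # raw_decode parses one JSON value starting at `start` and ignores
    # anything that follows it (for example a trailing semicolon).
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value
```

In the script, this would replace the `json.load(fp)` call with something like `myjson = load_archive_json(fp.read())`.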
I ran into the following error on first run:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 53148: character maps to <undefined>
Solved it by modifying line 37 to include the encoding:
fp = open("tweet.js","r", encoding='UTF-8')
Thanks for the script. Very useful!
@prasket, if it's helpful to you I updated my fork to account for the new js format: https://gist.github.com/AnilRedshift/536d32b9388675d7c98b019d524983a5
I had the same problem.
Try changing line 41 to:
d = datetime.strptime(tweet['created_at'], "%Y-%m-%d %H:%M:%S %z")
Worked for me!
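Since both date layouts show up in this thread, it may be easier to try each in turn rather than editing the format string to match your archive. A rough sketch, assuming those two layouts are the only ones you will meet; `parse_created_at` and `KNOWN_FORMATS` are names I invented for illustration.

```python
from datetime import datetime

# The two created_at layouts mentioned in this thread.
KNOWN_FORMATS = (
    "%a %b %d %H:%M:%S %z %Y",  # e.g. "Wed Mar 04 12:30:00 +0000 2015"
    "%Y-%m-%d %H:%M:%S %z",     # e.g. "2015-03-04 12:30:00 +0000"
)


def parse_created_at(value):
    """Return a timezone-aware datetime, trying each known layout in order."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this layout did not match, try the next one
    raise ValueError(f"unrecognized created_at value: {value!r}")
```

Line 41 would then become `d = parse_created_at(tweet['created_at'])`. Note that `%a` and `%b` depend on the locale, so the first layout assumes English day and month names.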