@tmcw
Created July 31, 2012 21:31
Archive Tweets
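import requests, os, glob, json

you = 'tmcw'
data = 'tweets'

# Create the output directory; ignore the error if it already exists.
try: os.mkdir(data)
except Exception: pass

def run(max_id = False):
    # Tweets saved by earlier runs, as paths like 'tweets/<id>.json'.
    already = glob.glob("%s/*.json" % data)
    start = 'http://api.twitter.com/1/statuses/user_timeline.json?screen_name=%s&include_rts=true&count=200' % you
    if max_id:
        # Page backwards: only ask for tweets at or below this id.
        start = '%s&max_id=%s' % (start, max_id)
    r = requests.get(start)
    tweets = r.json()  # a method in requests >= 1.0 (a property in older releases)
    has_new = False
    for t in tweets:
        path = '%s/%s.json' % (data, t['id'])
        if path not in already:
            with open(path, 'w') as f:
                json.dump(t, f)
            has_new = True
    # Keep paging until a request returns nothing that isn't already saved.
    if has_new:
        run(tweets[-1]['id'])

print('starting twitter archive of @%s' % you)
run()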
@cbfrance

Thanks for referring me to this, Tom.

A minor note: this will not save all of your data, e.g. your favorites, the users you follow, the users who follow you, or avatars, bios, etc. Also (more interestingly), it won't do any spidering to save the data eventually needed to meaningfully reconstruct conversations (others' tweets) or embedded media (twitpics in the discussion, or even just preserving links). Are you aware of any other scripts that go this more elaborate route? Any interest in extending this one?
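
As a rough starting point for the spidering idea, here is a minimal sketch that scans the already-saved tweet files and lists what a spider would need to fetch next: the ids of tweets being replied to, and any links. The helper name `collect_references` is hypothetical, and the sketch assumes each saved tweet may carry `in_reply_to_status_id` and `entities.urls` fields; whether those are present depends on how the tweets were requested. It does not fetch anything itself.

```python
import glob
import json
import os


def collect_references(data_dir):
    """Scan saved tweet JSON files and return (reply_ids, urls) as sets.

    Hypothetical helper: it only reports what would need fetching.
    """
    reply_ids, urls = set(), set()
    for path in glob.glob(os.path.join(data_dir, "*.json")):
        with open(path) as f:
            tweet = json.load(f)
        # Assumed field: None (or missing) when the tweet is not a reply.
        reply_id = tweet.get("in_reply_to_status_id")
        if reply_id is not None:
            reply_ids.add(reply_id)
        # Assumed field: "entities" may be absent from the saved JSON.
        for link in (tweet.get("entities") or {}).get("urls", []):
            url = link.get("expanded_url") or link.get("url")
            if url:
                urls.add(url)
    return reply_ids, urls
```

Calling `collect_references('tweets')` against the archive directory would give two sets to feed into whatever fetching step comes next.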
