A script to download all of a user's tweets into a csv
#!/usr/bin/env python
# encoding: utf-8

import tweepy  # https://github.com/tweepy/tweepy
import csv

# Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""


def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3240 tweets with this method

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        # save most recent tweets
        alltweets.extend(new_tweets)

        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    # transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    # write the csv
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)


if __name__ == '__main__':
    # pass in the username of the account you want to download
    get_all_tweets("J_tsar")
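The max_id arithmetic in the loop above is the part that is easy to get wrong. A self-contained sketch of the same pagination logic against a stubbed timeline (FakeTweet and fake_user_timeline are illustrative stand-ins, not tweepy objects) shows why subtracting one from the oldest id prevents duplicates:

```python
# Illustrative stand-ins for tweepy objects, not the real API.
class FakeTweet:
    def __init__(self, tweet_id):
        self.id = tweet_id

def fake_user_timeline(count=200, max_id=None):
    """Return up to `count` tweets with ids <= max_id, newest first."""
    newest = 500 if max_id is None else max_id
    ids = range(newest, max(newest - count, 0), -1)
    return [FakeTweet(i) for i in ids if i >= 1]

def get_all_ids():
    alltweets = fake_user_timeline(count=200)
    new_tweets = alltweets[:]
    while new_tweets:
        # max_id is inclusive, so ask for ids strictly older than the last one
        oldest = alltweets[-1].id - 1
        new_tweets = fake_user_timeline(count=200, max_id=oldest)
        alltweets.extend(new_tweets)
    return [t.id for t in alltweets]

ids = get_all_ids()  # every id exactly once, newest to oldest
```

Because each request's max_id is one less than the oldest id already collected, no tweet appears twice and the loop stops when a request comes back empty.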

Thanks for posting this script! Just a heads-up on a minor typo in line 36: "gefore" instead of "before"

https://gist.github.com/yanofsky/5436496#file-tweet_dumper-py-L36

markwk commented Sep 24, 2013

Works great. I'm wondering how I'd do this to get the next 3200 after the initial pull.

danriz commented Oct 17, 2013

I am getting error on windows:

C:>C:\Python26\python.exe C:\Python26\tweet_dumper.py
File "C:\Python26\tweet_dumper.py", line 17
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
^
IndentationError: expected an indented block

C:>C:\Python275\python.exe C:\Python26\tweet_dumper.py
File "C:\Python26\tweet_dumper.py", line 17
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
^
IndentationError: expected an indented block

Owner

yanofsky commented Nov 1, 2013

@greglinch thanks, fixed!
@markwk to my understanding there is no way to get these without using a 3rd party or asking the user to download their history
@riznad hard to say what's going on there; is it possible an extra space got inserted on that line? There should be only one tab on that line.

Kaorw commented Dec 28, 2013

Thanks for the great code!
I've modified it a bit to grab the timeline and save it in Excel format ("xls") using xlswriter.

https://gist.github.com/Kaorw/7594044

Thanks again

jdkram commented Dec 28, 2013

Thanks for the code.

I switched up the final line (after importing sys) to feed in usernames from shell:

get_all_tweets(sys.argv[1])
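A slightly fuller sketch of the same idea, with a usage message; the get_username helper is illustrative, not part of the original script:

```python
def get_username(argv):
    # expect: python tweet_dumper.py <screen_name>
    if len(argv) < 2:
        raise SystemExit("usage: tweet_dumper.py <screen_name>")
    return argv[1]

# in the script you would pass sys.argv; shown here with a literal list
name = get_username(["tweet_dumper.py", "J_tsar"])
```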

hub2git commented Apr 2, 2014

Dear all, I downloaded the py file. I'm running Linux Mint. In terminal, I did:
python tweet_dumper.py

but I got this:
Traceback (most recent call last):
File "tweet_dumper.py", line 4, in
import tweepy #https://github.com/tweepy/tweepy
ImportError: No module named tweepy

What am I doing wrong? What must I do?

By the way, I've created a twitter API for myself. In the tweet_dumper.py file, I've entered my 4 Twitter API credentials. And in the last line of the .py file, I've put in the username whose tweets I want to download.

Should I download the zip file from https://github.com/tweepy/tweepy? I'm so lost, but I want to learn.


UPDATE:
I did
sudo apt-get install python-pip
then
sudo pip install tweepy

Then I ran python tweet_dumper.py again. Now I see a csv file! Thanks!!!

Fantastic! Thanks!

This worked great! Thanks for this! Had to get pip and tweepy installed, but it worked out great. Also, note that if the targeted user's twitter account is protected, the account used to authorize the api calls must be following the targeted user.

I tried executing the program. There is no error reported, but no .csv file is created. Please help me out.

UPDATE : 1

Later it worked.

UPDATE : 2

But now all of a sudden my program shows me the following error, so I repeated all the steps stated by hub2git. Still it's not working. Please help me trace it out.

lifna@lifna-Inspiron-N5050:~$ python
Python 2.7.3 (default, Feb 27 2014, 20:00:17)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import tweepy
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named tweepy
exit()

I tried executing it using EditRocket [http://editrocket.com/download_win.html] and got the following error:
File "tweet_dumper.py", line 35
print "getting tweets before %s" % (oldest)
^
SyntaxError: invalid syntax

hub2git commented Nov 11, 2014

Thanks to this script, I successfully downloaded a user's most recent 3240 tweets.

Line 15 of the script says:
#Twitter only allows access to a users most recent 3240 tweets with this method

Does anybody know how to download tweets that are older than the 3240th tweet?

I am getting the error below; what am I doing wrong? Thanks

File "tweet_dumper.py", line 27, in get_all_tweets
new_tweets = api.user_timeline(screen_name = screen_name,count=200)
File "C:\Python27\lib\site-packages\tweepy-2.3.0-py2.7.egg\tweepy\binder.py", line 230, in _call
return method.execute()
File "C:\Python27\lib\site-packages\tweepy-2.3.0-py2.7.egg\tweepy\binder.py", line 203, in execute
raise TweepError(error_msg, resp)
TweepError: [{u'message': u'Bad Authentication data', u'code': 215}]

yosun commented Feb 1, 2015

This seems to only work for tweets from the past year? (For users with more than 3200 tweets)

Is there any way we can get more than 3200 tweets? I want all the tweets of a particular user.

Sweet!

I've modified it to get tweets with images and store them to csv:
id, tweet text, image url

just in case anyone else needs as well:
https://gist.github.com/freimanas/39f3ad9a5f0249c0dc64

Works great, but I have a question: how do I get only the user's own statuses, and not replies or retweets? Is there any way?

Hi, I'm using python3.4 and tweepy 3.3.0
I'm getting the following error:

File "dump_tweets.py", line 56, in get_all_tweets
writer.writerows(outtweets)
TypeError: 'str' does not support the buffer interface

This error is also thrown for line 55, but I commented it out in an attempt to debug.

I've tried to just include the text of the tweet which is encoded to utf-8 on line 50, but this still throws the same error.

Does anyone have any hints/suggestions?

EDIT: This appears to only occur on Windows. When running the script from an Ubuntu install it works.

Thanks for posting the script in the first place; it's a good way to start tinkering with this library. After playing around with it a bit, it seems the updated versions of the library solve both the requests-per-window limit and the errors it triggers:

  • the wait_on_rate_limit parameter on the API object, which lets it handle rate limiting with the server
  • use of Cursor, which avoids the requests-per-window bookkeeping

Just in case somebody needs as well, I did a small implementation following the new features here: https://gist.github.com/MihaiTabara/631ecb98f93046a9a454
(mention: I store the tweets in a MongoDB database instead of csv files)

Din1993 commented Sep 11, 2015

I am trying to do this for multiple users by including a for loop. Do you know how to have it also print either their name or their screenname? Thanks!

@Purptart

Just change this line:
with open('%s_tweets.csv' % screen_name, 'wb', encoding='utf-8') as f:
to
with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8') as f:

b stands for binary, actually, and Python 3.x has changed many things. It works fine for me.
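A minimal Python 3 sketch of the fix described above; the filename and sample row are purely illustrative, and the csv docs also recommend newline='' when opening in text mode:

```python
import csv
import os
import tempfile

rows = [["123", "2016-01-01 00:00:00", "héllo wörld"]]

path = os.path.join(tempfile.gettempdir(), "example_tweets.csv")

# Python 3: open in text mode with an explicit encoding; newline=""
# lets the csv writer control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    writer.writerows(rows)

# read it back to confirm the round trip preserves non-ASCII text
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))
```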

@Din1993

We can get the screen_name, user name, and other information as well. Show me how you are trying to do it for multiple users (code snippet).

Thanks

Hi guys, I'm using Python 2.7 and the script works fine. I just have a problem with the csv: is there a way to ignore \n in the retrieved tweets? A newline causes the text to span into a new column, so in Excel or OpenRefine it's almost impossible to manually edit all the cells in the "id" column.
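One possible fix for the newline problem above: collapse whitespace in each tweet's text before it goes to the csv writer. The flatten_text helper below is illustrative, not part of the original script:

```python
def flatten_text(text):
    # Replace line breaks (and runs of whitespace) inside a tweet with
    # single spaces so each tweet occupies exactly one spreadsheet row.
    return " ".join(text.split())

sample = "first line\nsecond line\r\nthird"
flat = flatten_text(sample)  # "first line second line third"
```

In the script, this would be applied where tweet.text is collected into outtweets.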

Din1993 commented Oct 6, 2015

@Sourabh87 thanks for the offer! i ended up figuring it out by just using tweet.user.screen_name. Super easy. Now, I am working on migrating the code from python 2 to python 3.4. Has anyone else done this yet on windows?

Hey guys!

I'm using this to pull tweets for list of users. But I'm running into an error every so often. I think it might have to do with the amount of queries you can make to the Twitter API but I'm not sure. Here's the error below, please help.

File "twitterAPI.py", line 118, in
get_all_tweets(user)
File "twitterAPI.py", line 73, in get_all_tweets
new_tweets = api.user_timeline(screen_name = screen_name,count=200)
File "//anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 239, in _call
return method.execute()
File "//anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 223, in execute
raise TweepError(error_msg, resp)
tweepy.error.TweepError: Not authorized.

Thanks for the code @yanofsky.
I have modified your code: I use pandas to store the downloaded tweets in a csv, along with some additional information.
I also have another script that uses the csv created by your code to download the latest tweets from a user's timeline.
here is my github link:
https://github.com/suraj-deshmukh/get_tweets

I am also working on Cassandra/Python integration to store all tweets in a Cassandra database instead of a csv file.

Thanks for this @yanofsky, it's awesome code. I'm trying to rework it so I can drop the data into a MySQL table. I'm running into some issues; would you mind taking a look at the snippet of my code to see if I'm doing anything obvious? Much appreciated.

def get_all_tweets(screen_name):
#Twitter only allows access to a users most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=1)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
            print "getting tweets before %s" % (oldest)

            #all subsequent requests use the max_id param to prevent duplicates
            new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

            #save most recent tweets
            alltweets.extend(new_tweets)

            #update the id of the oldest tweet less one
            oldest = alltweets[-1].id - 1

            print "...%s tweets downloaded so far" % (len(alltweets))

    return alltweets

def store_tweets(alltweets)

MySQL initialization

connection = (host = "",
    user="",
    passwd="",
    db="")
cursor = connection.cursor()

for tweet in alltweets:
    cursor.execute("INSERT INTO twittest (venue, tweet_id, text time, retweet, liked) VALUES 
    (user['screen_name'],tweet['id_str'], tweet['created_at'], tweet['text'], tweet['retweet_count'], tweet['favorite_count'])")    
cursor.close()
connection.commit()

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("KidsAquarium")

Thanks!
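One obvious suspect in the snippet above is the INSERT: the values sit inside the SQL string literal instead of being passed as parameters, and the column list is missing a comma between text and time. A hedged sketch of a parameterized insert, using sqlite3 as a stand-in for the MySQL driver (table and column names are kept from the snippet for illustration; MySQL drivers use %s placeholders instead of sqlite3's ?):

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("""CREATE TABLE twittest
                  (venue TEXT, tweet_id TEXT, text TEXT,
                   time TEXT, retweet INTEGER, liked INTEGER)""")

# illustrative tweet data shaped like the fields used in the snippet
tweets = [
    {"screen_name": "KidsAquarium", "id_str": "1", "created_at": "2016-01-01",
     "text": "hello", "retweet_count": 2, "favorite_count": 3},
]

for tweet in tweets:
    # Placeholders let the driver escape values safely instead of
    # embedding them in the SQL string literal.
    cursor.execute(
        "INSERT INTO twittest (venue, tweet_id, text, time, retweet, liked) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (tweet["screen_name"], tweet["id_str"], tweet["text"],
         tweet["created_at"], tweet["retweet_count"], tweet["favorite_count"]),
    )
connection.commit()
count = cursor.execute("SELECT COUNT(*) FROM twittest").fetchone()[0]
```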

Hi,

Is there a way to download public tweets by keyword search older than a week or month? I am able to download public tweets from the current week only, and not beyond that. Any suggestions appreciated. Thanks

alce370 commented Feb 10, 2016

This one still works with Python 3.5. I just added the () to each print call, made the change from Sourabh87's comment (18 Sep 2015), and it works fine.

Is there no way to crawl ALL the tweets of a particular user, and not just the 3200 most recent ones?

Any news on whether we can download all tweets?

DavidNester commented May 9, 2016 edited

I am somewhat inexperienced with this. Do the 4 lines with the API credentials need to be filled in? If so, where do we get the credentials?


UPDATE:
I figured out my first issue but then ran into this issue when running it:

tweepy.error.TweepError: [{u'message': u'Bad Authentication data.', u'code': 215}]

I only changed the username that was being looked up from the original code

jdchatham commented May 10, 2016 edited

Am also having the same problem as @DavidNester. Any updates?

UPDATE:
This actually worked for me/showed me how to get the credentials if you're still looking @DavidNester
http://www.getlaura.com/how-to-download-tweets-from-the-twitter-api/

I have the credentials now. I tried running with the other script and I still got the same error.

Just an FYI for people trying to utilize this in Sublime (and you happen to be using Anaconda on a windows machine), you need to run python -m pip install tweepy while in the proper directory that Sublime expects it to be installed in; pip install tweepy alone may not work. Some people who run the code and think they installed tweepy may get an error saying otherwise.

This truly is a glorious script yanofsky! I plan on playing around with it for the next few days for a stylometry project, and thanks to you getting the raw data desired is no longer an issue!

gerardtoko commented Jun 15, 2016 edited

Another trick: a recursive function

#!/usr/bin/env python
# encoding: utf-8

import tweepy #https://github.com/tweepy/tweepy
from time import sleep

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

def get_all_tweets(screen_name, alltweets=None, max_id=0):
    #Twitter only allows access to a user's most recent 3240 tweets with this method
    #avoid a mutable default argument: a shared list would persist across top-level calls
    if alltweets is None:
        alltweets = []
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #make initial request for most recent tweets (200 is the maximum allowed count)
    if max_id == 0:
        new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    else:
        #subsequent requests page backwards with max_id
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=max_id)

    if len(new_tweets) > 0:
        #save most recent tweets
        alltweets.extend(new_tweets)
        # security
        sleep(2)
        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        return get_all_tweets(screen_name=screen_name, alltweets=alltweets, max_id=oldest)

    #final tweets
    return alltweets
if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("J_tsar", [], 0)

gowrik73 commented Jul 5, 2016

is it possible to extract protected tweets?

Hi, I just tried the code and it is working well. P.S. I am using Python 3.5.

I am thinking of getting multiple users' tweets at the same time; in other words, passing in multiple usernames at once. Any thoughts on how I should set that up?

Thanks.

Thanks. It returns the shortened URLs. If you need the original URLs, you can use this snippet (Python 3):

from urllib.request import urlopen

def redirect(url):
    page = urlopen(url)
    return page.geturl()

In Python 2: from urllib import urlopen.

I'm getting this error when I enter someone's username. It works fine for others, but not this username.

tweepy.error.TweepError: [{'code': 34, 'message': 'Sorry, that page does not exist.'}]

Hunterjet commented Aug 9, 2016 edited

For some reason, calling user_timeline directly is much more inefficient than doing it with a cursor, like so:

                cursor = Cursor(self.client.user_timeline,
                                id=user_id,
                                count=200,
                                max_id=max_id).pages(MAX_TIMELINE_PAGES)
                for page in cursor:
                    logging.info('Obtained ' + str(i) + ' tweet pages for user ' + str(user_id) + '.')
                    i += 1
                    for tweet in page:
                        if not hasattr(tweet, 'retweeted_status'):
                            tweets.append(tweet)
                    max_id = page[-1].id - 1

I've seen speed gains of over 150 seconds for users with more posted tweets than the maximum retrievable. The error handling is a bit trickier but doable thanks to the max_id parameter (just stick the code I posted into a try/except inside a while (1), and the cursor will refresh after each error). Try it out!

BTW, MAX_TIMELINE_PAGES theoretically goes up to 16 but I've seen it go to 17.
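The try/except-and-resume idea above can be sketched without tweepy. In the sketch below, fake_fetch is an illustrative stand-in for a timeline request that fails once mid-run; the checkpointed max_id lets the loop retry without duplicating tweets:

```python
class TransientError(Exception):
    pass

calls = {"n": 0}

def fake_fetch(max_id=None):
    """Stand-in for a timeline request: pages of 10 ids, fails on call 2."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise TransientError("rate limited")
    newest = 30 if max_id is None else max_id
    return list(range(newest, max(newest - 10, 0), -1))

def fetch_all():
    tweets, max_id = [], None
    while True:
        try:
            page = fake_fetch(max_id=max_id)
        except TransientError:
            continue  # the checkpointed max_id lets us retry without duplicates
        if not page:
            return tweets
        tweets.extend(page)
        max_id = page[-1] - 1  # resume strictly before the oldest id seen

result = fetch_all()  # all 30 ids despite the mid-run failure
```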

I am getting a syntax error at: print "getting tweets before %s" % (oldest)
Not sure what is wrong. Request your help.

What if I wanted to save it into a database? Would I then need to extract the data from the csv?

adixxov commented Aug 22, 2016 edited

Thank you for this code. It worked as expected to pull a given user's tweets.

However, I have a side problem with retrieving the tweets after saving them to a json file. I saved the list of alltweets in a json file using the following. Note that without repr, I wasn't able to dump the alltweets list into the json file.

with open('file.json', 'a') as f: json.dump(repr(alltweets), f)

Attached is a sample json file containing the dump. Now, I need to access the text in each tweet, but I'm not sure how to deal with "Status".

I tried to iterate over the lines in the file, but the file is being seen as a single line.

with open(fname, 'r') as f:
    for line in f:
        tweet = json.loads(line)

I also tried to iterate over statuses after reading the json file as a string, but iteration rather takes place on the individual characters in the json file.

with open(fname, 'r') as f:
    x = f.read()
    for status in x:
        code

Appreciate any help...

Troasi commented Aug 22, 2016

I get an error in Python 3.x, as the buffer does not support string. Help me encode it.

dev-luis commented Sep 10, 2016 edited

@santoshbs That's because the script was written for an older version of Python. The new syntax is: print(Your statements).

For the people that have problems running this script, I posted the new syntax on my website
http://luis-programming.com/blog/kanji_prj_twitter/jp_tweets_python.html

I also added an example on how to analyze tweets that are not written using "Latin characters." If you're interested, you can also download the script on my website.

owlcatz commented Sep 24, 2016

I read all the comments, but have not tried it yet... So... Assuming I had a user (not me or anyone I know personally) that has roughly 15.5k tweets, is there any way to get just the FIRST few thousand and not the last? Thanks! 👍

cbaysan commented Sep 24, 2016

Has anyone figured out how to grab the "retweeted_status.text" if retweeted_status is "True"? It seems one needs to specify: "api.user_timeline(screen_name = screen_name,count=200,include_rts=True)"

dhaikney commented Nov 2, 2016

@yanofsky Found this very useful, thank you!

I found this article, which says a request rate of more than 2.5 times the access-token rate can be achieved. I haven't personally tested this. Hope it's useful.

http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./

ShupingZhang commented Nov 6, 2016 edited

I ran the code but it only downloaded 6 tweets (sometimes 206) instead of 3240. Does anyone know the reason? Thanks a lot!

get_all_tweets("City of Toronto")
getting tweets before 616320501871452159
...6 tweets downloaded so far

I'm using Python 2.7.12 Shell.

def get_all_tweets(screen_name):
    alltweets = []
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)
    alltweets.extend(new_tweets)
    oldest = alltweets[-1].id - 1
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)
        new_tweets = api.user_timeline(screen_namem = screen_name,count=200,max_id=oldest)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print "...%s tweets downloaded so far" % (len(alltweets))
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)

I'm trying to run the code but I keep getting the following error:

Traceback (most recent call last):
File "/Users/Brian/Desktop/get_tweets.py", line 60, in
get_all_tweets("realDonaldTrump")
File "/Users/Brian/Desktop/get_tweets.py", line 52, in get_all_tweets
writer.writerow(["id","created_at","text"])
TypeError: a bytes-like object is required, not 'str'

Anyone know what this could be?

@brianhalperin

I received the same error. Try changing line 53.

Change line 53 from this:
with open('%s_tweets.csv' % screen_name, 'wb') as f:

to this:
with open('%s_tweets.csv' % screen_name, 'w') as f:

Pretty much just drop the 'b'. Let me know if it works for you.

Thank you for posting this! May I ask how you found out that each "tweet" has information like "id_str", "location", etc.? I used dir() to look at it, but "location" is not included, so I was a bit confused.

Hello, I get this error, what can it be? http://i.imgur.com/lDRA7uX.png

Siddhant08 commented Jan 25, 2017 edited

@yanofsky The code runs without errors but I can't seem to find the csv file. Where is it created?

+1 Thanks for this script

Thanks for sharing!

Can we download more than 3240 tweets?

Deepak- commented Feb 27, 2017

Thanks for the script! I do wish there was a way to circumvent the 3240 limit.

buddiex commented Feb 27, 2017 edited

@deepak same here... having that issue now ... trying to collect historical tweets for a data warehouse project.

adam-fg commented Feb 28, 2017

Hi everyone,

I'm after some help - I'm trying to complete some research on Twitter Data for my MSc and this might work, but I have no idea how to use Python.

Would anybody be willing to run this code for me for 3 companies and if this works for hashtags, 3 more hashtags?

Fingers crossed!
Adam

Thanks! I was just looking to grab a single tweet from one user's timeline and this was the best example of how to do that.

xdslx commented Mar 29, 2017

Is there a way to grab tweets in other languages that use different language codes? This code only gets proper tweets in English. In short: how do I change the lang code?

How do I collect tweets in Roman Urdu? Using Python as well as Java I am able to get standard English tweets, but I want to collect Roman Urdu tweets for sentiment analysis. Please, anyone?

lnvrl commented Apr 15, 2017

I am having the same problem as @ilemtheme: line 36 gives "SyntaxError: invalid syntax" on

print "getting tweets before %s" % (oldest)

Hello, can anyone help me out with getting tweets for multiple users? I tried forming a list of users and passing it in at the end like this: for item in list: get_all_tweets("list").

The tweets I need to download are in a non-English language; when I open the output file it shows garbled characters.
Any clues?

thanks

@lnvrl are you using Python 3.x? There is a chance that this could be the issue. The syntax for print changed with 3.x; printing is now a function call:
print("getting tweets before %s" % (oldest))

It's giving the most recent 3200 tweets, so what is the way to get tweets older than that? Please post here or let me know by email: kumarkondi@gmail.com
