A script to download all of a user's tweets into a CSV
#!/usr/bin/env python
# encoding: utf-8

import tweepy #https://github.com/tweepy/tweepy
import csv

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""


def get_all_tweets(screen_name):
    #Twitter only allows access to a user's most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    #write the csv
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)

    pass


if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("J_tsar")
@greglinch commented Aug 28, 2013

Thanks for posting this script! Just a heads-up on a minor typo in line 36: "gefore" instead of "before"

https://gist.github.com/yanofsky/5436496#file-tweet_dumper-py-L36

@markwk commented Sep 24, 2013

Works great. I'm wondering how I'd do this to get the next 3200 after the initial pull.

@danriz commented Oct 17, 2013

I am getting error on windows:

C:>C:\Python26\python.exe C:\Python26\tweet_dumper.py
File "C:\Python26\tweet_dumper.py", line 17
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
^
IndentationError: expected an indented block

C:>C:\Python275\python.exe C:\Python26\tweet_dumper.py
File "C:\Python26\tweet_dumper.py", line 17
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
^
IndentationError: expected an indented block

@yanofsky (Owner) commented Nov 1, 2013

@greglinch thanks, fixed!
@markwk to my understanding there is no way to get these without using a 3rd party or asking the user to download their history
@riznad hard to say what's going on there; is it possible an extra space got inserted on that line? There should only be one tab on that line.

@Kaorw commented Dec 28, 2013

Thanks for the great code!
I've modified it a bit to grab the timeline and save it to Excel format ("xls") using xlswriter.

https://gist.github.com/Kaorw/7594044

Thanks again

@jdkram commented Dec 28, 2013

Thanks for the code.

I switched up the final line (after importing sys) to feed in usernames from shell:

get_all_tweets(sys.argv[1])
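
For context, a minimal sketch of that change (an illustration, assuming the script is invoked as python tweet_dumper.py <username>):

import sys

if __name__ == '__main__':
    #pass the username in from the command line instead of hardcoding it
    get_all_tweets(sys.argv[1])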
@hub2git commented Apr 2, 2014

Dear all, I downloaded the py file. I'm running Linux Mint. In terminal, I did:
python tweet_dumper.py

but I got this:
Traceback (most recent call last):
File "tweet_dumper.py", line 4, in
import tweepy #https://github.com/tweepy/tweepy
ImportError: No module named tweepy

What am I doing wrong? What must I do?

By the way, I've created a twitter API for myself. In the tweet_dumper.py file, I've entered my 4 Twitter API credentials. And in the last line of the .py file, I've put in the username whose tweets I want to download.

Should I download the zip file from https://github.com/tweepy/tweepy? I'm so lost, but I want to learn.


UPDATE:
I did
sudo apt-get install python-pip
then
sudo pip install tweepy

Then I ran python tweet_dumper.py again. Now I see a csv file! Thanks!!!

@samarthbhargav commented Jul 2, 2014

Fantastic! Thanks!

@tay1orjones commented Jul 16, 2014

This worked great! Thanks for this! Had to get pip and tweepy installed, but it worked out great. Also, note that if the targeted user's twitter account is protected, the account used to authorize the api calls must be following the targeted user.

@LifnaJos commented Aug 24, 2014

I tried executing the program. There is no error reported, but no .csv file is created. Please help me out.

UPDATE : 1

Later it worked.

UPDATE : 2

But now all of a sudden my program shows me the error below, so I repeated all the steps stated by hub2git. Still it's not working. Please help me trace it out.

lifna@lifna-Inspiron-N5050:~$ python
Python 2.7.3 (default, Feb 27 2014, 20:00:17)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import tweepy
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named tweepy
exit()

@abhishekmm commented Sep 17, 2014

I tried executing it using EditRocket (http://editrocket.com/download_win.html) and got the following error:
File "tweet_dumper.py", line 35
print "getting tweets before %s" % (oldest)
^
SyntaxError: invalid syntax

@hub2git commented Nov 11, 2014

Thanks to this script, I successfully downloaded a user's most recent 3240 tweets.

Line 15 of the script says:
#Twitter only allows access to a users most recent 3240 tweets with this method

Does anybody know how to download tweets that are older than the 3240th tweet?

@henry-pearce commented Nov 27, 2014

I am getting the below, what am I doing wrong? Thanks

File "tweet_dumper.py", line 27, in get_all_tweets
new_tweets = api.user_timeline(screen_name = screen_name,count=200)
File "C:\Python27\lib\site-packages\tweepy-2.3.0-py2.7.egg\tweepy\binder.py", line 230, in _call
return method.execute()
File "C:\Python27\lib\site-packages\tweepy-2.3.0-py2.7.egg\tweepy\binder.py", line 203, in execute
raise TweepError(error_msg, resp)
TweepError: [{u'message': u'Bad Authentication data', u'code': 215}]

@yosun commented Feb 1, 2015

This seems to only work for tweets from the past year? (For users with more than 3200 tweets)

@sagarjhaa commented Apr 2, 2015

Is there any way we can get more than 3200 tweets? I want all the tweets of a particular user.

@freimanas commented May 27, 2015

Sweet!

I have modified it to get tweets with images and store them to csv:
id, tweet text, image url

Just in case anyone else needs it as well:
https://gist.github.com/freimanas/39f3ad9a5f0249c0dc64

@ashu2188 commented Aug 12, 2015

Works great, but I have a question: how do I get only the statuses and not replies or retweets from a user? Is there any way?

@Purptart commented Aug 14, 2015

Hi, I'm using python3.4 and tweepy 3.3.0
I'm getting the following error:

File "dump_tweets.py", line 56, in get_all_tweets
writer.writerows(outtweets)
TypeError: 'str' does not support the buffer interface

This error is also thrown for line 55, but I commented it out in an attempt to debug.

I've tried to just include the text of the tweet which is encoded to utf-8 on line 50, but this still throws the same error.

Does anyone have any hints/suggestions?

EDIT: This appears to only occur on Windows. When running the script from an Ubuntu install it works.

@MihaiTabara commented Aug 22, 2015

Thanks for posting the script in the first place - a good way to start tweaking with this library. After playing around with it a bit, it seems like updated versions of the library solve both the "cope with # of requests/window" problem and the "don't get busted by the error" problem (a sketch follows after this comment):

  • the wait_on_rate_limit parameter for the api, to have it deal with the server
  • the use of Cursor to avoid all the requests-per-window bookkeeping

Just in case somebody needs as well, I did a small implementation following the new features here: https://gist.github.com/MihaiTabara/631ecb98f93046a9a454
(mention: I store the tweets in a MongoDB databases instead of csv files)
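
A minimal sketch of those two features together (not MihaiTabara's actual code; assuming tweepy 3.x and the same credentials as the script above):

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
#wait_on_rate_limit tells tweepy to sleep through rate-limit windows automatically
api = tweepy.API(auth, wait_on_rate_limit=True)

#Cursor does the max_id pagination bookkeeping for you
alltweets = list(tweepy.Cursor(api.user_timeline, screen_name="J_tsar", count=200).items())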

@Din1993 commented Sep 11, 2015

I am trying to do this for multiple users by including a for loop. Do you know how to have it also print either their name or their screenname? Thanks!

@Sourabh87 commented Sep 18, 2015

@Purptart

Just change the line
with open('%s_tweets.csv' % screen_name, 'wb') as f:
to
with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8') as f:

The b stands for binary; Python 3.x versions have changed many things. It works fine for me.
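
One extra detail worth noting (a sketch based on the Python 3 csv docs, not part of the original script): on Windows the csv module also wants newline='' so rows don't come out double-spaced.

with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)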

@Sourabh87 commented Sep 19, 2015

@Din1993

We can get the screen_name, user name, and other information as well. Show me how you are trying to do it for multiple users (a code snippet).

Thanks

@marcogoldin commented Oct 5, 2015

Hi guys, I'm using Python 2.7 and the script works fine. I just have a problem with the csv: is there a way to ignore \n in retrieved tweets? A newline causes the text to span into a new row, so in Excel or OpenRefine it's almost impossible to manually edit all the cells in the "id" column.
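
A common fix (a sketch, not part of the original script) is to strip the newlines before writing each row:

#replace newlines so each tweet stays on one csv row
outtweets = [[tweet.id_str, tweet.created_at,
              tweet.text.replace('\n', ' ').replace('\r', ' ').encode("utf-8")]
             for tweet in alltweets]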

@Din1993 commented Oct 6, 2015

@Sourabh87 thanks for the offer! I ended up figuring it out by just using tweet.user.screen_name. Super easy. Now I am working on migrating the code from Python 2 to Python 3.4. Has anyone else done this yet on Windows?

@michellemorales commented Oct 13, 2015

Hey guys!

I'm using this to pull tweets for a list of users, but I'm running into an error every so often. I think it might have to do with the number of queries you can make to the Twitter API, but I'm not sure. Here's the error below, please help.

File "twitterAPI.py", line 118, in
get_all_tweets(user)
File "twitterAPI.py", line 73, in get_all_tweets
new_tweets = api.user_timeline(screen_name = screen_name,count=200)
File "//anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 239, in _call
return method.execute()
File "//anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 223, in execute
raise TweepError(error_msg, resp)
tweepy.error.TweepError: Not authorized.

@suraj-deshmukh commented Oct 28, 2015

Thanks for the code @yanofsky.
I have modified your code: I use pandas to store the downloaded tweets in a csv, along with some additional information. I also have another script that uses the csv created by your code to download the latest tweets from a user's timeline.
Here is my GitHub link:
https://github.com/suraj-deshmukh/get_tweets

I am also working on Cassandra/Python integration to store all tweets in a Cassandra database instead of csv files.

@dowlingmi01 commented Dec 4, 2015

Thanks for this @yanofsky - it's awesome code. I'm trying to rework it so I can drop the data into a MySQL table. I'm running into some issues; could you take a look at the snippet of my code and see if I'm doing anything obviously wrong? Much appreciated.

def get_all_tweets(screen_name):
    #Twitter only allows access to a users most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=1)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
            print "getting tweets before %s" % (oldest)

            #all subsequent requests use the max_id param to prevent duplicates
            new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

            #save most recent tweets
            alltweets.extend(new_tweets)

            #update the id of the oldest tweet less one
            oldest = alltweets[-1].id - 1

            print "...%s tweets downloaded so far" % (len(alltweets))

    return alltweets

def store_tweets(alltweets)

    # MySQL initialization
    connection = (host = "",
        user="",
        passwd="",
        db="")
    cursor = connection.cursor()

    for tweet in alltweets:
        cursor.execute("INSERT INTO twittest (venue, tweet_id, text time, retweet, liked) VALUES
        (user['screen_name'],tweet['id_str'], tweet['created_at'], tweet['text'], tweet['retweet_count'], tweet['favorite_count'])")
    cursor.close()
    connection.commit()

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("KidsAquarium")

@halolimat commented Dec 14, 2015

Thanks!

@phanilav commented Dec 23, 2015

Hi,

Is there a way to download public tweets by 'keyword search' older than a week or a month? I am able to download public tweets from the current week only and not beyond that. Any suggestions appreciated. Thanks

@alce370 commented Feb 10, 2016

This one still works with Python 3.5. I just added the () to each print call, plus the change in Sourabh87's comment (Sep 18, 2015), and it works fine.

@SudhaSompura commented Mar 3, 2016

Is there no way we can crawl ALL the tweets of a particular user, and not just the 3200 most recent ones?

@alexgiarolo commented Apr 20, 2016

Any news on whether we can download all tweets?

@DavidNester commented May 9, 2016

I am somewhat inexperienced with this. Do the 4 lines with the API credentials need to be filled in? If so, where do we get the credentials?


UPDATE:
I figured out my first issue but then ran into this issue when running it:

tweepy.error.TweepError: [{u'message': u'Bad Authentication data.', u'code': 215}]

I only changed the username that was being looked up from the original code

@jdchatham commented May 10, 2016

Am also having the same problem as @DavidNester. Any updates?

UPDATE:
This actually worked for me/showed me how to get the credentials if you're still looking @DavidNester
http://www.getlaura.com/how-to-download-tweets-from-the-twitter-api/

@DavidNester commented May 11, 2016

I have the credentials now. I tried running with the other script and I still got the same error.

@analyticascent commented May 16, 2016

Just an FYI for people trying to use this in Sublime (if you happen to be using Anaconda on a Windows machine): you need to run python -m pip install tweepy in the directory Sublime expects tweepy to be installed in; pip install tweepy alone may not work. Some people who run the code and think they installed tweepy may get an error saying otherwise.

This truly is a glorious script yanofsky! I plan on playing around with it for the next few days for a stylometry project, and thanks to you getting the raw data desired is no longer an issue!

@gerardtoko commented Jun 15, 2016

Another trick: a recursive function

#!/usr/bin/env python
# encoding: utf-8

import tweepy #https://github.com/tweepy/tweepy
from time import sleep

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

def get_all_tweets(screen_name, alltweets=[], max_id=0):
    #Twitter only allows access to a users most recent 3240 tweets with this method
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #make initial request for most recent tweets (200 is the maximum allowed count)
    if max_id == 0:
        new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    else:
        # new new_tweets
        new_tweets = api.user_timeline(screen_name=screen_name, count= 200, max_id=max_id)

    if len(new_tweets) > 0:
        #save most recent tweets
        alltweets.extend(new_tweets)
        # security
        sleep(2)
        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        return get_all_tweets(screen_name=screen_name, alltweets=alltweets, max_id=oldest)

    #final tweets
    return alltweets
if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("J_tsar", [], 0)
@gowrik73 commented Jul 5, 2016

is it possible to extract protected tweets?

@willaccc commented Jul 10, 2016

Hi, I just tried the code and it is working well. P.S. I am using Python 3.5.

I am thinking of getting multiple users' tweets at the same time; in other words, passing multiple usernames at once. Any thoughts on how I should set that up?

Thanks.

@jabbalaci commented Jul 16, 2016

Thanks. It returns the shortened URLs. If you need the original URLs, you can use this snippet (Python 3):

from urllib.request import urlopen

def redirect(url):
    page = urlopen(url)
    return page.geturl()

In Python 2: from urllib import urlopen.

@kkhanal18 commented Jul 16, 2016

I'm getting this error when I enter someone's username. It works fine for others, but not this username.

tweepy.error.TweepError: [{'code': 34, 'message': 'Sorry, that page does not exist.'}]

@Hunterjet commented Aug 9, 2016

For some reason, calling user_timeline directly is much more inefficient than doing it with a cursor, like so:

cursor = Cursor(self.client.user_timeline,
                id=user_id,
                count=200,
                max_id=max_id).pages(MAX_TIMELINE_PAGES)
for page in cursor:
    logging.info('Obtained ' + str(i) + ' tweet pages for user ' + str(user_id) + '.')
    i += 1
    for tweet in page:
        if not hasattr(tweet, 'retweeted_status'):
            tweets.append(tweet)
    max_id = page[-1].id - 1

I've seen speed gains of over 150 seconds for users with a greater amount of posted tweets than the maximum retrievable. The error handling is a bit trickier but doable thanks to the max ID parameter (just stick the stuff I posted into a try/except and put that into a while (1) and the cursor will refresh with each error). Try it out!

BTW, MAX_TIMELINE_PAGES theoretically goes up to 16 but I've seen it go to 17.

@santoshbs commented Aug 11, 2016

I am getting a syntax error at: print "getting tweets before %s" % (oldest)
Not sure what is wrong. Request your help.

@Greenstan commented Aug 20, 2016

What if I wanted to save it into a database? Would I then need to extract the data from the csv?

@adixxov commented Aug 22, 2016

Thank you for this code. It worked as expected to pull a given user's tweets.

However, I have a side problem with retrieving the tweets after saving them to a json file. I saved the list of "alltweets" in a json file using the following. Note that without "repr", I wasn't able to dump the alltweets list into the json file.

with open('file.json', 'a') as f: json.dump(repr(alltweets), f)

Attached is a sample json file containing the dump. Now, I need to access the text in each tweet, but I'm not sure how to deal with "Status".

I tried to iterate over the lines in the file, but the file is being seen as a single line.

with open(fname, 'r') as f: for line in f: tweet = json.loads(line)

I also tried to iterate over statuses after reading the json file as a string, but iteration rather takes place on the individual characters in the json file.

with open(fname, 'r') as f: x = f.read() for status in x: code

Appreciate any help...

@Troasi commented Aug 22, 2016

I get an error in Python 3.x because the buffer does not support string. Help me encode it.

@dev-luis commented Sep 10, 2016

@santoshbs That's because the script was written for an older version of Python. The new syntax is print(your statements).

For the people that have problems running this script, I posted an alternate way to download the tweets using the new syntax on my website: http://luis-programming.com/blog/download_tweets/

I also added an example on how to analyze tweets that are not written using "Latin characters." If you're interested, you can also download the script on my website: http://luis-programming.com/blog/kanji_prj_twitter/

@owlcatz commented Sep 24, 2016

I read all the comments, but have not tried it yet... So... Assuming I had a user (not me or anyone I know personally) that has roughly 15.5k tweets, is there any way to get just the FIRST few thousand and not the last? Thanks! 👍

@cbaysan commented Sep 24, 2016

Has anyone figured out how to grab the "retweeted_status.text" if retweeted_status is "True"? It seems that one needs to specify: "api.user_timeline(screen_name = screen_name,count=200,include_rts=True)"

@dhaikney commented Nov 2, 2016

@yanofsky Found this very useful, thank you!

@abhijith0505 commented Nov 5, 2016

I found this article, which says a request rate of more than 2.5 times the per-access-token rate can be achieved. I haven't personally tested this. Hope it is useful.

http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./

@ShupingZhang commented Nov 6, 2016

I ran the code but it only downloaded 6 tweets (sometimes 206) instead of 3240. Does anyone know the reason? Thanks a lot!

get_all_tweets("City of Toronto")
getting tweets before 616320501871452159
...6 tweets downloaded so far

I'm using Python 2.7.12 Shell.

def get_all_tweets(screen_name):
    alltweets = []
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)
    alltweets.extend(new_tweets)
    oldest = alltweets[-1].id - 1
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)
        new_tweets = api.user_timeline(screen_namem = screen_name,count=200,max_id=oldest)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print "...%s tweets downloaded so far" % (len(alltweets))
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)
    pass

@brianhalperin commented Dec 6, 2016

I'm trying to run the code but keep getting the following error:

Traceback (most recent call last):
File "/Users/Brian/Desktop/get_tweets.py", line 60, in
get_all_tweets("realDonaldTrump")
File "/Users/Brian/Desktop/get_tweets.py", line 52, in get_all_tweets
writer.writerow(["id","created_at","text"])
TypeError: a bytes-like object is required, not 'str'

Anyone know what this could be?

@starkindustries commented Dec 27, 2016

@brianhalperin

I received the same error. Try changing line 53.

Change line 53 from this:
with open('%s_tweets.csv' % screen_name, 'wb') as f:

to this:
with open('%s_tweets.csv' % screen_name, 'w') as f:

Pretty much just drop the 'b'. Let me know if it works for you.

@fatenaught commented Jan 11, 2017

Thank you for posting this! May I ask how you found out that each "tweet" has information like "id_str", "location", etc.? I used dir() to look at it, but "location" is not included, so I was a bit confused.

@iLenTheme commented Jan 22, 2017

Hello, I get this error, what can it be? http://i.imgur.com/lDRA7uX.png

@Siddhant08 commented Jan 25, 2017

@yanofsky The code runs without errors but I can't seem to find the csv file. Where is it created?

@AadityaJ commented Jan 26, 2017

+1 Thanks for this script

@gabrielsoule commented Feb 7, 2017

@crishernandezmaps commented Feb 7, 2017

Thanks for sharing!

@srijan-mishra commented Feb 17, 2017

Can we download more than 3240 tweets?

@Deepak- commented Feb 27, 2017

Thanks for the script! I do wish there was a way to circumvent the 3240 limit.

@buddiex commented Feb 27, 2017

@deepak same here... having that issue now ... trying to collect historical tweets for a data warehouse project.

@adam-fg commented Feb 28, 2017

Hi everyone,

I'm after some help - I'm trying to complete some research on Twitter Data for my MSc and this might work, but I have no idea how to use Python.

Would anybody be willing to run this code for me for 3 companies and if this works for hashtags, 3 more hashtags?

Fingers crossed!
Adam

@davidneevel commented Mar 6, 2017

Thanks! I was just looking to grab a single tweet from one user's timeline and this was the best example of how to do that.

@xdslx commented Mar 29, 2017

Is there a way to grab tweets in other languages which use different language codes? This code only gets proper tweets in English. In short, how do I change the lang code?

@Faizah36 commented Apr 15, 2017

How do I collect tweets in Roman Urdu? Using Python as well as Java, I am able to get standard English tweets, but I want to collect Roman Urdu tweets for sentiment analysis. Please, anyone?

@lnvrl commented Apr 15, 2017

I am having the same problem as @iLenTheme: line 36 says SyntaxError: invalid syntax

print "getting tweets before %s" % (oldest)

@varpurantala commented Apr 17, 2017

Hello, can anyone help me out with getting tweets for multiple users? I tried forming a list of users and passing it in at the end like this: for item in list: get_all_tweets("list").

@hasan-msh commented Apr 18, 2017

The tweets I need to download are in a non-English language; when I open the output file it shows funny stuff!
Any clues?

Thanks

@carmonantonio commented Apr 23, 2017

@lnvrl are you using Python 3.x? There is a chance that this could be the issue. The syntax for print changed with 3.x; now if you want to print something you have to call it as a function:
print("getting tweets before %s" % (oldest))

@shivkumarkondi commented May 23, 2017

It's giving the most recent 3200 tweets, so what is the way to get tweets older than that? Please post or let me know at my email: kumarkondi@gmail.com

@jonhilgart22 commented Jun 5, 2017

Great code!

I edited it for Python 3.x. Also, I removed the URLs and the RTs from the user.

def get_all_tweets(screen_name):
    """Download the last 3240 tweets from a user. Do text processing to remove URLs and retweets.
    Adapted from https://gist.github.com/yanofsky/5436496"""
    #Twitter only allows access to a users most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(credentials['twitter']['consumer_key'], credentials['twitter']['consumer_secret'])
    auth.set_access_token(credentials['twitter']['token'], credentials['twitter']['token_secret'])
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print("getting tweets before %s" % (oldest))

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print("...%s tweets downloaded so far" % (len(alltweets)))

    cleaned_text = [re.sub(r'http[s]?:\/\/.*[\W]*', '', i.text, flags=re.MULTILINE) for i in alltweets] # remove urls
    cleaned_text = [re.sub(r'@[\w]*', '', i, flags=re.MULTILINE) for i in cleaned_text] # remove the @twitter mentions
    cleaned_text = [re.sub(r'RT.*', '', i, flags=re.MULTILINE) for i in cleaned_text] # delete the retweets

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, cleaned_text[idx].encode("utf-8")] for idx, tweet in enumerate(alltweets)]

    #write the csv
    with open('../data/raw/svb_founders/%s_tweets.csv' % screen_name, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)
@arturoaviles commented Jun 14, 2017

If I run this 16 times in less than 15 minutes, will the API stop answering? Thanks
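
If it helps, tweepy can absorb the rate limit for you rather than erroring out (a sketch, assuming tweepy 3.x):

#sleep through rate-limit windows instead of raising, and print a notice when it does
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)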

@rs2283 commented Jun 23, 2017

I need to extract tweets from Twitter for a specific hashtag for the last ten years. Can anyone please help me by providing the code in R for the same?

@santiag080 commented Jun 28, 2017

I work with similar code; with the code I use, I can input the username and download the timeline directly without having to edit the code itself, but the output format is unreadable. So, is there any way of making this code into a macro? Like, with an Excel table, put in a bunch of users and download every timeline?

@santiag080 commented Jun 28, 2017

Oh! This is the code I used before, but it doesn't work :/ As I said before, the output format is unreadable... any ideas?

`import sys
import csv
import json
from datetime import datetime, date, timedelta
import time
import os
import twitter
import smtplib
import collections
from random import shuffle
from urllib2 import URLError
import signal
import atexit
import logging
import re
import argparse
import StringIO, traceback,string
import csv, codecs, cStringIO
from ConfigParser import SafeConfigParser

def t():
    configParser = SafeConfigParser()
    configFilePath = 'C:\config.txt'
    configParser.read(configFilePath)
    with codecs.open(configFilePath, 'r', encoding='utf-8') as f:
        configParser.readfp(f)

    CONSUMER_KEY = configParser.get('file', 'CONSUMER_KEY')
    CONSUMER_SECRET = configParser.get('file', 'CONSUMER_SECRET')
    APP_NAME = configParser.get('file', 'APP_NAME')

    TOKEN_FILE = 'out/twitter.oauth'
    try:
        (oauth_token, oauth_token_secret) = read_token_file(TOKEN_FILE)
    except IOError, e:
        (oauth_token, oauth_token_secret) = oauth_dance(APP_NAME, CONSUMER_KEY,
                CONSUMER_SECRET)
        if not os.path.isdir('out'):
            os.mkdir('out')
        write_token_file(TOKEN_FILE, oauth_token, oauth_token_secret)
    return twitter.Twitter(domain='api.twitter.com', api_version='1.1',
                           auth=twitter.oauth.OAuth(oauth_token, oauth_token_secret,
                           CONSUMER_KEY, CONSUMER_SECRET))

def makeTwitterRequest(t, twitterFunction, max_errors=3, *args, **kwArgs):
    wait_period = 2
    error_count = 0
    while True:
        try:
            return twitterFunction(*args, **kwArgs)
        except twitter.api.TwitterHTTPError, e:
            error_count = 0
            wait_period = handleTwitterHTTPError(e, t, wait_period)
            if wait_period is None:
                return
        except URLError, e:
            error_count += 1
            print >> sys.stderr, "URLError encountered. Continuing."
            if error_count > max_errors:
                print >> sys.stderr, "Too many consecutive errors...bailing out."
                errorEmail()
                raise

def _getRemainingHits(t, resource_family):
    remaining_hits = t.application.rate_limit_status()[u'resources'][u'search'][resource_family]
    return remaining_hits
def handleTwitterHTTPError(e, t, wait_period=2):
    if wait_period > 3600: # Seconds
        print >> sys.stderr, 'Too many retries. Quitting.'
        return None
    wait_variable = int(datetime.now().strftime("%Y")[:2])
    if e.e.code == 401:
        print >> sys.stderr, 'Encountered 401 Error (Not Authorized)'
        return None
    elif e.e.code in (404, 34):
        print >> sys.stderr, 'Encountered 404 Error (page not found)'
        return None
    elif e.e.code in (502, 503):
        print >> sys.stderr, 'Encountered %i Error. Will retry in %i seconds' % (e.e.code,
            wait_period)
        time.sleep(wait_period)
        wait_period *= 1.5
        return wait_period
    elif _getRemainingHits(t, u'/search/tweets')['remaining'] == 0:
        status = _getRemainingHits(t, u'/search/tweets')['reset']
        now = time.time()
        rate_limit = status + wait_variable - now
        sleep_time = max(900, rate_limit, 5) # Prevent negative numbers
        print >> sys.stderr, 'Rate limit reached: sleeping for %i secs' % (rate_limit, )
        time.sleep(sleep_time)
        return 2
    else:
        raise e
def makeTwitterSearch(t, sts, salida, maximo):
    cant_total = 0
    #print "initial call"
    response = makeTwitterRequest(t, t.statuses.user_timeline, screen_name=sts, count=200)
    #print response
    if response is not None and len(response) > 0:
        #temporary list to store the ids from the response
        temp_id_list = []
        rta = response
        for tweet in rta:
            salida.write(str(tweet).replace('\r\n', '').replace('\n', '').replace('\r', '') + '\n')
            temp_id_list.append(tweet['id'])
        max_id = min(temp_id_list)
        cantidad = len(response)
        cant_total += cantidad
        #print "cant = %s" % cantidad
        cont = 1
        while cantidad:
            temp_id_list = []
            print "Call %s " % (cont)
            response = makeTwitterRequest(t, t.statuses.user_timeline, screen_name=sts, max_id=max_id, count=200)
            rta = response
            for tweet in rta:
                salida.write(str(tweet) + '\n')
                temp_id_list.append(tweet['id'])
            if max_id == min(temp_id_list):
                print "Finished! Thanks for searching with us today!"
                break
            max_id = min(temp_id_list)
            cantidad = len(response)
            cant_total += cantidad
            #print cantidad * cont
            print cantidad * cont
            if maximo <> '':
                if int(cantidad * cont) >= int(maximo):
                    break
            print "amount found = %s" % cantidad
            cont += 1
    print "Finally returning %s tweets" % cant_total
    return None
def normalize(archivo):
    normalizations = {
        'norm_search': collections.OrderedDict([
            ('Tweet ID',('xpath_get','id')),
            ('Tipo',('get_tweet_type', )),
            ('Retweet ID',('xpath_get','retweeted_status/id')),
            ('Retweet username',('xpath_get','retweeted_status/user/screen_name')),
            ('Retweet Count',('get_count','rts')), # on the rt if the type is RT
            ('Favorite Count',('get_count','favs')), # on the rt if the type is RT
            ('Text',('xpath_get','text')),
            ('Tweet_Lang',('xpath_get','lang')),
            ('Fecha',('format_date','created_at')),
            ('Source',('xpath_get','source')),
            ('User_username',('xpath_get','user/screen_name')),
            ('User_ID',('xpath_get','user/id')),
            ('User_tweet count',('xpath_get','user/statuses_count')),
            ('User_followers',('xpath_get','user/followers_count')),
            ('User_followings',('xpath_get','user/friends_count')),
            ('User_time zone',('xpath_get','user/time_zone')),
            ('User_language',('xpath_get','user/lang')),
            ('Location',('xpath_get','user/location')),
            ('User_create date',('format_date','user/created_at')),
            ('Mention1',('get_entities','mention',1)),
            ('Mention2',('get_entities','mention',2)),
            ('Mention3',('get_entities','mention',3)),
            ('Link1',('get_entities','link',1)),
            ('Link2',('get_entities','link',2)),
            ('Hashtag1',('get_entities','hashtag',1)),
            ('Hashtag2',('get_entities','hashtag',2)),
            ('Hashtag3',('get_entities','hashtag',3)),
            ('Fecha Timezone',('format_date','created_at',"%Y-%m-%d")),
            ('Dia Timezone',('format_date','created_at',"%a")),
            ('Hora Timezone',('format_date','created_at',"%H:00")),
            ('Corte Hora',('format_date','created_at',"%Y-%m-%d %H")),
            ('place_country',('xpath_get','place/country')),
            ('user_favourites_count',('xpath_get','user/favourites_count')),
            ('user_description',('xpath_get','user/description')),
            ('retweeted_status_user_favourites_count',('xpath_get', 'retweeted_status/user/favourites_count')),
            ('retweeted_status_user_listed_count',('xpath_get', 'retweeted_status/user/listed_count')),
            ('retweeted_status_user_profile_image_url',('xpath_get', 'retweeted_status/user/profile_image_url')),
            ('retweeted_status_created_at',('format_date','retweeted_status/created_at',"%Y-%m-%d %H")),
        ])
    }
    file = open(archivo, 'r')
    with open("/tmp/%s" % archivo + "_normalizado", 'wb') as f_csv:
        # write data
        for row in file:
            print row
            row_2 = normalize_row(row, normalizations['norm_search'], None)
            for e in row_2.iteritems():
                print e
def normalize_row(row, format, timezone):
    #pprint.pprint(row)

    f = row_formatter(row, timezone)
    f_rows = []
    for (name, action) in format.iteritems():
        # call the appropriate method of row_formatter
        value = getattr(f, action[0])(*action[1:])
        if (not value): value = ""
        if (type(value) != str and type(value) != unicode):
            value = str(value)
        f_rows.append((name, value))
    return collections.OrderedDict(f_rows)

class row_formatter:
    def __init__(self, row, timezone):
        self.row = row
        self.timezone = timezone

    def xpath_get(self, path):
        elem = self.row
        try:
            for x in path.strip("/").split("/"):
                elem = elem.get(x)
        except:
            pass

        return elem

    def get_tweet_type(self):
        if 'retweeted_status' in self.row and self.row['retweeted_status']:
            return "RT"
        #elif 'in_reply_to_user_id' in self.row and self.row['in_reply_to_user_id']:
            #return "REPLY"
        else:
            return "TWEET"

    def get_count(self, count_type):
        query = ''
        if self.get_tweet_type() == 'RT':
            query += 'retweeted_status/'
        if (count_type == 'favs'):
            query += 'favorite_count'
        elif (count_type == 'rts'):
            query += 'retweet_count'
        else:
            return None
        return self.xpath_get(query)

    def get_text(self):
        if self.get_tweet_type() == 'RT':
            query += ''

    def format_date(self, query, output_format="%Y-%m-%d %H:%M", timezone=None):
        if (not timezone): timezone = self.timezone
        date = self.xpath_get(query)
        if (not date): return None
        utc = datetime.strptime(date, '%a %b %d %H:%M:%S +0000 %Y').replace(tzinfo=tz.gettz('UTC'))
        local = utc.astimezone(tz.gettz(timezone))
        return local.strftime(output_format)

    def get_entities(self, e_type, index):
        matches = []
        if (e_type == 'link'):
            tmp = self.xpath_get('/entities/urls')
            if (tmp):
                matches = [e['expanded_url'] for e in tmp]
        if (e_type == 'mention'):
            tmp = self.xpath_get('/entities/user_mentions')
            if (tmp):
                matches = [e['screen_name'] for e in tmp]
        if (e_type == 'hashtag'):
            tmp = self.xpath_get('/entities/hashtags')
            if (tmp):
                matches = [e['text'] for e in tmp]

        if (len(matches) >= index):
            return matches[index - 1]

        return None

class UnicodeWriter:
"""
A CSV writer which will write rows to CSV file "f",
which is encoded in the given encoding.
"""

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8").replace("\n", " ").replace("\r", " ").replace("\t", '') for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

if __name__ == '__main__':
    t = t()
    sts = raw_input("Enter username: ")
    maximo = raw_input("Enter the maximum number of records: ")
    ht = raw_input("Output filename?: ")
    f = open(ht, 'w')
    #sts = "from:%s OR @%s" % (sts, sts)
    print "Searching %s for %s." % (sts, ht)
    makeTwitterSearch(t, sts, f, maximo)
    f.close()
    #normalize(ht)

`

@santiag080 commented Jun 28, 2017

@jdkram @jdkram!!!! HOW??

@colbybair commented Jun 29, 2017

I'm sure I'm doing something obviously wrong, but I'm getting this error when I try to run the code:

Traceback (most recent call last):
File "tweet_dumper.py", line 64, in
get_all_tweets("J_tsar")
File "tweet_dumper.py", line 18, in get_all_tweets
from tweepy.auth import OAuthHandler
ImportError: No module named auth

Any thoughts on this?

@santiag080 commented Jun 29, 2017

@colbybair

Did you put in the Twitter API keys?

@jasserkh commented Jul 4, 2017

I want to extract tweets for a specific period of time. Anyone have an idea? Thanks

@dev-luis commented Aug 3, 2017

@jasserkh You can do it like this:

import time
import tweepy
from datetime import datetime, date

#get current date
currentDate = time.strftime("%x")

year = currentDate [6:8]
month = currentDate [0:2]
day = currentDate [3:5]

#reformat the date values
current_dateStr = "20" + year + "-" + month + "-" + day

#convert string to date
currentDate = datetime.strptime(current_dateStr, "%Y-%m-%d").date()
...
...

for tweet in allTweetsList:
    try:
        #make sure the tweet is recent
        createdAt_str = str(tweet.created_at)
        ind = createdAt_str.find(" ")
        new_createdAt = createdAt_str[:ind]

        #convert string to date
        createdAt = datetime.strptime(new_createdAt, "%Y-%m-%d").date()

        #compare the dates
        if createdAt == currentDate:
            pass #do something

    except tweepy.TweepError as e:
        print(e.response)

If you have questions, please reply to me: http://luis-programming.com/blog/download_tweets/
It's hard to track the replies here.

@tianke0711 commented Sep 20, 2017

Hi, thanks for your code. When I used your code to collect data using Python 3, why do the tweet texts include characters like "b" and \xe2\x80\x99?

"b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s ROH Broadcast https://t.co/uIV7TKHs9K'"

Actually the original tweet is (https://twitter.com/sheezy0): Adam Cole Praises Kevin Owens + A Preview For Next Week’s ROH Broadcast

\xe2\x80\x99 represents the apostrophe in 's. I don't know how to solve this issue; I want to get the 's in the text. Thanks!
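
The b'...' prefix means the csv writer received bytes rather than text. A sketch of the usual Python 3 fix (not part of the original script): drop the .encode() and open the file with an encoding instead.

#keep the text as str; let the file handle the encoding
outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]

with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)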

@states-of-fragility commented Sep 29, 2017

Hi! The code works just fine, thanks for sharing.
Yet, I would like to extend the code to retrieve non-English tweets, as with this method Arabic letters are translated into funny combinations of Roman letters and numbers. I have seen other people asking the same question, but so far no answer; maybe this time it attracts more attention.
Has someone found a solution? I'm a bit desperate.
Thanks a lot!

Edit: I posted the answer in stack overflow and was able to overcome this issue. In case someone else got stuck with this: https://stackoverflow.com/questions/46510879/saving-arabic-tweets-from-tweepy-in-cvs/46523781?noredirect=1#comment80010395_46523781

@hub2git commented Oct 18, 2017

Hi all. Is there a similar script for downloading all of a CuriousCat.me user's Q&As? For example, https://curiouscat.me/curiouscat.

@pavankthatha commented Oct 26, 2017

The posted code works for a given handle. I'm trying to introduce filters for the tweets; any help would be appreciated.

@sanju9522 commented Nov 29, 2017

Hi,
I am new to Python, so please don't hesitate if my question is very basic.
Is there a way to run this code for multiple usernames and generate csv files for each username, like macros in Excel?

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("username1" "username2" "username3" "username4")

Please, anyone, suggest.
Thanks in advance

@nonamethanks commented Dec 8, 2017

@sanju9522:

if __name__ == '__main__':
    usernames = ["yourname1", "yourname2"]
    for x in usernames:
        get_all_tweets(x)

You can even use something like usernames.append() combined with raw_input to add usernames at will on input when you launch the script via terminal.
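
A sketch of that idea (assuming Python 2, where raw_input exists):

if __name__ == '__main__':
    usernames = []
    while True:
        name = raw_input("Add a username (blank line to start): ")
        if not name:
            break
        usernames.append(name)
    for x in usernames:
        get_all_tweets(x)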

@bsteen commented Dec 13, 2017

Thanks for this. I used the basic framework of your code in my project: https://github.com/bsteen/markov_tweet_generator
I cited you as a source in the "Resources Used" in my README.

@csik commented Dec 20, 2017

For those looking to download more than just the last 3k-some tweets I found this useful:
https://github.com/bpb27/twitter_scraping

It uses two steps: first Selenium, essentially taking over a browser, to get as many tweet IDs as possible by going to each page day by day (I believe this should be possible as well with the API approach above); the second step uses Tweepy to request the metadata for those IDs.

@pjrudloff commented Dec 22, 2017

@yanofsky Thank you for your work. What license applies to the code?

@hammadawan50 commented Dec 30, 2017

I am getting the following error:
TweepError: Failed to parse JSON payload: Unterminated string starting at: line 1 column 507204 (char 507203)

@4emkay commented Jan 8, 2018

Thank you...Worked Great

@atoponce commented Jan 21, 2018

To support the full text of 280 characters, apply the following patch:

--- /tmp/tweet_dumper.py	2018-01-21 06:07:26.646774539 -0700
+++ tweet_dumper.py	2018-01-21 06:07:20.454724904 -0700
@@ -23,7 +23,7 @@
 	alltweets = []	
 	
 	#make initial request for most recent tweets (200 is the maximum allowed count)
-	new_tweets = api.user_timeline(screen_name = screen_name,count=200)
+	new_tweets = api.user_timeline(screen_name = screen_name, count=200, tweet_mode='extended')
 	
 	#save most recent tweets
 	alltweets.extend(new_tweets)
@@ -36,7 +36,7 @@
 		print "getting tweets before %s" % (oldest)
 		
 		#all subsiquent requests use the max_id param to prevent duplicates
-		new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)
+		new_tweets = api.user_timeline(screen_name = screen_name,count=200, tweet_mode='extended', max_id=oldest)
 		
 		#save most recent tweets
 		alltweets.extend(new_tweets)
@@ -47,7 +47,7 @@
 		print "...%s tweets downloaded so far" % (len(alltweets))
 	
 	#transform the tweepy tweets into a 2D array that will populate the csv	
-	outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
+	outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8").replace('\n', ' ').replace('\r', '')] for tweet in alltweets]
 	
 	#write the csv	
 	with open('%s_tweets.csv' % screen_name, 'wb') as f:
@@ -60,4 +60,4 @@
 
 if __name__ == '__main__':
 	#pass in the username of the account you want to download
-	get_all_tweets("J_tsar")
+	get_all_tweets("realDonaldTrump")

@AttributeErrorCat commented Feb 16, 2018

def get_all_tweets(screen_name):
    #Twitter only allows access to a users most recent 3240 tweets with this method

    import tweepy
    import csv

    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_token_secret = ''

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auth.secure = True
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=340, include_rts=False)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=340, max_id=oldest, tweet_mode='extended')

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    #write the csv
    with open('%s_nyttweet2.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["tweetid", "date", "text"])
        writer.writerows(outtweets)

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("nytimes")

Hi guys - I'm new to Python and I'm trying to stop my tweets from being truncated. I added extended tweet mode but I get this error:

"AttributeError: 'Status' object has no attribute 'text'"
I can't find where I should change text to full_text.

Also, does anyone know the code to remove the URL from the output too? My tweets look like this:

"The Trump administration released a list of 210 people who were identified because of their "closeness to the Russi… https://t.co/5NmKPtQNrO"
THANK YOU!!!

@kumargouravdas commented Feb 20, 2018

After successfully running the code, I face one problem: for long tweets I get only a portion of the tweet. For example, one instance of what I get:
"Glad to have joined the Bahubali Mahamasthakabhisheka Mahotsava at Shravanabelagola in Karnataka. Spoke about the r… https://t.co/qG85rbCgIh"
And what the tweet actually is:
Glad to have joined the Bahubali Mahamasthakabhisheka Mahotsava at Shravanabelagola in Karnataka. Spoke about the rich contribution of saints and seers to our society. Here is my speech. http://nm-4.com/tf25
That means this portion is missing from my output:
rich contribution of saints and seers to our society. Here is my speech. http://nm-4.com/tf25
Has anyone faced this problem?
Please suggest a solution.

@yang-qian commented Feb 25, 2018

@kumargouravdas
I had the same problem. You can fix it by adding tweet_mode="extended" when calling the user_timeline func. Correspondingly, change tweet.text to tweet.full_text.
references:
https://github.com/sferik/twitter/issues/880
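
A sketch of those two changes together (assuming tweepy 3.x):

#request the full 280-character text
new_tweets = api.user_timeline(screen_name=screen_name, count=200, tweet_mode="extended")

#extended mode exposes the text as .full_text instead of .text
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8")] for tweet in alltweets]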

@SadiaNaseemKhan commented Mar 13, 2018

I keep getting this error:
TweepError: [{'code': 215, 'message': 'Bad Authentication data.'}]
How can I solve this?

@nikitamaloo commented Apr 8, 2018

@atoponce

I am new to Python. I used your code to get full tweets of 280 characters, but it's showing this error:

Traceback (most recent call last):
File "C:\Users\Nikita\Desktop\LORETTO\Spring Project\tweet_dumper.py", line 61, in
get_all_tweets("realDonaldTrump")
File "C:\Users\Nikita\Desktop\LORETTO\Spring Project\tweet_dumper.py", line 48, in get_all_tweets
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8").replace('\n', ' ').replace('\r', '')] for tweet in alltweets]
File "C:\Users\Nikita\Desktop\LORETTO\Spring Project\tweet_dumper.py", line 48, in
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8").replace('\n', ' ').replace('\r', '')] for tweet in alltweets]
TypeError: a bytes-like object is required, not 'str'

@nikitamaloo commented Apr 8, 2018

@yanofsky
Thank you so much!! This code is really useful.
I am using Python for the first time. I need to extract tweets for my social media analytics project.

Do you have code to get more information about each tweet, like how many likes the tweet got and how many times it was retweeted?

It would be really helpful if I could get that extended code, to get to the next step in my project.
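
Those counts live on each tweepy Status object, so a sketch (extending the outtweets row; remember to add matching column names to the header row):

outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8"),
              tweet.favorite_count, tweet.retweet_count] for tweet in alltweets]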

@kamalikap commented May 23, 2018

@nikitamaloo

Don't add replace at the end of the line; instead change the last line to:
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8")] for tweet in alltweets]

This works for me.

@kamalikap commented May 28, 2018

@yanofsky Thanks for the code. Do you know how I can give the input as streaming data rather than a particular user, and let it track for a period of time?

@kamalikap commented May 28, 2018

Thanks to @yanofsky and @freimanas for helping with the code.

Here is my code, which is modified and contains:

  • full text
  • images
  • hashtags

I hope this can be of some help.

@ausok commented Jun 5, 2018

Hey, thanks a lot for the script. It works very well. Just one question: What would I have to do if I wanted to run the script, let's say daily, but only wanted to get the newest tweets I have not saved already? Thanks for any help.
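
The API supports that directly via since_id (a sketch; newest_saved_id is hypothetical and would come from your previous run's csv):

#only fetch tweets newer than the newest one already saved
newest_saved_id = 987654321098765432  #hypothetical: read this from your existing csv
new_tweets = api.user_timeline(screen_name=screen_name, count=200, since_id=newest_saved_id)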

@sunalit commented Jun 11, 2018

Thanks a lot for the script. It works very well. But how can I include other elements, like 'sizeof', 'author', 'contributors', 'coordinates', 'entities', 'favorite', 'favorite_count', 'favorited', 'geo', 'retweet', 'retweet_count', 'retweeted', 'retweets', 'source'? I need this data as well for the analysis. Thanks for the help.

@m-ueberall commented Jul 4, 2018

FYI: Using tweepy 3.6.0, I saw that retweets are not yet retrieved in complete form (i.e., up to 280 characters) even after applying the patch by @atoponce. Seems to be a library problem, though.

@coolmechel commented Jul 5, 2018

@jonhilgart22 Interesting code. I am new here and would love to explore; I am using a Jupyter notebook on my local machine.
I ran into the following error and would very much appreciate a heads-up.

getting tweets before 276674512769150975
...98 tweets downloaded so far

NameError Traceback (most recent call last)
in ()
     56 if __name__ == '__main__':
     57     #pass in the username of the account you want to download
---> 58     get_all_tweets("coolmechel")

in get_all_tweets(screen_name)
     39
     40     print ("...%s tweets downloaded so far" % (len(alltweets)))
---> 41     cleaned_text = [re.sub(r'http[s]?:\/\/.*[\W]*', '', i.text, flags=re.MULTILINE) for i in alltweets] # remove urls
     42     cleaned_text = [re.sub(r'@[\w]*', '', i, flags=re.MULTILINE) for i in cleaned_text] # remove the @twitter mentions
     43     cleaned_text = [re.sub(r'RT.*', '', i, flags=re.MULTILINE) for i in cleaned_text] # delete the retweets

in (.0)
     40     print ("...%s tweets downloaded so far" % (len(alltweets)))
---> 41     cleaned_text = [re.sub(r'http[s]?:\/\/.*[\W]*', '', i.text, flags=re.MULTILINE) for i in alltweets] # remove urls

NameError: name 're' is not defined

@m-ueberall commented Jul 6, 2018

@coolmechel: You're using regular expression operations without having imported the required module (i.e., "import re").
Have a look at https://docs.python.org/2/library/re.html or https://docs.python.org/3/library/re.html

@nicknazari commented Jul 6, 2018

Hello, when I run the code I get a permissions error: PermissionError: [Errno 13] Permission denied: 'realDonaldTrump_tweets.csv'

I tried the many StackOverflow solutions to this issue and I am still unable to write to any files. Is anyone aware of a possible fix?

@coolmechel commented Jul 8, 2018

@m-ueberall
Thank you very much for responding to my question; I totally forgot to import that module. However, I ran into another problem:

FileNotFoundError Traceback (most recent call last)
in ()
     56 if __name__ == '__main__':
     57     #pass in the username of the account you want to download
---> 58     get_all_tweets("BigDataGal")

in get_all_tweets(screen_name)
     46
     47     #write the csv
---> 48     with open('../data/raw/svb_founders/%s_tweets.csv' % screen_name, 'w') as f:
     49         writer = csv.writer(f)
     50         writer.writerow(["id","created_at","text"])

FileNotFoundError: [Errno 2] No such file or directory: '../data/raw/svb_founders/BigDataGal_tweets.csv'

I would be glad to resolve this. Thanks

@Smoops commented Jul 11, 2018

Is there a script to download tweets by searching on (smaller) sentences, because there is no hashtag or key word?

@rraadd88 commented Jul 30, 2018

Works.

@IWorkWonders commented Aug 27, 2018

I am wondering how you guys got the consumer key and authentication secret keys working? I applied ages ago and have not received them yet. Is there any way to bypass that stage? I am in real need of this. Can someone please send me the required keys if you have spare ones? It would help me get my project going.

@Hanfly commented Nov 9, 2018

Thanks, but can we get all the images in one tweet?
