A script to download all of a user's tweets into a csv
#!/usr/bin/env python
# encoding: utf-8

import tweepy #https://github.com/tweepy/tweepy
import csv

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

def get_all_tweets(screen_name):
    #Twitter only allows access to a user's most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    #write the csv
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("J_tsar")

greglinch commented Aug 28, 2013

Thanks for posting this script! Just a heads-up on a minor typo in line 36: "gefore" instead of "before"

https://gist.github.com/yanofsky/5436496#file-tweet_dumper-py-L36

markwk commented Sep 24, 2013

Works great. I'm wondering how I'd do this to get the next 3200 after the initial pull.

danriz commented Oct 17, 2013

I am getting error on windows:

C:>C:\Python26\python.exe C:\Python26\tweet_dumper.py
File "C:\Python26\tweet_dumper.py", line 17
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
^
IndentationError: expected an indented block

C:>C:\Python275\python.exe C:\Python26\tweet_dumper.py
File "C:\Python26\tweet_dumper.py", line 17
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
^
IndentationError: expected an indented block

yanofsky (Owner) commented Nov 1, 2013

@greglinch thanks, fixed!
@markwk to my understanding there is no way to get these without using a 3rd party or asking the user to download their history
@riznad hard to say what's going on there, is it possible an extra space got inserted on that line? There should only be one tab on that line.

Kaorw commented Dec 28, 2013

Thanks for great code!
I've modified it a bit to grab a timeline and save it to Excel format ("xls") using xlswriter.

https://gist.github.com/Kaorw/7594044

Thanks again

jdkram commented Dec 28, 2013

Thanks for the code.

I switched up the final line (after importing sys) to feed in usernames from shell:

get_all_tweets(sys.argv[1])
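
In context, that change is just the __main__ block (a sketch assuming the rest of the script above is unchanged):

import sys

if __name__ == '__main__':
    #pass in the username of the account you want to download, e.g.
    #  python tweet_dumper.py J_tsar
    get_all_tweets(sys.argv[1])
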
hub2git commented Apr 2, 2014

Dear all, I downloaded the py file. I'm running Linux Mint. In terminal, I did:
python tweet_dumper.py

but I got this:
Traceback (most recent call last):
File "tweet_dumper.py", line 4, in
import tweepy #https://github.com/tweepy/tweepy
ImportError: No module named tweepy

What am I doing wrong? What must I do?

By the way, I've created a twitter API for myself. In the tweet_dumper.py file, I've entered my 4 Twitter API credentials. And in the last line of the .py file, I've put in the username whose tweets I want to download.

Should I download the zip file from https://github.com/tweepy/tweepy? I'm so lost, but I want to learn.

UPDATE:
I did
sudo apt-get install python-pip
then
sudo pip install tweepy

Then I ran python tweet_dumper.py again. Now I see a csv file! Thanks!!!

samarthbhargav commented Jul 2, 2014

Fantastic! Thanks!

tay1orjones commented Jul 16, 2014

This worked great! Thanks for this! Had to get pip and tweepy installed, but it worked out great. Also, note that if the targeted user's twitter account is protected, the account used to authorize the api calls must be following the targeted user.

LifnaJos commented Aug 24, 2014

I tried executing the program. There is no error reported, but no .csv file is created. Please help me out.

UPDATE 1:

Later it worked.

UPDATE 2:

But now all of a sudden my program shows me the error below, so I repeated all the steps stated by hub2git. Still it's not working. Please help me trace it out.

lifna@lifna-Inspiron-N5050:~$ python
Python 2.7.3 (default, Feb 27 2014, 20:00:17)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import tweepy
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named tweepy
exit()

abhishekmm commented Sep 17, 2014

I tried executing it using EditRocket [http://editrocket.com/download_win.html] and got the following error:

File "tweet_dumper.py", line 35
print "getting tweets before %s" % (oldest)
^
SyntaxError: invalid syntax

hub2git commented Nov 11, 2014

Thanks to this script, I successfully downloaded a user's most recent 3240 tweets.

Line 15 of the script says
#Twitter only allows access to a user's most recent 3240 tweets with this method

Does anybody know how to download tweets that are older than the 3240th tweet?

henry-pearce commented Nov 27, 2014

I am getting the below, what am I doing wrong? Thanks

File "tweet_dumper.py", line 27, in get_all_tweets
new_tweets = api.user_timeline(screen_name = screen_name,count=200)
File "C:\Python27\lib\site-packages\tweepy-2.3.0-py2.7.egg\tweepy\binder.py", line 230, in _call
return method.execute()
File "C:\Python27\lib\site-packages\tweepy-2.3.0-py2.7.egg\tweepy\binder.py", line 203, in execute
raise TweepError(error_msg, resp)
TweepError: [{u'message': u'Bad Authentication data', u'code': 215}]

yosun commented Feb 1, 2015

This seems to only work for tweets from the past year? (For users with more than 3200 tweets)

sagarjhaa commented Apr 2, 2015

Is there any way we can get more than 3200 tweets? I want all the tweets of a particular user.

freimanas commented May 27, 2015

Sweet!

I have modified it to get tweets with images and store them to csv:
id, tweet text, image url

just in case anyone else needs it as well:
https://gist.github.com/freimanas/39f3ad9a5f0249c0dc64

ashu2188 commented Aug 12, 2015

Works great. But I have a question: how do I get only the user's own statuses, and not replies or retweets? Is there any way?
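
The user_timeline endpoint has flags for exactly this; a minimal sketch (parameter names from Twitter's statuses/user_timeline REST API, passed through by tweepy; note the API applies count before filtering, so pages can come back short):

#ask the API to drop retweets and @-replies instead of filtering locally
new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                               include_rts=False, exclude_replies=True)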

Purptart commented Aug 14, 2015

Hi, I'm using Python 3.4 and tweepy 3.3.0, and I'm getting the following error:

File "dump_tweets.py", line 56, in get_all_tweets
writer.writerows(outtweets)
TypeError: 'str' does not support the buffer interface

This error is also thrown for line 55, but I commented it out in an attempt to debug.

I've tried to just include the text of the tweet, which is encoded to utf-8 on line 50, but this still throws the same error.

Does anyone have any hints/suggestions?

EDIT: This appears to only occur on Windows. When running the script from an Ubuntu install it works.

MihaiTabara commented Aug 22, 2015

Thanks for posting the script in the first place - a good way to start tweaking with this library. After playing around with it a bit, it seems the updated versions of the library solve both the requests-per-window bookkeeping and the rate-limit errors; see the sketch after this list:

  • the <wait_on_rate_limit> parameter for the api, to have it deal with the server
  • use of Cursor, to avoid all the requests-per-window reckoning

Just in case somebody needs it as well, I did a small implementation following the new features here: https://gist.github.com/MihaiTabara/631ecb98f93046a9a454
(mention: I store the tweets in a MongoDB database instead of csv files)
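
For reference, a minimal sketch of those two features together (as documented for tweepy 3.x; credentials assumed filled in as in the original script):

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
#sleep automatically whenever a rate limit window is exhausted
api = tweepy.API(auth, wait_on_rate_limit=True)

#Cursor does the max_id pagination the original script performs by hand
alltweets = list(tweepy.Cursor(api.user_timeline,
                               screen_name="J_tsar", count=200).items())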

Din1993 commented Sep 11, 2015

I am trying to do this for multiple users by including a for loop. Do you know how to have it also print either their name or their screenname? Thanks!

Sourabh87 commented Sep 18, 2015

@Purptart

Just change the line
with open('%s_tweets.csv' % screen_name, 'wb') as f:
to
with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8') as f:

b stands for binary, and Python 3.x changed many things. It works fine for me.
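
Building on that, a fuller Python 3 sketch of the CSV block (my assumption: newline='' is also wanted so the csv module doesn't double-space rows on Windows, and the .encode("utf-8") call should be dropped so plain str values are written):

#write the csv (Python 3)
outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]
with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    writer.writerows(outtweets)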

Sourabh87 commented Sep 19, 2015

@Din1993

We can get the screen_name, user name, and other information as well. Show me how you are trying to do it for multiple users (code snippet).

Thanks

marcogoldin commented Oct 5, 2015

Hi guys, I'm using Python 2.7 and the script works fine. I just have a problem with the csv. Is there a way to ignore \n in retrieved tweets? A newline causes the text to span into a new column, so in Excel or OpenRefine it's almost impossible to manually edit all the cells in the "id" column.
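
One way to do that (a sketch, not part of the original script: flatten the whitespace before the text reaches the csv writer):

#replace newlines and carriage returns with spaces so each tweet stays on one row
outtweets = [[tweet.id_str, tweet.created_at,
              tweet.text.replace("\n", " ").replace("\r", " ").encode("utf-8")]
             for tweet in alltweets]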

Din1993 commented Oct 6, 2015

@Sourabh87 thanks for the offer! I ended up figuring it out by just using tweet.user.screen_name. Super easy. Now I am working on migrating the code from Python 2 to Python 3.4. Has anyone else done this yet on Windows?
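
For anyone attempting the same, a minimal sketch of the multi-user loop (hypothetical screen names; assumes get_all_tweets is modified to return alltweets instead of writing the file itself):

#pull several timelines and tag each row with its author
screen_names = ["user_one", "user_two", "user_three"]
rows = []
for name in screen_names:
    for tweet in get_all_tweets(name):
        #every Status object carries its author under tweet.user
        rows.append([tweet.user.screen_name, tweet.id_str, tweet.created_at, tweet.text])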

michellemorales commented Oct 13, 2015

Hey guys!

I'm using this to pull tweets for a list of users, but I'm running into an error every so often. I think it might have to do with the number of queries you can make to the Twitter API, but I'm not sure. Here's the error below, please help.

File "twitterAPI.py", line 118, in
get_all_tweets(user)
File "twitterAPI.py", line 73, in get_all_tweets
new_tweets = api.user_timeline(screen_name = screen_name,count=200)
File "//anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 239, in _call
return method.execute()
File "//anaconda/lib/python2.7/site-packages/tweepy/binder.py", line 223, in execute
raise TweepError(error_msg, resp)
tweepy.error.TweepError: Not authorized.

suraj-deshmukh commented Oct 28, 2015

Thanks for the code @yanofsky.
I have modified your code: I am using pandas to store the downloaded tweets into a csv, along with some extra information. I also have another script that uses the csv created by your code to download the latest tweets of a user's timeline.
Here is my github link:
https://github.com/suraj-deshmukh/get_tweets

I am also working on cassandra-python integration, to download all tweets into a cassandra database instead of a csv file.

dowlingmi01 commented Dec 4, 2015

Thanks for this @yanofsky - it's awesome code. I'm trying to rework it so I can drop the data into a MySQL table. I'm running into some issues and wondering if you can take a look at the snippet of my code to see if I'm doing anything obvious? Much appreciated.

def get_all_tweets(screen_name):
#Twitter only allows access to a users most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=1)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
            print "getting tweets before %s" % (oldest)

            #all subsequent requests use the max_id param to prevent duplicates
            new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

            #save most recent tweets
            alltweets.extend(new_tweets)

            #update the id of the oldest tweet less one
            oldest = alltweets[-1].id - 1

            print "...%s tweets downloaded so far" % (len(alltweets))

    return alltweets

def store_tweets(alltweets)

MySQL initialization

connection = (host = "",
    user="",
    passwd="",
    db="")
cursor = connection.cursor()

for tweet in alltweets:
    cursor.execute("INSERT INTO twittest (venue, tweet_id, text time, retweet, liked) VALUES 
    (user['screen_name'],tweet['id_str'], tweet['created_at'], tweet['text'], tweet['retweet_count'], tweet['favorite_count'])")    
cursor.close()
connection.commit()

if name == 'main':
#pass in the username of the account you want to download
get_all_tweets("KidsAquarium")

halolimat commented

Thanks!

phanilav commented Dec 23, 2015

Hi,

Is there a way to download public tweets by keyword search older than a week or month? I am able to download public tweets of the current week only and not beyond that. Any suggestions appreciated. Thanks

alce370 commented Feb 10, 2016

This one still works with Python 3.5. I just added the () to each print call, plus the change in the comment of Sourabh87 (commented on 18 Sep 2015), and it works fine.

SudhaSompura commented Mar 3, 2016

Is there no way we can crawl ALL the tweets of a particular user, and not just the 3200 most recent ones?

alexgiarolo commented Apr 20, 2016

Any news about whether we can download all tweets?

DavidNester commented May 9, 2016

I am somewhat inexperienced with this. Do the 4 lines with the API credentials need to be filled in? If so, where do we get the credentials?

UPDATE:
I figured out my first issue but then ran into this issue when running it:

tweepy.error.TweepError: [{u'message': u'Bad Authentication data.', u'code': 215}]

I only changed the username that was being looked up from the original code

jdchatham commented May 10, 2016

Am also having the same problem as @DavidNester. Any updates?

UPDATE:
This actually worked for me/showed me how to get the credentials if you're still looking @DavidNester
http://www.getlaura.com/how-to-download-tweets-from-the-twitter-api/

DavidNester commented May 11, 2016

I have the credentials now. I tried running with the other script and I still got the same error.

analyticascent commented May 16, 2016

Just an FYI for people trying to use this in Sublime (if you happen to be using Anaconda on a Windows machine): you need to run python -m pip install tweepy while in the proper directory that Sublime expects it to be installed in; pip install tweepy alone may not work. Some people who run the code and think they installed tweepy may get an error saying otherwise.

This truly is a glorious script, yanofsky! I plan on playing around with it for the next few days for a stylometry project, and thanks to you, getting the raw data desired is no longer an issue!

gerardtoko commented Jun 15, 2016

Another trick: a recursive function

#!/usr/bin/env python
# encoding: utf-8

import tweepy #https://github.com/tweepy/tweepy
from time import sleep

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

def get_all_tweets(screen_name, alltweets=[], max_id=0):
    #Twitter only allows access to a user's most recent 3240 tweets with this method
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #make initial request for most recent tweets (200 is the maximum allowed count)
    if max_id == 0:  #== rather than 'is': identity comparison on ints is unreliable
        new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    else:
        #subsequent requests page backwards with max_id
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=max_id)

    if len(new_tweets) > 0:
        #save most recent tweets
        alltweets.extend(new_tweets)
        #be gentle with the rate limit
        sleep(2)
        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        return get_all_tweets(screen_name=screen_name, alltweets=alltweets, max_id=oldest)

    #final tweets
    return alltweets

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("J_tsar", [], 0)

gowrik73 commented Jul 5, 2016

Is it possible to extract protected tweets?

willaccc commented Jul 10, 2016

Hi, I just tried the code and it is working well. p.s. I am using Python 3.5.

I am thinking of getting multiple users' tweets at the same time; in other words, having multiple usernames at one time. Any thoughts on how I should set that up?

Thanks.

jabbalaci commented Jul 16, 2016

Thanks. It returns the shortened URLs. If you need the original URLs, you can use this snippet (Python 3):

from urllib.request import urlopen

def redirect(url):
    page = urlopen(url)
    return page.geturl()

In Python 2: from urllib import urlopen.

kkhanal18 commented Jul 16, 2016

I'm getting this error when I enter someone's username. It works fine for others, but not this username.

tweepy.error.TweepError: [{'code': 34, 'message': 'Sorry, that page does not exist.'}]

Hunterjet commented Aug 9, 2016

For some reason, calling user_timeline directly is much less efficient than doing it with a cursor, like so:

                cursor = Cursor(self.client.user_timeline,
                                id=user_id,
                                count=200,
                                max_id=max_id).pages(MAX_TIMELINE_PAGES)
                for page in cursor:
                    logging.info('Obtained ' + str(i) + ' tweet pages for user ' + str(user_id) + '.')
                    i += 1
                    for tweet in page:
                        if not hasattr(tweet, 'retweeted_status'):
                            tweets.append(tweet)
                    max_id = page[-1].id - 1

I've seen speed gains of over 150 seconds for users with more posted tweets than the maximum retrievable. The error handling is a bit trickier but doable thanks to the max ID parameter (just stick the stuff I posted into a try/except, put that into a while (1), and the cursor will refresh with each error). Try it out!

BTW, MAX_TIMELINE_PAGES theoretically goes up to 16, but I've seen it go to 17.
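
A self-contained sketch of that retry pattern (my reading of the suggestion, not the commenter's exact code; MAX_TIMELINE_PAGES and fetch_timeline are hypothetical names, and the loop retries forever like the suggested while (1)):

import logging
import tweepy

MAX_TIMELINE_PAGES = 16  #16 pages of 200 covers the ~3200-tweet ceiling

def fetch_timeline(api, user_id):
    tweets, max_id = [], None
    while True:
        try:
            kwargs = {'id': user_id, 'count': 200}
            if max_id is not None:
                kwargs['max_id'] = max_id
            for page in tweepy.Cursor(api.user_timeline, **kwargs).pages(MAX_TIMELINE_PAGES):
                #keep original statuses only, as in the comment above
                tweets.extend(t for t in page if not hasattr(t, 'retweeted_status'))
                max_id = page[-1].id - 1
            return tweets
        except tweepy.TweepError as err:
            #restart the cursor just below the last tweet already collected
            logging.warning("retrying below id %s: %s", max_id, err)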

santoshbs commented Aug 11, 2016

I am getting a syntax error at: print "getting tweets before %s" % (oldest)
Not sure what is wrong. Request your help.

Greenstan commented Aug 20, 2016

What if I wanted to save it into a database? Would I then need to extract the data from the csv?

adixxov commented Aug 22, 2016

Thank you for this code. It worked as expected to pull a given user's tweets.

However, I have a side problem with retrieving the tweets after saving them to a json file. I saved the list of "alltweets" in a json file using the following. Note that without "repr", I wasn't able to dump the alltweets list into the json file.

with open('file.json', 'a') as f: json.dump(repr(alltweets), f)

Attached is a sample json file containing the dump. Now, I need to access the text in each tweet, but I'm not sure how to deal with "Status".

I tried to iterate over the lines in the file, but the file is being seen as a single line.

with open(fname, 'r') as f: for line in f: tweet = json.loads(line)

I also tried to iterate over statuses after reading the json file as a string, but iteration rather takes place on the individual characters in the json file.

with open(fname, 'r') as f: x = f.read() for status in x: code

Appreciate any help...
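
The root issue is that repr() produces a Python literal, not JSON, so json.loads can't read it back. A sketch of the usual workaround (assuming tweepy Status objects, which expose the raw API payload as the ._json attribute):

import json

#write one JSON document per line ("JSON Lines") using the raw API payload
with open('file.json', 'w') as f:
    for tweet in alltweets:
        f.write(json.dumps(tweet._json) + '\n')

#read each line back into a dict and pull out the text
with open('file.json', 'r') as f:
    texts = [json.loads(line)['text'] for line in f]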

Troasi commented Aug 22, 2016

I get an error in Python 3.x as the buffer does not support string. Help me to encode it.

dev-luis commented Sep 10, 2016

@santoshbs That's because the script was written for an older version of Python. The new syntax is: print(your statements).

For the people that have problems running this script, I posted an alternate way to download the tweets using the new syntax on my website: http://luis-programming.com/blog/download_tweets/

I also added an example of how to analyze tweets that are not written using "Latin characters." If you're interested, you can also download the script on my website: http://luis-programming.com/blog/kanji_prj_twitter/

owlcatz commented Sep 24, 2016

I read all the comments, but have not tried it yet... So... Assuming I had a user (not me or anyone I know personally) that has roughly 15.5k tweets, is there any way to get just the FIRST few thousand and not the last? Thanks! 👍

cbaysan commented Sep 24, 2016

Has anyone figured out how to grab the "retweeted_status.text" when a status is a retweet? It seems that one needs to specify: "api.user_timeline(screen_name = screen_name,count=200,include_rts=True)"
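
A small sketch of one way to read it (hedged: based on the REST payload, where a retweet carries a nested retweeted_status holding the original status):

#prefer the original tweet's text when a status is a retweet
def full_text(tweet):
    if hasattr(tweet, 'retweeted_status'):
        return "RT @%s: %s" % (tweet.retweeted_status.user.screen_name,
                               tweet.retweeted_status.text)
    return tweet.text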

dhaikney commented Nov 2, 2016

@yanofsky Found this very useful, thank you!

abhijith0505 commented Nov 5, 2016

I found this article which says that a request rate of more than 2.5 times the access-token rate can be achieved. I haven't personally tested this.
Hope it is found useful.

http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./

ShupingZhang commented Nov 6, 2016

I run the code but it only downloaded 6 tweets (sometimes 206) instead of 3240. Does anyone know the reason? Thanks a lot!

get_all_tweets("City of Toronto")
getting tweets before 616320501871452159
...6 tweets downloaded so far

I'm using Python 2.7.12 Shell.

def get_all_tweets(screen_name):
    alltweets = []
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)
    alltweets.extend(new_tweets)
    oldest = alltweets[-1].id - 1
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)
        new_tweets = api.user_timeline(screen_namem = screen_name,count=200,max_id=oldest)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print "...%s tweets downloaded so far" % (len(alltweets))
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)
    pass

brianhalperin commented Dec 6, 2016

I'm trying to run the code, however I keep getting the following error:

Traceback (most recent call last):
File "/Users/Brian/Desktop/get_tweets.py", line 60, in
get_all_tweets("realDonaldTrump")
File "/Users/Brian/Desktop/get_tweets.py", line 52, in get_all_tweets
writer.writerow(["id","created_at","text"])
TypeError: a bytes-like object is required, not 'str'

Anyone know what this could be?

starkindustries commented Dec 27, 2016

@brianhalperin

I received the same error. Try changing line 53.

Change line 53 from this:
with open('%s_tweets.csv' % screen_name, 'wb') as f:

to this:
with open('%s_tweets.csv' % screen_name, 'w') as f:

Pretty much just drop the 'b'. Let me know if it works for you.

fatenaught commented Jan 11, 2017

Thank you for posting this! May I ask how you found out that each "tweet" has information like "id_str", "location", etc.? I used dir() to look at it, but "location" is not included, so I was a bit confused.

iLenTheme commented Jan 22, 2017

Hello, I get this error, what can it be? http://i.imgur.com/lDRA7uX.png

Siddhant08 commented Jan 25, 2017

@yanofsky The code runs without errors but I can't seem to find the csv file. Where is it created?

AadityaJ commented Jan 26, 2017

+1 Thanks for this script

crishernandezmaps commented Feb 7, 2017

Thanks for sharing!

srijan-mishra commented Feb 17, 2017

Can we download more than 3240 tweets?

Deepak- commented Feb 27, 2017

Thanks for the script! I do wish there was a way to circumvent the 3240 limit.

buddiex commented Feb 27, 2017

@deepak same here... having that issue now ... trying to collect historical tweets for a data warehouse project.

adam-fg commented Feb 28, 2017

Hi everyone,

I'm after some help - I'm trying to complete some research on Twitter data for my MSc, and this might work, but I have no idea how to use Python.

Would anybody be willing to run this code for me for 3 companies and, if this works for hashtags, 3 more hashtags?

Fingers crossed!
Adam

davidneevel commented Mar 6, 2017

Thanks! I was just looking to grab a single tweet from one user's timeline and this was the best example of how to do that.

xdslx commented Mar 29, 2017

Is there a way to grab tweets in other languages, which use different language codes? This code only gets proper tweets in English. In short, how do I change the lang code?

Faizah36 commented Apr 15, 2017

How do I collect tweets in Roman Urdu? Using Python as well as Java I am able to get standard English tweets, but I want to collect Roman Urdu tweets for sentiment analysis. Please, anyone?

lnvrl commented Apr 15, 2017

I am having the same problem as @iLenTheme: line 36 says syntax error: invalid syntax

print "getting tweets before %s" % (oldest)

varpurantala commented Apr 17, 2017

Hello, can anyone help me out with getting tweets for multiple users? I tried forming a list of users and passing it in at the end like this: for item in list: get_all_tweets("list").

hasan-msh commented Apr 18, 2017

The tweets I need to download are in a non-English language; when I open the output file it shows funny stuff!!
Any clues?

Thanks

carmonantonio commented Apr 23, 2017

@lnvrl are you using Python 3.x? There is a chance that this could be the issue. The syntax for print changed with 3.x; now if you want to print something you have to call a function:
print("getting tweets before %s" % (oldest))

shivkumarkondi commented May 23, 2017

It's giving the most recent 3200 tweets. So what is the way to get tweets older than that? Please post or let me know at my email: kumarkondi@gmail.com

jonhilgart22 commented Jun 5, 2017

Great code!

I edited it for Python 3.x. Also, I removed the URLs and the RTs from the user.

import re
import csv
import tweepy

def get_all_tweets(screen_name):
    """Download the last 3240 tweets from a user. Do text processing to remove URLs and the retweets from a user.
    Adapted from https://gist.github.com/yanofsky/5436496"""
    #Twitter only allows access to a user's most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    #(credentials is assumed to be a dict loaded elsewhere, e.g. from a config file)
    auth = tweepy.OAuthHandler(credentials['twitter']['consumer_key'], credentials['twitter']['consumer_secret'])
    auth.set_access_token(credentials['twitter']['token'], credentials['twitter']['token_secret'])
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print("getting tweets before %s" % (oldest))

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print("...%s tweets downloaded so far" % (len(alltweets)))

    cleaned_text = [re.sub(r'http[s]?:\/\/.*[\W]*', '', i.text, flags=re.MULTILINE) for i in alltweets] # remove urls
    cleaned_text = [re.sub(r'@[\w]*', '', i, flags=re.MULTILINE) for i in cleaned_text] # remove the @twitter mentions
    cleaned_text = [re.sub(r'RT.*', '', i, flags=re.MULTILINE) for i in cleaned_text] # delete the retweets

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, cleaned_text[idx].encode("utf-8")] for idx, tweet in enumerate(alltweets)]

    #write the csv
    with open('../data/raw/svb_founders/%s_tweets.csv' % screen_name, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)

@arturoaviles


arturoaviles Jun 14, 2017

If I run this 16 times in less than 15 minutes, will the API stop answering? Thanks


@rs2283


rs2283 Jun 23, 2017

I need to extract tweets from Twitter for a specific hashtag for the last ten years. Can anyone help me with code in R for this?


@santiag080


santiag080 Jun 28, 2017

I work with a similar script: with the code I use, I can input the username and download the timeline directly without having to edit the code itself, but the output format is unreadable. So, is there any way of turning this code into a macro? Like putting a bunch of users in an Excel table and downloading every timeline?


@santiag080


santiag080 Jun 28, 2017

Oh! This is the code I used before, but it doesn't work :/ As I said, the output format is unreadable... any ideas?

import sys
import csv
import json
from datetime import datetime, date, timedelta
import time
import os
import twitter
import smtplib
import collections
from random import shuffle
from urllib2 import URLError
import signal
import atexit
import logging
import re
import argparse
import StringIO, traceback, string
import codecs, cStringIO
from ConfigParser import SafeConfigParser
#these helpers were missing from the original paste; they ship with the
#`twitter` package imported above
from twitter.oauth import read_token_file, write_token_file
from twitter.oauth_dance import oauth_dance
#needed by row_formatter.format_date, also missing from the original paste
from dateutil import tz

def t():
    configParser = SafeConfigParser()
    configFilePath = 'C:\\config.txt'
    configParser.read(configFilePath)
    with codecs.open(configFilePath, 'r', encoding='utf-8') as f:
        configParser.readfp(f)

    CONSUMER_KEY = configParser.get('file', 'CONSUMER_KEY')
    CONSUMER_SECRET = configParser.get('file', 'CONSUMER_SECRET')
    APP_NAME = configParser.get('file', 'APP_NAME')

    TOKEN_FILE = 'out/twitter.oauth'
    try:
        (oauth_token, oauth_token_secret) = read_token_file(TOKEN_FILE)
    except IOError, e:
        (oauth_token, oauth_token_secret) = oauth_dance(APP_NAME, CONSUMER_KEY,
                CONSUMER_SECRET)
        if not os.path.isdir('out'):
            os.mkdir('out')
        write_token_file(TOKEN_FILE, oauth_token, oauth_token_secret)
    return twitter.Twitter(domain='api.twitter.com', api_version='1.1',
                           auth=twitter.oauth.OAuth(oauth_token, oauth_token_secret,
                                                    CONSUMER_KEY, CONSUMER_SECRET))

def makeTwitterRequest(t, twitterFunction, max_errors=3, *args, **kwArgs):
    wait_period = 2
    error_count = 0
    while True:
        try:
            return twitterFunction(*args, **kwArgs)
        except twitter.api.TwitterHTTPError, e:
            error_count = 0
            wait_period = handleTwitterHTTPError(e, t, wait_period)
            if wait_period is None:
                return
        except URLError, e:
            error_count += 1
            print >> sys.stderr, "URLError encountered. Continuing."
            if error_count > max_errors:
                print >> sys.stderr, "Too many consecutive errors...bailing out."
                errorEmail()  #defined elsewhere in the original script
                raise

def _getRemainingHits(t, resource_family):
    remaining_hits = t.application.rate_limit_status()[u'resources'][u'search'][resource_family]
    return remaining_hits

def handleTwitterHTTPError(e, t, wait_period=2):
    if wait_period > 3600:  # Seconds
        print >> sys.stderr, 'Too many retries. Quitting.'
        return None
    wait_variable = int(datetime.now().strftime("%Y")[:2])
    if e.e.code == 401:
        print >> sys.stderr, 'Encountered 401 Error (Not Authorized)'
        return None
    elif e.e.code in (404, 34):
        print >> sys.stderr, 'Encountered 404 Error (page not found)'
        return None
    elif e.e.code in (502, 503):
        print >> sys.stderr, 'Encountered %i Error. Will retry in %i seconds' % (e.e.code,
                wait_period)
        time.sleep(wait_period)
        wait_period *= 1.5
        return wait_period
    elif _getRemainingHits(t, u'/search/tweets')['remaining'] == 0:
        status = _getRemainingHits(t, u'/search/tweets')['reset']
        now = time.time()
        rate_limit = status + wait_variable - now
        sleep_time = max(900, rate_limit, 5)  # Prevent negative numbers
        print >> sys.stderr, 'Rate limit reached: sleeping for %i secs' % (rate_limit,)
        time.sleep(sleep_time)
        return 2
    else:
        raise e

def makeTwitterSearch(t, sts, salida, maximo):
    cant_total = 0
    response = makeTwitterRequest(t, t.statuses.user_timeline, screen_name=sts, count=200)
    if response is not None and len(response) > 0:
        #temporary list to store the ids from the response
        temp_id_list = []
        rta = response
        for tweet in rta:
            salida.write(str(tweet).replace('\r\n', '').replace('\n', '').replace('\r', '') + '\n')
            temp_id_list.append(tweet['id'])
        max_id = min(temp_id_list)
        cantidad = len(response)
        cant_total += cantidad
        cont = 1
        while cantidad:
            temp_id_list = []
            print "Call %s " % (cont)
            response = makeTwitterRequest(t, t.statuses.user_timeline, screen_name=sts, max_id=max_id, count=200)
            rta = response
            for tweet in rta:
                salida.write(str(tweet) + '\n')
                temp_id_list.append(tweet['id'])
            if max_id == min(temp_id_list):
                print "Finished! Thanks for searching with us today!"
                break
            max_id = min(temp_id_list)
            cantidad = len(response)
            cant_total += cantidad
            print cantidad * cont
            if maximo != '':
                if int(cantidad * cont) >= int(maximo):
                    break
            print "amount found = %s" % cantidad
            cont += 1
    print "Finally returning %s tweets" % cant_total
    return None

def normalize(archivo):
    normalizations = {
        'norm_search': collections.OrderedDict([
            ('Tweet ID', ('xpath_get', 'id')),
            ('Tipo', ('get_tweet_type',)),
            ('Retweet ID', ('xpath_get', 'retweeted_status/id')),
            ('Retweet username', ('xpath_get', 'retweeted_status/user/screen_name')),
            ('Retweet Count', ('get_count', 'rts')),  #taken from the RT if the type is RT
            ('Favorite Count', ('get_count', 'favs')),  #taken from the RT if the type is RT
            ('Text', ('xpath_get', 'text')),
            ('Tweet_Lang', ('xpath_get', 'lang')),
            ('Fecha', ('format_date', 'created_at')),
            ('Source', ('xpath_get', 'source')),
            ('User_username', ('xpath_get', 'user/screen_name')),
            ('User_ID', ('xpath_get', 'user/id')),
            ('User_tweet count', ('xpath_get', 'user/statuses_count')),
            ('User_followers', ('xpath_get', 'user/followers_count')),
            ('User_followings', ('xpath_get', 'user/friends_count')),
            ('User_time zone', ('xpath_get', 'user/time_zone')),
            ('User_language', ('xpath_get', 'user/lang')),
            ('Location', ('xpath_get', 'user/location')),
            ('User_create date', ('format_date', 'user/created_at')),
            ('Mention1', ('get_entities', 'mention', 1)),
            ('Mention2', ('get_entities', 'mention', 2)),
            ('Mention3', ('get_entities', 'mention', 3)),
            ('Link1', ('get_entities', 'link', 1)),
            ('Link2', ('get_entities', 'link', 2)),
            ('Hashtag1', ('get_entities', 'hashtag', 1)),
            ('Hashtag2', ('get_entities', 'hashtag', 2)),
            ('Hashtag3', ('get_entities', 'hashtag', 3)),
            ('Fecha Timezone', ('format_date', 'created_at', "%Y-%m-%d")),
            ('Dia Timezone', ('format_date', 'created_at', "%a")),
            ('Hora Timezone', ('format_date', 'created_at', "%H:00")),
            ('Corte Hora', ('format_date', 'created_at', "%Y-%m-%d %H")),
            ('place_country', ('xpath_get', 'place/country')),
            ('user_favourites_count', ('xpath_get', 'user/favourites_count')),
            ('user_description', ('xpath_get', 'user/description')),
            ('retweeted_status_user_favourites_count', ('xpath_get', 'retweeted_status/user/favourites_count')),
            ('retweeted_status_user_listed_count', ('xpath_get', 'retweeted_status/user/listed_count')),
            ('retweeted_status_user_profile_image_url', ('xpath_get', 'retweeted_status/user/profile_image_url')),
            ('retweeted_status_created_at', ('format_date', 'retweeted_status/created_at', "%Y-%m-%d %H")),
        ])
    }
    file = open(archivo, 'r')
    with open("/tmp/%s" % archivo + "_normalizado", 'wb') as f_csv:
        #write data
        for row in file:
            print row
            row_2 = normalize_row(row, normalizations['norm_search'], None)
            for e in row_2.iteritems():
                print e

def normalize_row(row, format, timezone):
    f = row_formatter(row, timezone)
    f_rows = []
    for (name, action) in format.iteritems():
        #call the appropriate method of row_formatter
        value = getattr(f, action[0])(*action[1:])
        if (not value): value = ""
        if (type(value) != str and type(value) != unicode):
            value = str(value)
        f_rows.append((name, value))
    return collections.OrderedDict(f_rows)

class row_formatter:
    def __init__(self, row, timezone):
        self.row = row
        self.timezone = timezone

    def xpath_get(self, path):
        elem = self.row
        try:
            for x in path.strip("/").split("/"):
                elem = elem.get(x)
        except:
            pass

        return elem

    def get_tweet_type(self):
        if 'retweeted_status' in self.row and self.row['retweeted_status']:
            return "RT"
        #elif 'in_reply_to_user_id' in self.row and self.row['in_reply_to_user_id']:
        #    return "REPLY"
        else:
            return "TWEET"

    def get_count(self, count_type):
        query = ''
        if self.get_tweet_type() == 'RT':
            query += 'retweeted_status/'
        if (count_type == 'favs'):
            query += 'favorite_count'
        elif (count_type == 'rts'):
            query += 'retweet_count'
        else:
            return None
        return self.xpath_get(query)

    def get_text(self):
        #incomplete in the original paste
        if self.get_tweet_type() == 'RT':
            pass

    def format_date(self, query, output_format="%Y-%m-%d %H:%M", timezone=None):
        if (not timezone): timezone = self.timezone
        date = self.xpath_get(query)
        if (not date): return None
        utc = datetime.strptime(date, '%a %b %d %H:%M:%S +0000 %Y').replace(tzinfo=tz.gettz('UTC'))
        local = utc.astimezone(tz.gettz(timezone))
        return local.strftime(output_format)

    def get_entities(self, e_type, index):
        matches = []
        if (e_type == 'link'):
            tmp = self.xpath_get('/entities/urls')
            if (tmp):
                matches = [e['expanded_url'] for e in tmp]
        if (e_type == 'mention'):
            tmp = self.xpath_get('/entities/user_mentions')
            if (tmp):
                matches = [e['screen_name'] for e in tmp]
        if (e_type == 'hashtag'):
            tmp = self.xpath_get('/entities/hashtags')
            if (tmp):
                matches = [e['text'] for e in tmp]

        if (len(matches) >= index):
            return matches[index - 1]

        return None

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8").replace("\n", " ").replace("\r", " ").replace("\t", '') for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

if __name__ == '__main__':
    t = t()
    sts = raw_input("Enter a username: ")
    maximo = raw_input("Enter the maximum number of records: ")
    ht = raw_input("Output filename?: ")
    f = open(ht, 'w')
    #sts = "from:%s OR @%s" % (sts,sts)
    print "Searching %s into %s." % (sts, ht)
    makeTwitterSearch(t, sts, f, maximo)
    f.close()
    #normalize(ht)


@santiag080


@jdkram @jdkram!!!! HOW??

@colbybair


colbybair Jun 29, 2017

I'm sure I'm doing something obviously wrong, but I'm getting this error when I try to run the code:

Traceback (most recent call last):
File "tweet_dumper.py", line 64, in
get_all_tweets("J_tsar")
File "tweet_dumper.py", line 18, in get_all_tweets
from tweepy.auth import OAuthHandler
ImportError: No module named auth

Any thoughts on this?


@santiag080


santiag080 Jun 29, 2017

@colbybair did you put in your Twitter API keys?


@jasserkh


jasserkh Jul 4, 2017

I want to extract tweets for a specific period of time. Does anyone have an idea? Thanks.


@dev-luis


dev-luis Aug 3, 2017

@jasserkh You can do it like this:

import time
import tweepy
from datetime import datetime, date

#get current date
currentDate = time.strftime("%x")

year = currentDate[6:8]
month = currentDate[0:2]
day = currentDate[3:5]

#reformat the date values
current_dateStr = "20" + year + "-" + month + "-" + day

#convert string to date
currentDate = datetime.strptime(current_dateStr, "%Y-%m-%d").date()
...
...

for tweet in allTweetsList:
    try:
        #make sure the tweet is recent
        createdAt_str = str(tweet.created_at)
        ind = createdAt_str.find(" ")
        new_createdAt = createdAt_str[:ind]

        #convert string to date
        createdAt = datetime.strptime(new_createdAt, "%Y-%m-%d").date()

        #compare the dates
        if createdAt == currentDate:
            pass  #do something

    except tweepy.TweepError as e:
        print(e.response)

If you have questions, please reply to me: http://luis-programming.com/blog/download_tweets/
It's hard to track the replies here.


@tianke0711


tianke0711 Sep 20, 2017

Hi, thanks for your code. When I used your code to collect data with Python 3, why does the tweet text include characters like "b" and \xe2\x80\x99?

"b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s ROH Broadcast https://t.co/uIV7TKHs9K'"

The original tweet (https://twitter.com/sheezy0) is actually: Adam Cole Praises Kevin Owens + A Preview For Next Week’s ROH Broadcast

\xe2\x80\x99 represents the ’ character. I don't know how to solve this issue; I want to get the ’s in the text. Thanks!
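The b'...' prefix is what the csv module prints when it is handed bytes: tweet.text.encode("utf-8") returns a bytes object, and Python 3's csv writer stringifies it as b'...'. A minimal sketch of one fix, assuming the same alltweets list as the original script: skip the encode and open the file in text mode with an explicit encoding.

#Python 3 sketch: write str values and let the file handle the encoding,
#so the csv module never sees bytes
import csv

outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]

with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    writer.writerows(outtweets)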


@states-of-fragility


states-of-fragility Sep 29, 2017

Hi! The code works just fine, thanks for sharing.
Yet, I would like to extend the code to retrieve non-English tweets, as with this method the Arabic letters are translated into funny combinations of Roman letters and numbers. I have seen other people asking the same question, but so far no answer. Maybe this time it attracts more attention.
Has someone found a solution? I'm a bit desperate.
Merci bien!

Edit: I posted the answer on Stack Overflow and was able to overcome this issue. In case someone else gets stuck with this: https://stackoverflow.com/questions/46510879/saving-arabic-tweets-from-tweepy-in-cvs/46523781?noredirect=1#comment80010395_46523781


@hub2git


hub2git Oct 18, 2017

Hi all. Is there a similar script for downloading all of a CuriousCat.me user's Q&As? For example, https://curiouscat.me/curiouscat.


@pavankthatha


pavankthatha Oct 26, 2017

The posted code works for a given handle. I'm trying to introduce filters for the tweets; any help would be appreciated.


@sanju9522


sanju9522 Nov 29, 2017

Hi,
I am new to Python. Please bear with me if my question is very basic.
Is there a way to run this code for multiple usernames and generate a csv file for each username, like macros in Excel?

if __name__ == '__main__':
    #pass in the usernames of the accounts you want to download
    get_all_tweets("username1" "username2" "username3" "username4")

Please suggest something, anyone.
Thanks in advance


@nonamethanks


nonamethanks Dec 8, 2017

@sanju9522:

if __name__ == '__main__':
    usernames = ["yourname1", "yourname2"]
    for x in usernames:
        get_all_tweets(x)

You can even use something like usernames.append() combined with raw_input to add usernames at will on input when you launch the script via terminal.
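A sketch of that idea, assuming the same get_all_tweets function as above (raw_input is Python 2; use input() on Python 3):

#collect usernames interactively before downloading each timeline
usernames = []
name = raw_input("Add a username (blank to stop): ")
while name:
    usernames.append(name)
    name = raw_input("Add a username (blank to stop): ")
for x in usernames:
    get_all_tweets(x)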


@bsteen


bsteen Dec 13, 2017

Thanks for this. I used the basic framework of your code in my project: https://github.com/bsteen/markov_tweet_generator
I cited you as a source in the "Resources Used" in my README.


@csik


csik Dec 20, 2017

For those looking to download more than just the last 3k-some tweets I found this useful:
https://github.com/bpb27/twitter_scraping

It uses two steps: first Selenium, essentially taking over a browser to get as many tweet IDs as possible by going to each page day by day. I believe this should be possible as well with the API approach above. The second step uses Tweepy to request the metadata for those IDs.


@pjrudloff


pjrudloff Dec 22, 2017

@yanofsky Thank you for your work. What license applies to the code?


@hammadawan50


hammadawan50 Dec 30, 2017

I am getting the following error:
TweepError: Failed to parse JSON payload: Unterminated string starting at: line 1 column 507204 (char 507203)


@4emkay


4emkay Jan 8, 2018

Thank you...Worked Great


@atoponce


atoponce Jan 21, 2018

To support the full text of 280 characters, apply the following patch:

--- /tmp/tweet_dumper.py	2018-01-21 06:07:26.646774539 -0700
+++ tweet_dumper.py	2018-01-21 06:07:20.454724904 -0700
 def get_all_tweets(screen_name):
@@ -23,7 +23,7 @@
 	alltweets = []	
 	
 	#make initial request for most recent tweets (200 is the maximum allowed count)
-	new_tweets = api.user_timeline(screen_name = screen_name,count=200)
+	new_tweets = api.user_timeline(screen_name = screen_name, count=200, tweet_mode='extended')
 	
 	#save most recent tweets
 	alltweets.extend(new_tweets)
@@ -36,7 +36,7 @@
 		print "getting tweets before %s" % (oldest)
 		
 		#all subsiquent requests use the max_id param to prevent duplicates
-		new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)
+		new_tweets = api.user_timeline(screen_name = screen_name,count=200, tweet_mode='extended', max_id=oldest)
 		
 		#save most recent tweets
 		alltweets.extend(new_tweets)
@@ -47,7 +47,7 @@
 		print "...%s tweets downloaded so far" % (len(alltweets))
 	
 	#transform the tweepy tweets into a 2D array that will populate the csv	
-	outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]
+	outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8").replace('\n', ' ').replace('\r', '')] for tweet in alltweets]
 	
 	#write the csv	
 	with open('%s_tweets.csv' % screen_name, 'wb') as f:
@@ -60,4 +60,4 @@
 
 if __name__ == '__main__':
 	#pass in the username of the account you want to download
-	get_all_tweets("J_tsar")
+	get_all_tweets("realDonaldTrump")

@AttributeErrorCat


AttributeErrorCat Feb 16, 2018

def get_all_tweets(screen_name):
    #Twitter only allows access to a users most recent 3240 tweets with this method

    import tweepy
    import csv

    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_token_secret = ''

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auth.secure = True
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (note: the API caps count at 200)
    new_tweets = api.user_timeline(screen_name=screen_name, count=340, include_rts=False)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=340, max_id=oldest, tweet_mode='extended')

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    #write the csv
    with open('%s_nyttweet2.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["tweetid","date","text"])
        writer.writerows(outtweets)

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("nytimes")

Hi guys - I'm new to Python and I'm trying to stop my tweets from being truncated. I added an extended tweet mode but I get this error:

"AttributeError: 'Status' object has no attribute 'text'"
I can't find where I should change text to full_text.

Also, does anyone know the code to remove the URL from the output too? My tweets look like this:

"The Trump administration released a list of 210 people who were identified because of their "closeness to the Russi… https://t.co/5NmKPtQNrO"
THANK YOU!!!


@kumargouravdas


kumargouravdas Feb 20, 2018

After successfully running the code, I face one problem: for long tweets I get only a portion of the tweet. For example, one instance of what I get:
"Glad to have joined the Bahubali Mahamasthakabhisheka Mahotsava at Shravanabelagola in Karnataka. Spoke about the r… https://t.co/qG85rbCgIh"
And what the tweet actually is:
Glad to have joined the Bahubali Mahamasthakabhisheka Mahotsava at Shravanabelagola in Karnataka. Spoke about the rich contribution of saints and seers to our society. Here is my speech. http://nm-4.com/tf25 .
That means this portion is missing from my output:
rich contribution of saints and seers to our society. Here is my speech. http://nm-4.com/tf25 .
Has anyone faced this problem? Please suggest a solution.


@yang-qian


yang-qian Feb 25, 2018

@kumargouravdas
I had the same problem. You can fix it by adding tweet_mode="extended" when calling the user_timeline function. Correspondingly, change tweet.text to tweet.full_text.
Reference: https://github.com/sferik/twitter/issues/880
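A minimal sketch of that change against the original script (assuming the same authenticated tweepy api object):

#request the untruncated text on every call...
new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                               max_id=oldest, tweet_mode="extended")

#...and read full_text instead of text when building the rows
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8")]
             for tweet in alltweets]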


@SadiaNaseemKhan


SadiaNaseemKhan Mar 13, 2018

I keep getting this error:
TweepError: [{'code': 215, 'message': 'Bad Authentication data.'}]
How can I solve this?


@nikitamaloo


nikitamaloo Apr 8, 2018

@atoponce

I am new to Python. I used your code to get full tweets of 280 characters, but it's showing this error:

Traceback (most recent call last):
File "C:\Users\Nikita\Desktop\LORETTO\Spring Project\tweet_dumper.py", line 61, in
get_all_tweets("realDonaldTrump")
File "C:\Users\Nikita\Desktop\LORETTO\Spring Project\tweet_dumper.py", line 48, in get_all_tweets
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8").replace('\n', ' ').replace('\r', '')] for tweet in alltweets]
File "C:\Users\Nikita\Desktop\LORETTO\Spring Project\tweet_dumper.py", line 48, in
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8").replace('\n', ' ').replace('\r', '')] for tweet in alltweets]
TypeError: a bytes-like object is required, not 'str'


@nikitamaloo


nikitamaloo Apr 8, 2018

@yanofsky
Thank you so much!! This code is really useful.
I am using Python for the first time. I need to extract tweets for my social media analytics project.

Do you have code to get more information about each tweet, like how many likes the tweet got and how many times it was retweeted?

It would be really helpful if I could get that extended code, so I can get to the next step in my project.


@kamalikap


kamalikap May 23, 2018

@nikitamaloo

Don't add the replace calls at the end of the line; instead change the last line to:
outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode("utf-8")] for tweet in alltweets]

This works for me.
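If you still want the newlines stripped, note that the TypeError comes from calling bytes.replace() with str arguments; a sketch of the other fix is to do the replaces while the text is still a str, then encode:

#strip newlines while the text is still a str, then encode once
outtweets = [[tweet.id_str, tweet.created_at,
              tweet.full_text.replace('\n', ' ').replace('\r', '').encode("utf-8")]
             for tweet in alltweets]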


@kamalikap


kamalikap May 28, 2018

@yanofsky Thanks for the code. Do you know how I can give the input as streaming data rather than a particular user, and let it track for a period of time?


@kamalikap


kamalikap May 28, 2018

Thanks to @yanofsky and @freimanas for helping with the code.

Here is my code, which is modified and contains:

  • full text
  • images
  • hashtags

I hope this can be of some help.


@ausok


ausok Jun 5, 2018

Hey, thanks a lot for the script. It works very well. Just one question: What would I have to do if I wanted to run the script, let's say daily, but only wanted to get the newest tweets I have not saved already? Thanks for any help.
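One approach that should work, as a sketch (not an answer from the thread): user_timeline accepts a since_id parameter, so you can persist the newest id from each run and pass it back the next time.

#fetch only tweets newer than the last saved one; assumes an authenticated
#tweepy `api` and a `last_seen_id` stored from a prior run
new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                               since_id=last_seen_id)
if new_tweets:
    last_seen_id = max(tweet.id for tweet in new_tweets)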


@sunalit


sunalit Jun 11, 2018

Thanks a lot for the script. It works very well. But how can I include other elements like 'sizeof', 'author', 'contributors', 'coordinates', 'entities', 'favorite', 'favorite_count', 'favorited', 'geo', 'retweet', 'retweet_count', 'retweeted', 'retweets', 'source'? I need this data as well for the analysis. Thanks for the help.
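Most of those are attributes on the tweepy Status object, so as a sketch (assuming the same alltweets list as the original script) they can simply be added as extra columns:

#extra per-tweet columns; attribute names shown are ones tweepy's Status
#object carries (favorite_count, retweet_count, source, coordinates, author)
outtweets = [[tweet.id_str,
              tweet.created_at,
              tweet.text.encode("utf-8"),
              tweet.favorite_count,       #likes
              tweet.retweet_count,        #retweets
              tweet.source,               #client used to post
              tweet.coordinates,          #geo info, often None
              tweet.author.screen_name]   #author's handle
             for tweet in alltweets]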


@m-ueberall


m-ueberall Jul 4, 2018

FYI: Using tweepy 3.6.0, I saw that retweets are not yet retrieved in complete form (i.e., up to 280 characters) even after applying the patch by @atoponce. Seems to be a library problem, though.


@coolmechel


coolmechel Jul 5, 2018

@jonhilgart22 Interesting code. I am new here and would love to explore. I am using a Jupyter notebook on my local machine.
I ran into the following error; I would very much appreciate a heads-up.

getting tweets before 276674512769150975
...98 tweets downloaded so far

NameError Traceback (most recent call last)
in ()
56 if __name__ == '__main__':
57 #pass in the username of the account you want to download
---> 58 get_all_tweets("coolmechel")

in get_all_tweets(screen_name)
39
40 print ("...%s tweets downloaded so far" % (len(alltweets)))
---> 41 cleaned_text = [re.sub(r'http[s]?:\/\/.*[\W]*', '', i.text, flags=re.MULTILINE) for i in alltweets] # remove urls
42 cleaned_text = [re.sub(r'@[\w]*', '', i, flags=re.MULTILINE) for i in cleaned_text] # remove the @twitter mentions
43 cleaned_text = [re.sub(r'RT.*','', i, flags=re.MULTILINE) for i in cleaned_text] # delete the retweets

in (.0)
39
40 print ("...%s tweets downloaded so far" % (len(alltweets)))
---> 41 cleaned_text = [re.sub(r'http[s]?:\/\/.*[\W]*', '', i.text, flags=re.MULTILINE) for i in alltweets] # remove urls
42 cleaned_text = [re.sub(r'@[\w]*', '', i, flags=re.MULTILINE) for i in cleaned_text] # remove the @twitter mentions
43 cleaned_text = [re.sub(r'RT.*','', i, flags=re.MULTILINE) for i in cleaned_text] # delete the retweets

NameError: name 're' is not defined


@m-ueberall


m-ueberall Jul 6, 2018

@coolmechel: You're using regular expression operations without having imported the required module (i.e., "import re").
Have a look at https://docs.python.org/2/library/re.html or https://docs.python.org/3/library/re.html


@nicknazari


nicknazari Jul 6, 2018

Hello, when I run the code I get a permissions error: PermissionError: [Errno 13] Permission denied: 'realDonaldTrump_tweets.csv'

I tried the many StackOverflow solutions to this issue and I am still unable to write to any files. Is anyone aware of a possible fix?


@coolmechel


coolmechel Jul 8, 2018

@m-ueberall
Thank you very much for responding to my questions; I totally forgot to import that module. However, I ran into another problem.

FileNotFoundError Traceback (most recent call last)
in ()
56 if __name__ == '__main__':
57 #pass in the username of the account you want to download
---> 58 get_all_tweets("BigDataGal")

in get_all_tweets(screen_name)
46
47 #write the csv
---> 48 with open('../data/raw/svb_founders/%s_tweets.csv' % screen_name, 'w') as f:
49 writer = csv.writer(f)
50 writer.writerow(["id","created_at","text"])

FileNotFoundError: [Errno 2] No such file or directory: '../data/raw/svb_founders/BigDataGal_tweets.csv'

I would be glad to resolve this. Thanks.
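A sketch of one fix (assuming the directory layout from the snippet above): open() won't create intermediate directories, so create them first, or just write to the current directory instead.

#create the output directory before opening the csv (Python 2 style, matching
#the script above; on Python 3 you could use os.makedirs(out_dir, exist_ok=True))
import os

out_dir = '../data/raw/svb_founders'
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)
#then open('%s/%s_tweets.csv' % (out_dir, screen_name), 'w') as before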


@Smoops


Smoops Jul 11, 2018

Is there a script to download tweets by searching on (short) sentences, for when there is no hashtag or keyword?

