A script to download all of a user's tweets into a csv
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <https://unlicense.org>
#!/usr/bin/env python
# encoding: utf-8
import tweepy #https://github.com/tweepy/tweepy
import csv

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""


def get_all_tweets(screen_name):
    #Twitter only allows access to roughly a user's 3200 most recent tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #keep grabbing tweets until there are no tweets left to grab
    #(the loop is skipped entirely when the first request returns nothing,
    #so alltweets[-1] is never evaluated on an empty list)
    while len(new_tweets) > 0:
        #save the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        print(f"getting tweets before {oldest}")

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)
        print(f"...{len(alltweets)} tweets downloaded so far")

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]

    #write the csv (text mode; newline='' lets the csv module control line endings)
    with open(f'new_{screen_name}_tweets.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)


if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("J_tsar")

@brenorb

brenorb commented Jun 6, 2019

@RobenV

RobenV commented Jul 23, 2019

I am unable to figure out how to specify the user whose tweets are required. For example, if I want to get all the tweets of President Trump, how should I program the script to do so?
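
In the script, the account is selected by the string passed to get_all_tweets in the last block (a screen name, without the leading @). A minimal sketch of taking that name from the command line instead of editing the file; parse_screen_name is a hypothetical helper introduced here, not part of the gist:

```python
def parse_screen_name(argv):
    """Return the screen name given as the first command-line argument.

    Hypothetical helper: strips whitespace and a leading "@" so that both
    "@J_tsar" and "J_tsar" are accepted.
    """
    if not argv:
        raise SystemExit("usage: tweet_dumper.py SCREEN_NAME")
    return argv[0].strip().lstrip("@")
```

With this in place, the last line of the script could become get_all_tweets(parse_screen_name(sys.argv[1:])), after adding import sys at the top.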

@asif3252

asif3252 commented Aug 27, 2019

How can I download Urdu news from Twitter?
Can anyone help?

@ogspeace

ogspeace commented Sep 2, 2019

hi, i experienced an IndexError when downloading from accounts with less than 200 tweets. here's a tweak i did:

def get_all_tweets(screen_name):
    # assumes `api` is the authorized tweepy.API object from the original script
    alltweets = []
    new_tweets = api.user_timeline(screen_name=screen_name,count=200)
    # loop only while the last request returned tweets, so alltweets[-1]
    # is never evaluated on an empty list (the cause of the IndexError)
    while len(new_tweets) > 0:
        alltweets.extend(new_tweets)
        print("...%s tweets downloaded so far" % (len(alltweets)))
        oldest = alltweets[-1].id - 1
        print("getting tweets before %s" % (oldest))
        new_tweets = api.user_timeline(screen_name=screen_name,count=200,max_id=oldest)
    # the same expression covers accounts with 0, 1, or many tweets
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]

best.

/ogs
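
The max_id paging used in the script can be exercised without touching the Twitter API. A minimal sketch, assuming a hypothetical fetch_page callable that stands in for api.user_timeline (collect_timeline, FakeTweet and make_fake_fetch are names invented for this illustration):

```python
from collections import namedtuple

FakeTweet = namedtuple("FakeTweet", "id")


def collect_timeline(fetch_page, page_size=200):
    """Collect every item reachable through max_id paging.

    fetch_page(count, max_id) is a hypothetical stand-in for
    api.user_timeline: it returns a list of objects with an integer .id,
    newest first, and an empty list when nothing older is left.
    """
    collected = []
    page = fetch_page(count=page_size, max_id=None)
    while page:
        collected.extend(page)
        # ask only for items strictly older than the oldest one seen so far
        oldest = collected[-1].id - 1
        page = fetch_page(count=page_size, max_id=oldest)
    return collected


def make_fake_fetch(ids):
    """Build an in-memory fetch_page over the given ids (for testing only)."""
    ordered = sorted(ids, reverse=True)

    def fetch_page(count, max_id=None):
        eligible = [i for i in ordered if max_id is None or i <= max_id]
        return [FakeTweet(i) for i in eligible[:count]]

    return fetch_page
```

Because the loop condition is the page itself, an account that returns nothing on the first request yields an empty list instead of raising IndexError.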

@asif3252

asif3252 commented Sep 2, 2019

@Hunkkhan

Hunkkhan commented Sep 3, 2019

i tried executing it using editrocket[http://editrocket.com/download_win.html]
got following error
File "tweet_dumper.py", line 35
print "getting tweets before %s" % (oldest)
^
SyntaxError: invalid syntax

You are running Python 3, where print is a function, so the Python 2 print statement is a syntax error.
Put parentheses around the print argument:

print ("getting tweets before %s" % (oldest))

@Hunkkhan

Hunkkhan commented Sep 3, 2019

I am using ubuntu and I get this error

getting tweets before 1071996440803663872
...3081 tweets downloaded so far
Traceback (most recent call last):
File "Fetching.py", line 70, in
get_all_tweets("NomKnots")
File "Fetching.py", line 61, in get_all_tweets
writer = csv.writer(f)
NameError: name 'csv' is not defined

@ogspeace

ogspeace commented Sep 3, 2019

I am using ubuntu and I get this error

getting tweets before 1071996440803663872
...3081 tweets downloaded so far
Traceback (most recent call last):
File "Fetching.py", line 70, in
get_all_tweets("NomKnots")
File "Fetching.py", line 61, in get_all_tweets
writer = csv.writer(f)
NameError: name 'csv' is not defined

try importing csv.

@asif3252

asif3252 commented Sep 3, 2019

@Hunkkhan

Hunkkhan commented Sep 6, 2019

How do I automate the same code? I want to fetch screen names from a csv without entering them manually. Any idea?
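
One possible approach is to read the names with the csv module and call get_all_tweets once per name. A minimal sketch; the helper name read_screen_names and the "screen_name" column header are assumptions about the input file, not something the gist defines:

```python
import csv


def read_screen_names(path, column="screen_name"):
    """Read screen names from a CSV file that has a header row.

    The column name is an assumption about the input file. Blank cells
    are skipped and a leading "@" is removed.
    """
    names = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = (row.get(column) or "").strip().lstrip("@")
            if name:
                names.append(name)
    return names
```

The last block of the script could then loop over the result, for example: for name in read_screen_names("accounts.csv"): get_all_tweets(name), where accounts.csv is a placeholder file name.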

@heenashree

heenashree commented Oct 23, 2019

@brianhalperin

I received the same error. Try changing line 53.

Change line 53 from this:
with open('%s_tweets.csv' % screen_name, 'wb') as f:

to this:
with open('%s_tweets.csv' % screen_name, 'w') as f:

Pretty much just drop the 'b'. Let me know if it works for you.

This worked...thank you so much

@heenashree

heenashree commented Oct 23, 2019

This code worked wonders :)

@prakashjha17

prakashjha17 commented Nov 10, 2019

@david Yanofsky
when i tried writing the code and i got the below error::
for the
LOC-->new_tweets = api.user_timeline(screen_name = screen_name,count=20)
error::
NameError Traceback (most recent call last)
in
----> 1 new_tweets = api.user_timeline(screen_name = screen_name,count=20)

NameError: name 'screen_name' is not defined

  1. LOC::alltweets.extend(new_tweets)
    error
    NameError Traceback (most recent call last)
    in
    ----> 1 alltweets.extend(new_tweets)

NameError: name 'new_tweets' is not defined
3.LOC->oldest = alltweets[-1].id - 1
ERROR::
IndexError Traceback (most recent call last)
in
----> 1 oldest = alltweets[-1].id - 1

IndexError: list index out of range
4.LOC->while len(new_tweets) > 0:
ERROR
File "", line 2

^

SyntaxError: unexpected EOF while parsing

  1. print "getting tweets before %s" % (oldest)
    ERROR
    File "", line 1
    print "getting tweets before %s" % (oldest)
    ^
    SyntaxError: invalid syntax

6.oldest = alltweets[-1].id - 1
ERROR
IndexError Traceback (most recent call last)
in
----> 1 oldest = alltweets[-1].id - 1

IndexError: list index out of range

I am new to Python.
I have written exactly the same as you had mentioned.

Could you please help me solve the issue.

Thanks in advance.

Thanks,
Prakash Jha

@sonamgupta1105

sonamgupta1105 commented Dec 19, 2019

@yanofsky Thanks for writing this code. Helped me to start learning API and building dataset with it. Do you know any way I can filter the tweets by particular hashtags ?
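
One way to do this after downloading is to filter the rows on their text before writing the csv. A minimal sketch operating on the [id, created_at, text] rows the script builds; has_hashtag and filter_rows_by_hashtag are hypothetical helpers, and matching on the text alone may miss hashtags in tweets whose text was truncated:

```python
import re


def has_hashtag(text, hashtags):
    """Return True if text contains any of the hashtags (case-insensitive)."""
    wanted = {tag.lstrip("#").lower() for tag in hashtags}
    found = {tag.lower() for tag in re.findall(r"#(\w+)", text)}
    return bool(wanted & found)


def filter_rows_by_hashtag(rows, hashtags):
    """Keep [id, created_at, text] rows whose text mentions a wanted hashtag."""
    return [row for row in rows if has_hashtag(row[2], hashtags)]
```

Matching whole hashtags rather than substrings means that "#python" does not match a tweet tagged "#pythonic".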

@ParthS28

ParthS28 commented Feb 28, 2020

Hello, I am getting this error. Can anyone help?

TypeError                                 Traceback (most recent call last)
<ipython-input-25-6f34111d251b> in <module>
     49 if __name__ == '__main__':
     50         #pass in the username of the account you want to download
---> 51         get_all_tweets("realDonaldTrump")

<ipython-input-25-6f34111d251b> in get_all_tweets(screen_name)
     41         with open('%s_tweets.csv' % screen_name, 'wb') as f:
     42                 writer = csv.writer(f)
---> 43                 writer.writerow(['id','created_at','text'])
     44                 writer.writerows(outtweets)
     45 

TypeError: a bytes-like object is required, not 'str'

@kevinSJ27

kevinSJ27 commented Mar 10, 2020

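@ParthS28 on line 41, change the file mode from 'wb' to 'w':
41 with open('%s_tweets.csv' % screen_name, 'wb') as f:
becomes
41 with open('%s_tweets.csv' % screen_name, 'w') as f:
Remove the b at the end and it should work. In Python 3 the csv module writes str, which a file opened in binary mode rejects.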
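
The difference between the two modes can be reproduced without creating a file. A minimal sketch, using io.BytesIO and io.StringIO as stand-ins for files opened with 'wb' and 'w'; try_write_header is a hypothetical helper for this illustration:

```python
import csv
import io


def try_write_header(stream):
    """Return None on success, or the TypeError message raised by csv."""
    try:
        csv.writer(stream).writerow(["id", "created_at", "text"])
    except TypeError as exc:
        return str(exc)
    return None


# BytesIO behaves like a file opened with 'wb': writing str fails
binary_result = try_write_header(io.BytesIO())
# StringIO behaves like a file opened with 'w': writing str succeeds
text_result = try_write_header(io.StringIO())
```

Here binary_result holds an error message while text_result is None, which mirrors the traceback reported above.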

@pratikone

pratikone commented Mar 17, 2020

I have created a modified version which fetches the tweets and creates tweet threads out of it
https://gist.github.com/pratikone/4cdd5b1149aef0418611eb8748d90ee9

@musahibrahimali

musahibrahimali commented Apr 1, 2020

i am getting this error , can anyone help

Traceback (most recent call last):
File "C:/Users/MUSAH IBRAHIM ALI/PycharmProjects/Election Prediction/test.py", line 61, in
get_all_tweets("NAkufoAddo")
File "C:/Users/MUSAH IBRAHIM ALI/PycharmProjects/Election Prediction/test.py", line 53, in get_all_tweets
writer.writerow(["id", "created_at", "text"])
TypeError: a bytes-like object is required, not 'str'

@onmyeoin

onmyeoin commented Apr 6, 2020

@jefische

jefische commented Aug 16, 2020

The following terminal output and errors are repeated several times when I execute the code (though I've only pasted one iteration below). Not sure what the issue is but mentions certificate failures and something about max retries - any ideas?

Terminal Output:

PS C:\Users\jefischer\Documents\My_Projects\Thinkorswim> & C:/Users/jefischer/AppData/Local/Programs/Python/Python38/python.exe c:/Users/jefischer/Documents/My_Projects/Thinkorswim/tweet_dumper.py
Traceback (most recent call last):
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 978, in validate_conn
conn.connect()
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 362, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\ssl
.py", line 384, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1040, in _create
self.do_handshake()
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 439, in send resp = conn.urlopen(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /1.1/statuses/user_timeline.json?screen_name=unusual_whales&count=200 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)')))

@kcalderw79

kcalderw79 commented Nov 3, 2020

Is it possible to put in multiple screennames to get all of the tweets sorted by username in one csv?

@musahibrahimali

musahibrahimali commented Nov 4, 2020

@kcalderw79 : Yes, it's possible to extract data from multiple screen names into one csv file with a single script. Check out my project here: https://github.com/MIA-GH/Elections/blob/master/scripts/main.py .
You can insert the screen names in the users array on line 47 of the main.py file above.
The same script also lets you extract data using certain keywords. You can insert these keywords (hashtags) in the terms array on line 25.

Cheers, mate.

Just a heads up: don't forget to insert your Twitter credentials here https://github.com/MIA-GH/Elections/blob/master/scripts/twitter_credentials.py before running the script.
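
Independently of the linked project, the combining step itself can be sketched with the csv module alone. This assumes get_all_tweets has been changed to return its outtweets rows (the gist version only writes them to disk); write_combined_csv is a hypothetical helper:

```python
import csv


def write_combined_csv(path, tweets_by_user):
    """Write tweets from several accounts to one CSV, grouped by screen name.

    tweets_by_user maps a screen name to a list of [id, created_at, text]
    rows, as built per account by the script's outtweets list.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["screen_name", "id", "created_at", "text"])
        # sort case-insensitively so the file is ordered by username
        for screen_name in sorted(tweets_by_user, key=str.lower):
            for row in tweets_by_user[screen_name]:
                writer.writerow([screen_name, *row])
```

Adding the screen_name column keeps the rows attributable once they share a file.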

@JunaidAWahid

JunaidAWahid commented Dec 18, 2020

Works great. But I have a question. How do I get only the statuses and not replies or retweets from a user? Is there any way?

Add include_rts='false', exclude_replies='true' to the api.user_timeline call on line 39.
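
Since the script calls api.user_timeline in two places (the initial request and the paging loop), both calls need the same extra parameters. A minimal sketch that builds the keyword arguments once; timeline_kwargs and its originals_only switch are hypothetical names, while include_rts and exclude_replies are the parameters named in the comment above:

```python
def timeline_kwargs(screen_name, max_id=None, originals_only=True):
    """Build keyword arguments for api.user_timeline.

    originals_only is a hypothetical switch introduced for this sketch;
    the parameter values are the strings suggested in the thread.
    """
    kwargs = {"screen_name": screen_name, "count": 200}
    if max_id is not None:
        kwargs["max_id"] = max_id
    if originals_only:
        kwargs["include_rts"] = "false"      # leave out retweets
        kwargs["exclude_replies"] = "true"   # leave out replies
    return kwargs
```

Both calls could then be written as api.user_timeline(**timeline_kwargs(screen_name)) and api.user_timeline(**timeline_kwargs(screen_name, max_id=oldest)). Pages may come back with fewer than 200 tweets when filtering is on, because the filter is applied after the count.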

@mindyng

mindyng commented Jan 5, 2021

@kcalderw79 : Yes, it's possible to extract data from multiple screen names into one csv file with a single script. Check out my project here: https://github.com/MIA-GH/Elections/blob/master/scripts/main.py .
You can insert the screen names in the users array on line 47 of the main.py file above.
The same script also lets you extract data using certain keywords. You can insert these keywords (hashtags) in the terms array on line 25.

Cheers, mate.

Just a heads up: don't forget to insert your Twitter credentials here https://github.com/MIA-GH/Elections/blob/master/scripts/twitter_credentials.py before running the script.

^ This worked for me. I love it because it creates such a rich dataset: multiple users and multiple KW's/hashtags pulled! Only edit I made was pasting Twitter API credentials straight into the script. So no need for: from scripts import twitter_credentials as api. Though the way that it is originally set up helps with quick script transfer across the web. Thanks, @MIA-GH!

@likeablegeek

likeablegeek commented Apr 1, 2021

Hi @yanofsky ...

What license are you distributing this code with? Do you have any objections to this code being used/extended in a project which is being shared under the Apache 2.0 license?

Thanks.

@yanofsky
Author

yanofsky commented Apr 1, 2021

@likeablegeek, I added a License file.

@likeablegeek

likeablegeek commented Apr 1, 2021

@likeablegeek, I added a License file.

Thanks.

@JayJay-101

JayJay-101 commented May 8, 2021

Works like a charm. I just had to pass the encoding parameter with the value utf-8 to open() in the last block.

@alessandromonolo

alessandromonolo commented Aug 4, 2021

works like charm, just had to pass encoding parameter with value utf-8 at last block,

Which line of code? Can you please copy-paste your last block of code?
I had the same error.
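
The line in question is most likely the open() call in the "write the csv" block. A minimal sketch of that block with an explicit encoding, wrapped in a hypothetical function write_utf8_csv so it can be tried on its own:

```python
import csv


def write_utf8_csv(path, rows):
    """Write [id, created_at, text] rows to path as UTF-8."""
    # Without encoding=..., open() falls back to the platform's default
    # encoding, which on some systems cannot represent every character
    # that appears in tweets (emoji, non-Latin scripts).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(rows)
```

When reading the file back, the same encoding="utf-8" argument is needed on the reading side.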
