A Python script to download all the tweets of a hashtag into a csv
import tweepy
import csv
import pandas as pd
# Input your credentials here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
# United Airlines
# Open/create a file to append data
csvFile = open('ua.csv', 'a')
# Use csv writer
csvWriter = csv.writer(csvFile)

for tweet in tweepy.Cursor(api.search, q="#unitedAIRLINES", count=100,
                           lang="en",
                           since="2017-04-03").items():
    print(tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

@bellegis bellegis commented Aug 29, 2017

thank you!

@vitospinelli vitospinelli commented Jan 3, 2018

When I run this script on Python 2.7 (on a Windows 10 machine), nothing happens and no error is returned. Can you please help?

@impshum impshum commented Jan 30, 2018

@vitospinelli When you see print(things), with the arguments wrapped in parentheses, rather than print things, you're dealing with Python 3. Drop 2 unless you really have to use it. You have 2 years to get used to 3: https://pythonclock.org

@sushovan1 sushovan1 commented Mar 21, 2018

I ran the same code on 2018-03-21, hoping to fetch tweets as old as 2018-02-01, but it did not return tweets that far back. Any idea why?

@streetratonascooter streetratonascooter commented Mar 27, 2018

@sushovan1 The standard Twitter search API only lets you go back about 7 days.

@shruti18196 shruti18196 commented May 4, 2018

Thank you very much

@qrnazyhah qrnazyhah commented May 10, 2018

I want to fetch tweets as old as 2018-01-01. Can you help me, please?

@kamalikap kamalikap commented May 23, 2018

Hi, I want to extract the hashtags from the tweets and store them in a file. Is it possible?
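
One possible approach, as a minimal sketch rather than a definitive answer: pull the hashtags out of the tweet text with a regular expression. The helper name `extract_hashtags` and the sample text are made up for illustration. As far as I know, Tweepy status objects also expose the parsed hashtags under `tweet.entities['hashtags']`, but the sketch below only relies on the text.

```python
import re

# Matches a '#' followed by word characters (letters, digits, underscore)
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text):
    """Return the hashtags found in a tweet's text, lowercased, without '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]

# Hypothetical tweet text, standing in for tweet.text from the script
sample_text = "Stuck at ORD again #unitedAIRLINES #Delayed"
tags = extract_hashtags(sample_text)
print(tags)
```

In the script, the resulting list could be joined into one string and written as an extra column in the existing `csvWriter.writerow(...)` call.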

@kahiin kahiin commented May 24, 2018

Hi, I want to save the tweets that I obtain into an array. Is it possible? Thanks

@carlvlewis carlvlewis commented Jun 3, 2018

Running this script w/ Python 3.6, it's working just fine, outputting the data and creating the CSV file, but the CSV file appears empty when I open it. Any ideas?

@4lexLammers 4lexLammers commented Jun 11, 2018

Thanks, the script runs fine on Python 3.5.2. I would just add a .encode('utf-8') to the print command on line 22. Otherwise I get an error when printing tweets to the console.

@jculligan jculligan commented Jun 13, 2018

How would I go about adding location (e.g. geo_id or coordinates) and user_id? I've been going through the Tweepy documentation and the Twitter API documentation, but can't find any information on which attributes, like tweet.text and tweet.created_at, are available.

Update: After some more digging, I found this sample of the JSON output, which shows which attributes can be accessed: https://gist.github.com/dev-techmoe/ef676cdd03ac47ac503e856282077bf2

So, I learned that I can access geo, place, and coordinates (tweet.geo, tweet.place, tweet.coordinates), but they don't appear to work well for historical data. I've pulled maybe 3 out of several thousand so far :/

But it's a handy reference for things like tweet.user.id or tweet.user.screen_name!

I'm still looking for a way to determine whether a tweet is a retweet (which I'd like to remove from my analysis), other than checking whether tweet.text begins with "b'RT @", and whether it is an advertisement (e.g. a 'Buy 3 for 2' promotion kind of thing). If anyone has any advice on those, I'd greatly appreciate it!
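
For the retweet part, here is a hedged sketch of one common approach. It assumes that a Tweepy status object parsed from a retweet carries a `retweeted_status` attribute, which ordinary tweets lack. The `is_retweet` helper and the stand-in objects are hypothetical and only mimic the attributes mentioned above (`text`, `user.id`, `user.screen_name`).

```python
from types import SimpleNamespace

def is_retweet(tweet):
    """Return True if the status object looks like a retweet."""
    # Assumption: retweets expose a `retweeted_status` attribute,
    # while original tweets do not have it at all.
    return hasattr(tweet, "retweeted_status")

# Stand-in objects for illustration only; real ones come from the search loop
original = SimpleNamespace(
    text="Flight delayed again",
    user=SimpleNamespace(id=1, screen_name="alice"),
)
retweet = SimpleNamespace(
    text="RT @alice: Flight delayed again",
    user=SimpleNamespace(id=2, screen_name="bob"),
    retweeted_status=original,
)

# Keep only the non-retweets and build the rows that would go into the CSV
kept = [t for t in [original, retweet] if not is_retweet(t)]
rows = [[t.user.id, t.user.screen_name, t.text] for t in kept]
print(rows)
```

Checking the attribute avoids depending on how the text happens to be encoded or prefixed. It does nothing for detecting advertisements, which would need some form of text filtering.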

@rglukins rglukins commented Jun 13, 2018

@carlvlewis I had the same problem initially. I changed line 15 from append ('a') to write ('w') and it works just fine.

@Balajigentela Balajigentela commented Jun 18, 2018

It's working fine, but no data is stored in the file.
Please help me.

@acapshaw acapshaw commented Jun 25, 2018

Having the same issue: nothing is being stored in the CSV, and I can't seem to find the cause.

@chuchovalbuena chuchovalbuena commented Jun 26, 2018

Thank you so much!

@GabrielDvt GabrielDvt commented Jul 16, 2018

Hello.

I'm getting this error:

File "C:\Users\Gabriel\Desktop\tweet\crawling.py", line 22, in <module>
print (tweet.created_at, tweet.text)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 46-46: Non-BMP character not supported in Tk

Can someone help me?

@ogheneinfinitea ogheneinfinitea commented Jul 20, 2018

Hello, how do I get the number of times a user posted a tweet using a particular hashtag?
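
A small sketch of one way to do this, assuming the tweets returned by the hashtag search have already been collected and that each one exposes `tweet.user.screen_name`. The `count_tweets_per_user` helper and the stand-in tweets are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace

def count_tweets_per_user(tweets):
    """Count how many of the given tweets each screen name posted."""
    return Counter(tweet.user.screen_name for tweet in tweets)

# Stand-in tweets for illustration; real ones would come from the search loop
tweets = [
    SimpleNamespace(user=SimpleNamespace(screen_name="alice")),
    SimpleNamespace(user=SimpleNamespace(screen_name="bob")),
    SimpleNamespace(user=SimpleNamespace(screen_name="alice")),
]

counts = count_tweets_per_user(tweets)
print(counts.most_common())
```

Because the search only returns recent tweets, the counts cover that window rather than a user's full history.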

@arfkrnwan arfkrnwan commented Jul 21, 2018

If I want to add another attribute to the streaming results, where can I find a tutorial? Thank you.

@rogeriocolonna rogeriocolonna commented Sep 4, 2018

Thank you! Works fine!

@arushiyadav arushiyadav commented Oct 1, 2018

I'm getting this error:
TweepError Traceback (most recent call last)
in ()
18
19 for tweet in tweepy.Cursor(api.search,q="#indianairways",count=100,
---> 20 lang="en").items():
21 print (tweet.created_at, tweet.text)
22 csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
47
48 def next(self):
---> 49 return self.next()
50
51 def next(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
195 if self.current_page is None or self.page_index == len(self.current_page) - 1:
196 # Reached end of current page, get the next page...
--> 197 self.current_page = self.page_iterator.next()
198 self.page_index = -1
199 self.page_index += 1

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
106
107 if self.index >= len(self.results) - 1:
--> 108 data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
109
110 if hasattr(self.method, 'self'):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\binder.py in _call(*args, **kwargs)
248 return method
249 else:
--> 250 return method.execute()
251
252 # Set pagination mode

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\binder.py in execute(self)
232 raise RateLimitError(error_msg, resp)
233 else:
--> 234 raise TweepError(error_msg, resp, api_code=api_error_code)
235
236 # Parse the response payload

TweepError: Twitter error response: status code = 401
Please help me with this.

@navneetkaur08 navneetkaur08 commented Oct 7, 2018

In line 23, instead of CsvWriter it should be csvFile.

@sittinurul sittinurul commented Dec 13, 2018

It's working fine, but no data is stored in the file.
Please help me.

@Balajigentela You should run it from cmd.

@humbleself humbleself commented Feb 8, 2019

Please help me, it is giving me this error:
TweepError Traceback (most recent call last)
in ()
1 for tweet in tweepy.Cursor(api.search,q="#APC",count=100,
2 lang="en",
----> 3 since="2017-04-03").items():
4 print (tweet.created_at, tweet.text)

D:\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
47
48 def next(self):
---> 49 return self.next()
50
51 def next(self):

D:\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
195 if self.current_page is None or self.page_index == len(self.current_page) - 1:
196 # Reached end of current page, get the next page...
--> 197 self.current_page = self.page_iterator.next()
198 self.page_index = -1
199 self.page_index += 1
D:\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
106
107 if self.index >= len(self.results) - 1:
--> 108 data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
109
110 if hasattr(self.method, 'self'):
D:\anaconda3\lib\site-packages\tweepy\binder.py in _call(*args, **kwargs)
248 return method
249 else:
--> 250 return method.execute()
251
252 # Set pagination mode

D:\anaconda3\lib\site-packages\tweepy\binder.py in execute(self)
232 raise RateLimitError(error_msg, resp)
233 else:
--> 234 raise TweepError(error_msg, resp, api_code=api_error_code)
236 # Parse the response payload
TweepError: Twitter error response: status code = 400

@jodmoreira jodmoreira commented Feb 12, 2019

It works fine for me! Thanks!

@Prashant-PS Prashant-PS commented Feb 16, 2019

@humbleself

Regenerate your API keys and tokens. It will work fine.

@ElTheCatto ElTheCatto commented Mar 19, 2019

I am having trouble with this code as it is spitting out an error that I am not sure how to solve:
Traceback (most recent call last):
File "c:/Users/Evdru/Documents/School Work/Science/Program Files/main.py", line 1, in <module>
import tweepy
File "C:\Users\Evdru\AppData\Local\Programs\Python\Python37\lib\site-packages\tweepy\__init__.py", line 17, in <module>
from tweepy.streaming import Stream, StreamListener
File "C:\Users\Evdru\AppData\Local\Programs\Python\Python37\lib\site-packages\tweepy\streaming.py", line 355
def _start(self, async):
^
SyntaxError: invalid syntax

@NitBrok NitBrok commented Apr 16, 2019

What if I want to extract tweets for several hashtags? Should I pass a list of hashtags to the 'q' parameter of tweepy.Cursor?
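
As a hedged sketch: the `q` parameter takes a single query string rather than a list, and the standard search syntax supports an `OR` operator, so one option is to join the hashtags into one string. The `build_hashtag_query` helper below is hypothetical and only builds the string; it does not call the API.

```python
def build_hashtag_query(hashtags):
    """Join hashtags into a single search string using the OR operator."""
    cleaned = []
    for tag in hashtags:
        tag = tag.strip()
        if not tag:
            continue  # skip empty entries
        if not tag.startswith("#"):
            tag = "#" + tag  # add the leading '#' when it is missing
        cleaned.append(tag)
    return " OR ".join(cleaned)

query = build_hashtag_query(["unitedAIRLINES", "#delta", " #AmericanAir "])
print(query)
```

The resulting string would then be passed as `q=query` in place of the single hashtag in the script.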

@saiplanner saiplanner commented May 7, 2019

How do I search for multiple hashtags?

@iampotential iampotential commented May 21, 2019

thank you thank you thank you!

@geethuth geethuth commented May 28, 2019

Please help me. I am getting the following error:

C:\Users\geethu\LangDetect\Scripts\python.exe "C:/Users/geethu/PycharmProjects/Landslip_langDetection_Final/twitter crawler.py"
Traceback (most recent call last):
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 839, in validate_conn
conn.connect()
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connection.py", line 344, in connect
ssl_context=context)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\util\ssl_.py", line 344, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\geethu\AppData\Local\Programs\Python\Python37-32\lib\ssl.py", line 412, in wrap_socket
session=session
File "C:\Users\geethu\AppData\Local\Programs\Python\Python37-32\lib\ssl.py", line 853, in _create
self.do_handshake()
File "C:\Users\geethu\AppData\Local\Programs\Python\Python37-32\lib\ssl.py", line 1117, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1039: The handshake operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\util\retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
raise value
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.twitter.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 190, in execute
proxies=self.api.proxy)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.twitter.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/geethu/PycharmProjects/Landslip_langDetection_Final/twitter crawler.py", line 21, in <module>
since="2017-04-03").items():
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\cursor.py", line 49, in next
return self.next()
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\cursor.py", line 197, in next
self.current_page = self.page_iterator.next()
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\cursor.py", line 108, in next
data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 250, in _call
return method.execute()
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 192, in execute
six.reraise(TweepError, TweepError('Failed to send request: %s' % e), sys.exc_info()[2])
File "C:\Users\geethu\LangDetect\lib\site-packages\six.py", line 692, in reraise
raise value.with_traceback(tb)
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 190, in execute
proxies=self.api.proxy)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
tweepy.error.TweepError: Failed to send request: HTTPSConnectionPool(host='api.twitter.com', port=443): Read timed out. (read timeout=60)

Process finished with exit code 1

@sxshateri sxshateri commented Jun 9, 2019

I couldn't find the "since" parameter in the Tweepy documentation. Would you mind explaining it and giving me a link to the corresponding documentation?

@PranjalShekhawat PranjalShekhawat commented Jun 11, 2019

ua.csv is an empty file. Can you please explain how to get the data into ua.csv?
Thanks a lot in advance, please do help!

@steeley steeley commented Jun 13, 2019

It seems to download tweets, but most appear to be cut off, so you don't get the full tweet. Not very useful.

@sxshateri sxshateri commented Jun 13, 2019

It seems to download tweets, but most appear to be cut off, so you don't get the full tweet. Not very useful.

@steeley, you may use my code, which uses Twitter's extended tweet mode, so you can get the full tweet.

https://gist.github.com/sxshateri/540aead254bfa7810ee8bbb2d298363e
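
For readers who just want the idea, here is a rough sketch that is not taken from the linked gist. It assumes the search is made with `tweet_mode="extended"`, in which case the status objects carry `full_text` instead of the truncated `text`, and that for retweets the untruncated text sits on `retweeted_status`. The `get_full_text` helper and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace

def get_full_text(tweet):
    """Return the longest available text for a status object."""
    # For retweets, prefer the original tweet, whose text is not shortened
    source = getattr(tweet, "retweeted_status", tweet)
    # `full_text` exists in extended mode; fall back to `text` otherwise
    return getattr(source, "full_text", None) or getattr(source, "text", "")

# Stand-in objects for illustration only
short = SimpleNamespace(text="Truncated tweet...")
extended = SimpleNamespace(full_text="The whole tweet, without truncation")
retweet = SimpleNamespace(full_text="RT @alice: The whole twe...",
                          retweeted_status=extended)

for status in (short, extended, retweet):
    print(get_full_text(status))
```

The fallback to `text` keeps the helper usable when the search is run without extended mode.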

@steeley steeley commented Jun 14, 2019

Thanks sxshateri, it seems to work OK with Python 3 in a virtualenv.

@CuriouslyYours CuriouslyYours commented Aug 25, 2019

Hi, thanks for this code. I am a beginner learning Python, so please bear with my list of questions.
1> How can I also get the user's gender and store the dataset in a data frame for analysis, e.g. to see which gender tweeted most on a hashtag?
2> For a particular hashtag I am getting much less data than expected. Is Twitter limiting the data? (For one keyword I got 2K records and for another 17K.) How can I ensure the full set of records is downloaded?
3> An extra blank line appears before each line of results. How can I avoid it?
4> The file gets locked and opens only in read-only mode (even though processing is done). How do I remove the lock?
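
On question 3, a likely cause (an assumption, since it depends on the platform) is that the file is opened without `newline=''`, which on Windows makes the csv module's `\r\n` line endings come out as `\r\r\n` and shows up as blank rows. A minimal sketch with made-up rows and a temporary file path:

```python
import csv
import os
import tempfile

# Made-up rows standing in for (created_at, text) pairs
rows = [
    ["2017-04-03 10:00:00", "first tweet"],
    ["2017-04-03 10:05:00", "second tweet"],
]

# Temporary location used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "ua.csv")

# newline='' lets the csv module control the line endings itself
with open(path, "w", newline="", encoding="utf-8") as csv_file:
    csv.writer(csv_file).writerows(rows)

# Read the file back to confirm there are no empty rows in between
with open(path, newline="", encoding="utf-8") as csv_file:
    read_back = list(csv.reader(csv_file))

print(read_back)
```

Leaving the `with` block also closes the file, which may help with the lock described in question 4.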

@ChiragGoyal98 ChiragGoyal98 commented Sep 14, 2019

@humbleself

Regenerate your API keys and tokens. It will work fine.

How do I do that?

@Tafura629 Tafura629 commented Oct 22, 2019

Hi, I want to save the tweets that I obtain into an array. Is it possible? Thanks

Yes, it is possible.

@dmug1 dmug1 commented Oct 27, 2019

Thank you, this is really valuable. I was going down another path, but this shows that it can be done in a simple way.
Thanks

@abhishek-negi abhishek-negi commented Feb 6, 2020

Great work!!
I have a question: will it return any response if there is only one tweet, dated Feb 19, 2015, for the hashtag LexusEnformRemote?
I have read that Tweepy only returns tweets from the past 7 days.

One more thing: how do I search for multiple hashtags in a single loop?

@abhishek-negi abhishek-negi commented Feb 7, 2020

Great work!!
I have a question: will it return any response if there is only one tweet, dated Feb 19, 2015, for the hashtag LexusEnformRemote?
I have read that Tweepy only returns tweets from the past 7 days.

One more thing: how do I search for multiple hashtags in a single loop?

Answering my own question about getting tweets beyond the past 7 days: I used the GetOldTweets3 library by @Jefferson-Henrique and it's working fine for me. It can get tweets from multiple user accounts in one go, but the search doesn't take two hashtags.
Link

@Geethanjali8989 Geethanjali8989 commented Mar 25, 2020

It's working fine, but no data is stored in the file.
Please help me.

Hey, I am also getting the same error. Did you solve it?

@barthuin barthuin commented Mar 31, 2020

It's working fine, but no data is stored in the file.
Please help me.

Hey, I am also getting the same error. Did you solve it?

Hello,

It's my first time with Python and this library, and I have a strange problem. My query doesn't return all the tweets, only the last 25-30. But if I do the same search on Twitter, I get a lot of tweets in the results. Does anyone know why?

@this-is-shashank this-is-shashank commented Apr 18, 2020

Running this script w/ Python 3.6, it's working just fine, outputting the data and creating the CSV file, but the CSV file appears empty when I open it. Any ideas?

I am also facing the same issue. Can you tell me how you fixed it?

@this-is-shashank this-is-shashank commented Apr 18, 2020

ua.csv is an empty file. Can you please explain how to get the data into ua.csv?
Thanks a lot in advance, please do help!

I am also facing the same issue. Can you tell me how you fixed it?

@quintendewilde quintendewilde commented Apr 27, 2020

Hi!

Is it possible to make this "live", so that every time a tweet is posted with #cat it fetches it? Or have the code look at the last minute every other minute?

I hope this is clear?

@Stellaxu19 Stellaxu19 commented May 15, 2020

Can anyone help me find a way to obtain this data for a longer period of time, such as by using GetOldTweets3? Thank you very much.

@ojkoort ojkoort commented Jul 14, 2020

just signed up to github to say thank you

@vedantkalan777 vedantkalan777 commented Aug 9, 2020

Hello, thank you for the above code. I want to download data by hashtag and location, for example tweets with "#coronavirus", located in "India", posted from "2020-04-01" to "2020-04-30". Can anyone please help me with that? It is really important for my research.

@Harish-uthravalli Harish-uthravalli commented Aug 25, 2020

How to retrieve the name of the user who tweeted?

@AmbiTyga AmbiTyga commented Sep 15, 2020

How can I get only tweets that contain all the keywords I specify, regardless of their position? I tried stream.filter(track=["good","bank","@NYC"]), but I sometimes get tweets containing just 1 or 2 of the keywords rather than all 3. I want tweets that contain all 3 keywords, regardless of position.

@mandarthosar mandarthosar commented Sep 29, 2020

I am a novice here. The code didn't work for me at first. However, it worked after I added one line to the end: csvFile.close()
Not sure if I am the only one who faced this, or if it is such a trivial problem that everyone else has already fixed it in their own code.
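
An alternative to the explicit close, sketched below under a few assumptions: a `with` block closes and flushes the file automatically, even if the loop stops with an error, which is a plausible explanation for the empty ua.csv reports above. The `save_tweets` helper and the stand-in tweets are hypothetical. Unlike the original script, the text is written as a string rather than as encoded bytes.

```python
import csv
import os
import tempfile
from datetime import datetime
from types import SimpleNamespace

def save_tweets(tweets, path):
    """Append (created_at, text) rows to a CSV file and return the row count."""
    count = 0
    # The with block closes (and flushes) the file even if the loop raises
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for tweet in tweets:
            writer.writerow([tweet.created_at, tweet.text])
            count += 1
    return count

# Stand-in tweets; in the script these would come from tweepy.Cursor(...).items()
fake_tweets = [
    SimpleNamespace(created_at=datetime(2017, 4, 3, 10, 0, 0),
                    text="Boarding now #unitedAIRLINES"),
    SimpleNamespace(created_at=datetime(2017, 4, 3, 10, 5, 0),
                    text="Still waiting #unitedAIRLINES"),
]

path = os.path.join(tempfile.mkdtemp(), "ua.csv")
written = save_tweets(fake_tweets, path)
print(written, os.path.getsize(path) > 0)
```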
