A Python script to download all the tweets of a hashtag into a csv
import csv

import tweepy

# Input your credentials here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# United Airlines
# Open/create a file to append data; the with block closes it so every row is flushed to disk
with open('ua.csv', 'a', newline='', encoding='utf-8') as csvFile:
    # Use csv writer
    csvWriter = csv.writer(csvFile)
    for tweet in tweepy.Cursor(api.search, q="#unitedAIRLINES", count=100,
                               lang="en",
                               since="2017-04-03").items():
        print(tweet.created_at, tweet.text)
        # Write the text as-is (Python 3); encoding it to bytes here would store b'...' in the CSV
        csvWriter.writerow([tweet.created_at, tweet.text])
@bellegis

commented Aug 29, 2017

thank you!

@vitospinelli

commented Jan 3, 2018

When I run this script on Python 2.7 (on a Windows 10 machine), nothing happens and no error is returned. Can you please help?

@impshum

commented Jan 30, 2018

@vitospinelli When you see print(things), with the argument wrapped in parentheses, rather than print things, you're dealing with Python 3. Drop 2 unless you really have to use it. You have 2 years to get used to 3: https://pythonclock.org

@sushovan1

commented Mar 21, 2018

I tried using the same code on 2018-03-21, hoping to fetch tweets as old as 2018-02-01, but it did not return that many tweets. Any idea why?

@streetratonascooter

commented Mar 27, 2018

@sushovan1 The Twitter API only lets you go back approximately 2 weeks.

@shruti18196

commented May 4, 2018

Thank you very much

@qrnazyhah

commented May 10, 2018

I want to fetch tweets as old as 2018-01-01. Can you help me, please?

@kamalikap

commented May 23, 2018

Hi, I want to extract the hashtags from the tweets and store them in a file. Is it possible?
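
One way to approach this, offered as a sketch rather than as part of the original gist: each status returned by the search carries an `entities` dictionary whose `hashtags` list holds the tags Twitter parsed out of the text. The helper names `extract_hashtags` and `write_hashtags`, the output layout, and the stand-in tweet objects are all hypothetical, chosen so the example runs without credentials.

```python
import csv
import io
from types import SimpleNamespace


def extract_hashtags(tweet):
    """Return the hashtag strings (without '#') found in one tweet.

    Assumes tweet.entities is a dict shaped like Twitter's JSON:
    {"hashtags": [{"text": "tag", ...}, ...]}.
    """
    entities = getattr(tweet, "entities", None) or {}
    return [tag["text"] for tag in entities.get("hashtags", [])]


def write_hashtags(tweets, out_file):
    """Write one CSV row per tweet: created_at, then its hashtags joined by spaces."""
    writer = csv.writer(out_file)
    for tweet in tweets:
        writer.writerow([tweet.created_at, " ".join(extract_hashtags(tweet))])


# Stand-in objects for illustration; real ones would come from tweepy.Cursor(...).items().
sample_tweets = [
    SimpleNamespace(created_at="2018-05-23 10:00:00",
                    entities={"hashtags": [{"text": "unitedAIRLINES"}, {"text": "travel"}]}),
    SimpleNamespace(created_at="2018-05-23 10:05:00", entities={"hashtags": []}),
]

buffer = io.StringIO()  # swap for open("hashtags.csv", "w", newline="") to write a real file
write_hashtags(sample_tweets, buffer)
print(buffer.getvalue())
```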

@kahiin

commented May 24, 2018

Hi, I want to save the tweets that I obtain into an array. Is it possible? Thanks
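
A minimal sketch of collecting results into a Python list instead of a file. It assumes only that each tweet exposes `created_at` and `text`, as the script above uses; `collect_tweets` and the stand-in generator are hypothetical names for illustration.

```python
from itertools import islice
from types import SimpleNamespace


def collect_tweets(tweet_iter, limit=None):
    """Gather (created_at, text) pairs from any iterable of tweets into a list.

    limit caps how many tweets are consumed; None means take everything.
    """
    return [(t.created_at, t.text) for t in islice(tweet_iter, limit)]


# Stand-in tweets; with the gist's script you would pass
# tweepy.Cursor(api.search, q="#unitedAIRLINES").items() instead.
fake_results = (SimpleNamespace(created_at=f"2018-05-24 12:0{i}:00", text=f"tweet {i}")
                for i in range(5))

rows = collect_tweets(fake_results, limit=3)
print(rows)
```

Passing a `limit` is worth considering here, since an uncapped cursor keeps paging until the search results run out.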

@carlvlewis

commented Jun 3, 2018

Running this script w/ Python 3.6, it's working just fine, outputting the data and creating the CSV file, but the CSV file appears empty when I open it. Any ideas?

@4lexLammers

commented Jun 11, 2018

Thanks, the script runs fine on Python 3.5.2. I just would add a .encode('utf-8') in the print command on line 22. Otherwise I got an error when printing tweets to the console.

@jculligan

commented Jun 13, 2018

How would I go about adding in location (e.g. geo_id or coordinates) and user_id? I've been going through the Tweepy documentation and Twitter API documentation, but can't find any information on which attributes, like tweet.text and tweet.created_at, are available.

Update: After some more digging, I managed to find this output of the json file to find which arguments can be called for information: https://gist.github.com/dev-techmoe/ef676cdd03ac47ac503e856282077bf2

So, I learned that I can call geo, place, and coordinates (tweet.geo, tweet.place, tweet.coordinates), but it doesn't appear to do well for historical data. I maybe pulled 3 out of several thousand so far :/

But it's a handy reference for things like tweet.user.id or tweet.user.screen_name!

I'm still looking for a way to determine whether a tweet is a retweet (which I'd like to remove from my analysis), other than checking whether tweet.text begins with "b'RT @", or whether it is an advertisement (e.g. a 'Buy 3 for 2 promotion' kind of thing). If anyone has any advice on those, I'd be greatly appreciative!
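
On the retweet question, a hedged sketch: in Twitter's JSON a retweet carries a `retweeted_status` field holding the original status, so its presence can serve as the signal instead of matching on the text prefix. `is_retweet`, `drop_retweets`, and the stand-in objects are hypothetical names for illustration; detecting advertisements is not covered here.

```python
from types import SimpleNamespace


def is_retweet(tweet):
    """True when the status carries a retweeted_status payload.

    Assumption: as in Twitter's JSON, only retweets include that field,
    so its presence is used as the signal instead of parsing the text.
    """
    return getattr(tweet, "retweeted_status", None) is not None


def drop_retweets(tweets):
    """Keep only original tweets."""
    return [t for t in tweets if not is_retweet(t)]


# Stand-ins for tweepy Status objects.
original = SimpleNamespace(text="Delayed again #unitedAIRLINES")
retweet = SimpleNamespace(text="RT @someone: Delayed again #unitedAIRLINES",
                          retweeted_status=original)

kept = drop_retweets([original, retweet])
print([t.text for t in kept])
```

Alternatively, appending `-filter:retweets` to the `q` string asks the search endpoint to leave retweets out before they ever reach your code.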

@rglukins

commented Jun 13, 2018

@carlvlewis I had the same problem initially. I changed line 15 from append ('a') to write ('w') and it works just fine.

@Balajigentela

commented Jun 18, 2018

It's working fine, but no data is stored in the file.
Please help me.

@acapshaw

commented Jun 25, 2018

Having the same issue: nothing is being stored in the CSV, and I can't seem to find the problem.
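
A likely cause of the empty file (an assumption, since the reports above don't include enough detail to confirm it) is a file handle that is opened but never closed: rows can sit in Python's write buffer for as long as the interpreter stays alive, for example in IDLE or a notebook session. The sketch below shows the pattern of closing the file through a `with` block; `append_rows` and the temporary path are hypothetical, used so the example can be run anywhere.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a CSV file and close it before returning.

    The with block guarantees the buffer is flushed to disk, so the file
    is not empty when opened in another program afterwards.
    """
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        csv.writer(csv_file).writerows(rows)


# Demonstration with a temporary file instead of ua.csv.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
append_rows(demo_path, [["2018-06-03 09:00:00", "first tweet"]])
append_rows(demo_path, [["2018-06-03 09:01:00", "second tweet"]])

with open(demo_path, newline="", encoding="utf-8") as csv_file:
    saved = list(csv.reader(csv_file))
print(saved)
```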

@chuchovalbuena

commented Jun 26, 2018

Thank you so much!

@GabrielDvt

commented Jul 16, 2018

Hello.

I'm getting this error:

  File "C:\Users\Gabriel\Desktop\tweet\crawling.py", line 22, in <module>
    print (tweet.created_at, tweet.text)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 46-46: Non-BMP character not supported in Tk

Can someone help me?
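
The message suggests the script is being run in a Tk-based console such as IDLE, which at the time could not display characters above U+FFFF (most emoji). One possible workaround, sketched under that assumption, is to substitute those characters before printing while still writing the untouched text to the CSV. `bmp_safe` is a hypothetical helper name.

```python
def bmp_safe(text, replacement="?"):
    """Replace characters outside the Basic Multilingual Plane (e.g. most emoji).

    Code points above U+FFFF are what the 'UCS-2' error is complaining about,
    so swapping them out makes the string printable in such a console.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


sample = "Flight delayed \U0001F62D #unitedAIRLINES"
print(bmp_safe(sample))
```

In the gist's loop this would mean calling `print(tweet.created_at, bmp_safe(tweet.text))`. Running the script from a regular terminal instead of IDLE is another option.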

@ogheneinfinitea

commented Jul 20, 2018

Hello, please, how do I get the number of times a user posted a tweet using a particular hashtag?
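
A small sketch of one way to count this: tally the `tweet.user.screen_name` of every result returned for the hashtag search. `tweets_per_user` and `fake_tweet` are hypothetical names, and the counts can only cover the tweets the search actually returns.

```python
from collections import Counter
from types import SimpleNamespace


def tweets_per_user(tweets):
    """Count how many of the given tweets each screen name posted."""
    return Counter(t.user.screen_name for t in tweets)


# Stand-ins for search results; real ones expose the same tweet.user.screen_name attribute.
def fake_tweet(name):
    return SimpleNamespace(user=SimpleNamespace(screen_name=name))


results = [fake_tweet("alice"), fake_tweet("bob"), fake_tweet("alice")]
counts = tweets_per_user(results)
print(counts.most_common())
```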

@arfkrnwan

commented Jul 21, 2018

If I want to add another attribute to the streaming results, where can I find a tutorial? Thank you.

@rogeriocolonna

commented Sep 4, 2018

Thank you! Works fine!

@arushiyadav

commented Oct 1, 2018

I'm getting an error:
TweepError Traceback (most recent call last)
in ()
18
19 for tweet in tweepy.Cursor(api.search,q="#indianairways",count=100,
---> 20 lang="en").items():
21 print (tweet.created_at, tweet.text)
22 csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\cursor.py in __next__(self)
47
48 def __next__(self):
---> 49 return self.next()
50
51 def next(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
195 if self.current_page is None or self.page_index == len(self.current_page) - 1:
196 # Reached end of current page, get the next page...
--> 197 self.current_page = self.page_iterator.next()
198 self.page_index = -1
199 self.page_index += 1

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
106
107 if self.index >= len(self.results) - 1:
--> 108 data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
109
110 if hasattr(self.method, 'self'):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\binder.py in _call(*args, **kwargs)
248 return method
249 else:
--> 250 return method.execute()
251
252 # Set pagination mode

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tweepy\binder.py in execute(self)
232 raise RateLimitError(error_msg, resp)
233 else:
--> 234 raise TweepError(error_msg, resp, api_code=api_error_code)
235
236 # Parse the response payload

TweepError: Twitter error response: status code = 401
Please help me with this.

@navneetkaur08

commented Oct 7, 2018

In line 23, instead of CsvWriter it should be csvFile.

@sittinurul

commented Dec 13, 2018

"it working fine...but no data is stored in the file.... please help me..."

@Balajigentela You should run it in cmd.

@humbleself

commented Feb 8, 2019

Please help me, it is giving me this error:
TweepError Traceback (most recent call last)
in ()
1 for tweet in tweepy.Cursor(api.search,q="#APC",count=100,
2 lang="en",
----> 3 since="2017-04-03").items():
4 print (tweet.created_at, tweet.text)

D:\anaconda3\lib\site-packages\tweepy\cursor.py in __next__(self)
47
48 def __next__(self):
---> 49 return self.next()
50
51 def next(self):

D:\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
195 if self.current_page is None or self.page_index == len(self.current_page) - 1:
196 # Reached end of current page, get the next page...
--> 197 self.current_page = self.page_iterator.next()
198 self.page_index = -1
199 self.page_index += 1
D:\anaconda3\lib\site-packages\tweepy\cursor.py in next(self)
106
107 if self.index >= len(self.results) - 1:
--> 108 data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
109
110 if hasattr(self.method, 'self'):
D:\anaconda3\lib\site-packages\tweepy\binder.py in _call(*args, **kwargs)
248 return method
249 else:
--> 250 return method.execute()
251
252 # Set pagination mode

D:\anaconda3\lib\site-packages\tweepy\binder.py in execute(self)
232 raise RateLimitError(error_msg, resp)
233 else:
--> 234 raise TweepError(error_msg, resp, api_code=api_error_code)
236 # Parse the response payload
TweepError: Twitter error response: status code = 400

@jodmoreira

commented Feb 12, 2019

It works fine for me! Thanks!

@Prashant-PS

commented Feb 16, 2019

@humbleself

Regenerate your API keys and tokens. It will work fine.
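
Error responses like the 400 and 401 above can come from blank or invalid credentials, and the script as posted ships with four empty strings. A hedged sketch of failing early when a value is missing, reading the keys from environment variables rather than hard-coding them; the variable names and `load_credentials` are hypothetical.

```python
import os

REQUIRED_KEYS = ("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                 "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_TOKEN_SECRET")


def load_credentials(env=os.environ):
    """Read the four OAuth values from a mapping, failing early if any is blank.

    The variable names are hypothetical; pick whatever names you export.
    """
    values = {key: env.get(key, "").strip() for key in REQUIRED_KEYS}
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError("Missing Twitter credentials: " + ", ".join(missing))
    return values


# Demonstration with an in-memory mapping instead of the real environment.
try:
    load_credentials({"TWITTER_CONSUMER_KEY": "abc"})
except ValueError as error:
    print(error)
```

This only catches blank values; a key that is present but revoked or mistyped would still be rejected by Twitter.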

@ElTheCatto

commented Mar 19, 2019

I am having trouble with this code as it is spitting out an error that I am not sure how to solve:
Traceback (most recent call last):
  File "c:/Users/Evdru/Documents/School Work/Science/Program Files/main.py", line 1, in <module>
    import tweepy
  File "C:\Users\Evdru\AppData\Local\Programs\Python\Python37\lib\site-packages\tweepy\__init__.py", line 17, in <module>
    from tweepy.streaming import Stream, StreamListener
  File "C:\Users\Evdru\AppData\Local\Programs\Python\Python37\lib\site-packages\tweepy\streaming.py", line 355
    def _start(self, async):
                     ^
SyntaxError: invalid syntax
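
The offending line uses `async` as a parameter name, and `async` became a reserved keyword in Python 3.7, which matches the Python37 path in the traceback. The usual way out is to upgrade tweepy (for example `pip install --upgrade tweepy`), on the assumption that the installed release predates the rename of that parameter. A quick check of the keyword status on your own interpreter, with `is_reserved` as a hypothetical helper name:

```python
import keyword
import sys


def is_reserved(name):
    """Report whether name is a reserved word in the running interpreter."""
    return keyword.iskeyword(name)


# On Python 3.7 and later this reports True for "async", which is why a
# parameter with that name in an older library release no longer parses.
print(sys.version_info[:2], is_reserved("async"))
```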

@NitBrok

commented Apr 16, 2019

What if I want to extract tweets for several hashtags? Should I pass a list of hashtags to the 'q' parameter of tweepy.Cursor?

@saiplanner

commented May 7, 2019

How do I search for multiple hashtags?
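
The `q` parameter takes a single query string rather than a list, but Twitter's search syntax supports an `OR` operator, so several hashtags can be combined into one string. A minimal sketch, with `build_hashtag_query` as a hypothetical helper:

```python
def build_hashtag_query(hashtags):
    """Join hashtags into one search string using Twitter's OR operator.

    Accepts tags with or without a leading '#'; blank entries are skipped.
    """
    cleaned = []
    for tag in hashtags:
        name = tag.strip().lstrip("#")
        if name:
            cleaned.append("#" + name)
    return " OR ".join(cleaned)


query = build_hashtag_query(["unitedAIRLINES", "#APC", "indianairways"])
print(query)  # pass as q=query to tweepy.Cursor(api.search, ...)
```

Very long queries may be rejected by the API, so for a large set of hashtags it may be safer to run several smaller searches.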

@iampotential

commented May 21, 2019

thank you thank you thank you!

@geethuth

commented May 28, 2019

Please help me. I am getting the following error:

C:\Users\geethu\LangDetect\Scripts\python.exe "C:/Users/geethu/PycharmProjects/Landslip_langDetection_Final/twitter crawler.py"
Traceback (most recent call last):
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 839, in _validate_conn
conn.connect()
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connection.py", line 344, in connect
ssl_context=context)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\util\ssl_.py", line 344, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\geethu\AppData\Local\Programs\Python\Python37-32\lib\ssl.py", line 412, in wrap_socket
session=session
File "C:\Users\geethu\AppData\Local\Programs\Python\Python37-32\lib\ssl.py", line 853, in _create
self.do_handshake()
File "C:\Users\geethu\AppData\Local\Programs\Python\Python37-32\lib\ssl.py", line 1117, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1039: The handshake operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\util\retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
raise value
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "C:\Users\geethu\LangDetect\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.twitter.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 190, in execute
proxies=self.api.proxy)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.twitter.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/geethu/PycharmProjects/Landslip_langDetection_Final/twitter crawler.py", line 21, in <module>
since="2017-04-03").items():
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\cursor.py", line 49, in __next__
return self.next()
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\cursor.py", line 197, in next
self.current_page = self.page_iterator.next()
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\cursor.py", line 108, in next
data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 250, in _call
return method.execute()
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 192, in execute
six.reraise(TweepError, TweepError('Failed to send request: %s' % e), sys.exc_info()[2])
File "C:\Users\geethu\LangDetect\lib\site-packages\six.py", line 692, in reraise
raise value.with_traceback(tb)
File "C:\Users\geethu\LangDetect\lib\site-packages\tweepy\binder.py", line 190, in execute
proxies=self.api.proxy)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Users\geethu\LangDetect\lib\site-packages\requests\adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
tweepy.error.TweepError: Failed to send request: HTTPSConnectionPool(host='api.twitter.com', port=443): Read timed out. (read timeout=60)

Process finished with exit code 1
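
The final error is a network timeout while talking to api.twitter.com, not a problem in the script's logic, so retrying after a pause is one reasonable response. Below is a generic retry sketch with exponential backoff; `call_with_retries` and `flaky_fetch` are hypothetical names, and the stand-in raises the built-in `TimeoutError` so the example runs without tweepy.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call func(); on a listed exception wait and try again, up to attempts times.

    The delay doubles after every failure (simple exponential backoff).
    The last exception is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2


# Demonstration with a stand-in that times out twice before succeeding.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Read timed out.")
    return ["tweet 1", "tweet 2"]


page = call_with_retries(flaky_fetch, attempts=4, delay=0.01, retry_on=(TimeoutError,))
print(page, "after", calls["count"], "calls")
```

With the gist's script, `retry_on` would be `tweepy.TweepError` (the exception named in the traceback), and `func` would wrap whatever call fetches the next batch of tweets.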

@sxshateri

commented Jun 9, 2019

I couldn't find the "since" parameter in the tweepy documentation. Would you mind explaining it and giving me a link to the corresponding documentation?
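
For context, `since` mirrors one of Twitter's standard search operators rather than anything tweepy-specific: the same filter can be written inside the query string as `since:YYYY-MM-DD`, with `until:` for the upper bound. A small sketch of building such a query, with `add_date_range` as a hypothetical helper:

```python
from datetime import date


def add_date_range(query, since=None, until=None):
    """Append since:/until: search operators (YYYY-MM-DD) to a query string."""
    parts = [query]
    if since is not None:
        parts.append("since:" + since.isoformat())
    if until is not None:
        parts.append("until:" + until.isoformat())
    return " ".join(parts)


query = add_date_range("#unitedAIRLINES", since=date(2017, 4, 3))
print(query)
```

Note that a date filter cannot reach further back than the search index itself, which, as mentioned earlier in this thread, only covers recent tweets.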

@PranjalShekhawat

commented Jun 11, 2019

ua.csv is an empty file. Can you please guide me on how to get the data into ua.csv?
Thanks a lot in advance, please do help!

@steeley

commented Jun 13, 2019

Seems to download tweets, but most appear to be chopped off, so you don't get the full tweet. Not very useful.

@sxshateri

commented Jun 13, 2019

"seems to down load tweets, but most appear to be chopped off so you don't get the full tweet. Not very useful."

@steeley, you may use my code, which uses the extended Twitter feature so that you get the full tweet.

https://gist.github.com/sxshateri/540aead254bfa7810ee8bbb2d298363e
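
For readers who want the idea without following the link, here is a hedged sketch of the extended-mode approach; it is an illustration and not necessarily what the linked gist does. It assumes the search call is made with `tweet_mode='extended'`, in which case each status exposes `full_text` instead of a truncated `text`. `get_full_text` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def get_full_text(tweet):
    """Return the untruncated text of a tweet fetched with tweet_mode='extended'.

    For retweets the complete text lives on retweeted_status; the top-level
    text may still be cut off. Falls back to .text for non-extended results.
    """
    source = getattr(tweet, "retweeted_status", None) or tweet
    return getattr(source, "full_text", None) or getattr(source, "text", "")


# Stand-ins for tweepy Status objects.
long_original = SimpleNamespace(full_text="A long complaint about a delayed flight, in full.")
retweet = SimpleNamespace(full_text="RT @someone: A long complaint about a del...",
                          retweeted_status=long_original)
legacy = SimpleNamespace(text="short tweet")

for t in (long_original, retweet, legacy):
    print(get_full_text(t))
```

In the script above, that would mean adding `tweet_mode='extended'` to the `tweepy.Cursor(api.search, ...)` arguments and writing `get_full_text(tweet)` to the CSV instead of `tweet.text`.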

@steeley

commented Jun 14, 2019

Thanks sxshateri, seems to work OK with Python 3 in virtualenv.
