@bonzanini
Last active January 9, 2024 14:11
Twitter Stream Downloader
consumer_key = 'your-consumer-key'
consumer_secret = 'your-consumer-secret'
access_token = 'your-access-token'
access_secret = 'your-access-secret'
# To run this code, first edit config.py with your configuration, then:
#
# mkdir data
# python twitter_stream_download.py -q apple -d data
#
# It will produce the list of tweets for the query "apple"
# in the file data/stream_apple.json
import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener  # Tweepy 3.x API; StreamListener was removed in Tweepy 4.0
import time
import argparse
import string
import config
import json


def get_parser():
    """Get parser for command line arguments."""
    parser = argparse.ArgumentParser(description="Twitter Downloader")
    parser.add_argument("-q",
                        "--query",
                        dest="query",
                        help="Query/Filter",
                        default='-')
    parser.add_argument("-d",
                        "--data-dir",
                        dest="data_dir",
                        help="Output/Data Directory")
    return parser


class MyListener(StreamListener):
    """Custom StreamListener for streaming data."""

    def __init__(self, data_dir, query):
        query_fname = format_filename(query)
        self.outfile = "%s/stream_%s.json" % (data_dir, query_fname)

    def on_data(self, data):
        try:
            with open(self.outfile, 'a') as f:
                f.write(data)
                print(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
            time.sleep(5)
        return True

    def on_error(self, status):
        print(status)
        return True


def format_filename(fname):
    """Convert file name into a safe string.

    Arguments:
        fname -- the file name to convert
    Return:
        String -- converted file name
    """
    return ''.join(convert_valid(one_char) for one_char in fname)


def convert_valid(one_char):
    """Convert a character into '_' if invalid.

    Arguments:
        one_char -- the char to convert
    Return:
        Character -- converted char
    """
    valid_chars = "-_.%s%s" % (string.ascii_letters, string.digits)
    if one_char in valid_chars:
        return one_char
    else:
        return '_'
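

# NOTE: this function sits at module level and is never attached to a class,
# so nothing in this script calls it.
@classmethod
def parse(cls, api, raw):
    status = cls.first_parse(api, raw)
    setattr(status, 'json', json.dumps(raw))
    return status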


if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    auth = OAuthHandler(config.consumer_key, config.consumer_secret)
    auth.set_access_token(config.access_token, config.access_secret)
    api = tweepy.API(auth)

    twitter_stream = Stream(auth, MyListener(args.data_dir, args.query))
    twitter_stream.filter(track=[args.query])

@mg3146

mg3146 commented Jun 5, 2016

@bonzanini. This is a great piece of code, and very helpful to learn from. Thanks a lot.

Quick question: have there been any updates to the API that allow tracking multiple words as one phrase? I.e., "game tomorrow" vs. "game" and "tomorrow" (which would result in a ton more data and postprocessing...)

@bonzanini

@markgillis0 Unfortunately, exact phrase matching is not supported by the Twitter streaming API yet: https://dev.twitter.com/streaming/overview/request-parameters#track
On the other hand, it is supported by the search API.
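
A possible workaround is to track the individual words and filter for the exact phrase afterwards. The sketch below assumes each line written by the script is one JSON-encoded tweet with a "text" field; `contains_phrase` is a hypothetical helper, not part of the gist or of Tweepy.

```python
import json


def contains_phrase(raw_line, phrase):
    """Return True if the tweet's text contains the exact phrase (case-insensitive)."""
    try:
        tweet = json.loads(raw_line)
    except ValueError:
        return False  # not valid JSON, so there is nothing to match
    if not isinstance(tweet, dict):
        return False
    # Assumption: the tweet body lives under the "text" key
    text = tweet.get("text") or ""
    return phrase.lower() in text.lower()


sample = json.dumps({"text": "Big Game Tomorrow night!"})
print(contains_phrase(sample, "game tomorrow"))
```

Applied line by line to data/stream_<query>.json, this keeps only the tweets where the words appear next to each other.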

@shannonwho

Hi! Thank you very much for sharing.

The code works fine when I use the query "apple", but no other keyword seems to work. Do you happen to know why that is?

Any suggestions will be really helpful!

@ajax-jones

I find that the 401 is what you get before you set up your config.py with the Twitter app credentials. I get the None error if -d is not specified. So I create a sub-directory and use that, and then it works fine:
sudo mkdir mydir
sudo python tweet.py -q apple -d mydir

@Parth-Vader

If I want to store just the "text" portion, how can I do it?
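
One way this could be done, assuming the string passed to on_data is a JSON-encoded tweet whose body is under the "text" key (`extract_text` is a hypothetical helper name):

```python
import json


def extract_text(raw_line):
    """Return the "text" field of one JSON-encoded tweet, or None if it has none."""
    try:
        message = json.loads(raw_line)
    except ValueError:
        return None  # not valid JSON
    if not isinstance(message, dict):
        return None
    # Messages without a "text" key yield None
    return message.get("text")


print(extract_text(json.dumps({"id": 1, "text": "hello apple"})))
```

Inside MyListener.on_data, the script would then write extract_text(data) plus a newline instead of data, skipping the message when the result is None.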

@kmrsatish17

I'm getting this error. Please help!!
Error on_data: [Errno 2] No such file or directory: 'data/stream_apple.json'

@kjoth

kjoth commented Dec 4, 2016

How do I get the list of my followers?

for friends in tweepy.Cursor(api.followers).items():
    fw.write('Friends: ' + str(follower_ids) + "\n")

follower_ids is not found

@rsathishr

I'm getting an error! Please help me out.

Failed on data: %s '_io.TextIOWrapper' object has no attribute 'Write'
ERROR: execution aborted

@GabrielYe

GabrielYe commented Mar 14, 2017

@bonzanini Thanks for your code. How can I get all of the tweets of a specific user? For example, I want to get Kobe's tweets.
Thank you.

@yuchenQ

yuchenQ commented Mar 19, 2017

@bonzanini Hi, thanks for your great example. May I ask what the use of the following is?
@classmethod
def parse(cls, api, raw):

Thanks

@baoyanpeng

Thanks a lot, and I have a question: can I obtain data for some keywords from before today?

@Dixith-Reddy-Nayeni

Thank you very much... it worked for me. :)

@vibhuti1990

Hi, could you please help me with the error below?

Error on_data: [Errno 2] No such file or directory: 'None/stream_-.json'

@zeanong

zeanong commented Sep 12, 2017

Works well. Thank you!

@L-Kov

L-Kov commented Sep 22, 2017

I get the error:
line 96, in <module>
auth = OAuthHandler(config.consumer_key, config.consumer_secret)
AttributeError: 'module' object has no attribute 'consumer_key'

What config module do you use?

@Kanishk-Anand

Kanishk-Anand commented Oct 31, 2017

I keep getting 401 as output. I have set up the config.py file with my credentials, still it gives 401. Any help?

@m-abubakar-saddique

How can I limit the number of tweets?
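
A traceback further down in this thread shows Tweepy's streaming.py checking `if self.listener.on_data(data) is False:`, which suggests that returning False from on_data stops the stream. Under that assumption, a counter is enough. The class below is a hypothetical stand-alone sketch (it does not subclass StreamListener so that it runs without Tweepy); the same counting logic would go into MyListener.on_data.

```python
class LimitedCounter:
    """Count incoming messages and signal when the limit has been reached."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def on_data(self, data):
        self.count += 1
        # ... write `data` to the output file here ...
        # Returns False once `limit` messages have been seen
        return self.count < self.limit


counter = LimitedCounter(3)
print([counter.on_data("tweet") for _ in range(3)])  # [True, True, False]
```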

@salsaeede

I keep getting the error below. Can someone help me import config successfully?

import config
Traceback (most recent call last):

File "C:\Users\salman\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File "", line 1, in
import config

File "C:\Users\salman\Anaconda3\lib\site-packages\config.py", line 733
except Exception, e:
^
SyntaxError: invalid syntax
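
The traceback points at site-packages\config.py, not at a config.py next to the script, so Python seems to be picking up an unrelated installed module with the same name. A quick way to check which file would be imported, without actually importing it (`locate_module` is a hypothetical helper name):

```python
import importlib.util


def locate_module(name):
    """Return the file Python would load for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Should print the path of your own config.py if it sits in the working directory
print(locate_module("config"))
```

If the printed path is inside site-packages, running the script from the folder that contains your own config.py (or renaming that file and the import) is probably the way out.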

@SjorsG

SjorsG commented Jan 15, 2018

I'm so sorry to bother you after you have already written this beautiful piece of code.
I think the answer lies in the comment section of your code, but it seems I just can't get it right.
How do I "edit config.py with your configuration" and then run the following?
mkdir data
python twitter_stream_download.py -q apple -d data

I get this error:

line 96, in <module>
auth = OAuthHandler(config.consumer_key, config.consumer_secret)
AttributeError: 'module' object has no attribute 'consumer_key'

Thank you for your time

@ericdorsey

@SjorsG
Are you sure you have a file called "config.py" in the same folder, that has a variable in it that's called "consumer_key", that has your key assigned to it?

consumer_key = 'YOURCONSUMERKEYHERE'
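
A small check along these lines can show which of the four names the script reads from config are missing. It is only a sketch: `missing_settings` is a hypothetical helper, and `fake_config` stands in for the real `import config`.

```python
import types

# The four names the script reads from config.py
REQUIRED = ("consumer_key", "consumer_secret", "access_token", "access_secret")


def missing_settings(module):
    """Return the required names that `module` does not define or leaves empty."""
    return [name for name in REQUIRED if not getattr(module, name, None)]


# Stand-in for `import config`, with two of the four values missing
fake_config = types.SimpleNamespace(consumer_key="k", consumer_secret="s")
print(missing_settings(fake_config))  # ['access_token', 'access_secret']
```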

@pbajpai2

I'm using Python 3.7.0 and downloaded Tweepy 3.6.0

After running config.py (which ends successfully) and doing the mkdir data step, I get the following error when running twitter_stream_download.py:

C:\Users\pbajp\Git\datasci_course_materials\assignment1\alternate>python twitter_stream_download.py -q apple -d data
Traceback (most recent call last):
  File "twitter_stream_download.py", line 9, in <module>
    import tweepy
  File "C:\Users\pbajp\AppData\Local\Programs\Python\Python37\lib\site-packages\tweepy\__init__.py", line 17, in <module>
    from tweepy.streaming import Stream, StreamListener
  File "C:\Users\pbajp\AppData\Local\Programs\Python\Python37\lib\site-packages\tweepy\streaming.py", line 358
    def start(self, async):
                    ^
SyntaxError: invalid syntax

Can anyone guide me on next steps to debug?
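
A likely cause: `async` became a reserved keyword in Python 3.7, so a parameter with that name no longer parses. A later traceback in this thread shows `self._start(is_async)`, which suggests that newer Tweepy releases renamed the parameter, so upgrading Tweepy is probably the fix (an assumption, not confirmed by the author). The sketch below only demonstrates the keyword issue; `compiles` is a hypothetical helper.

```python
import keyword
import sys


def compiles(source):
    """Return True if `source` parses as Python on the running interpreter."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


print(sys.version_info[:2])
print(keyword.iskeyword("async"))
print(compiles("def start(self, async): pass"))
print(compiles("def start(self, is_async): pass"))
```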

@agcala

agcala commented Jul 12, 2018

@rsathishr
It is "write", not "Write".

@Germain94

Hello everyone.
First of all, thank you for your work, @bonzanini!
I'm trying to search for tweets from two weeks ago until now. Can I adapt your code to do that?

@AreRex14

Works fine. Thank you for your work, @bonzanini.

@arnabghose997

For those who are facing the following error:

Error on_data: [Errno 2] No such file or directory: 'None/stream_-.json'

You have to create a folder named "data" in the same directory, for the code to work. Hope this helps.
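
Note that the path 'None/stream_-.json' contains the parser defaults: data_dir is None when -d is omitted, and the query defaults to '-'. So the script was most likely started without -q and -d, and creating the folder alone will not help in that case. A sketch of a guard that covers both situations (`build_outfile` is a hypothetical helper; the demo uses a throwaway temp directory):

```python
import os
import tempfile


def build_outfile(data_dir, query_fname):
    """Create data_dir if needed and return the path of the output file."""
    if not data_dir:
        # Happens when the script is run without -d
        raise ValueError("no data directory given; pass one with -d")
    os.makedirs(data_dir, exist_ok=True)
    return os.path.join(data_dir, "stream_%s.json" % query_fname)


# Demo: the "data" folder does not exist yet and is created on the fly
demo_dir = os.path.join(tempfile.mkdtemp(), "data")
print(build_outfile(demo_dir, "apple"))
```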

@Carpintonto

Maybe I am totally missing something, but it sure seems to me that the script is fully functional without import json or the @classmethod.

@Benasir1

For those who are facing the following error:

Error on_data: [Errno 2] No such file or directory: 'None/stream_-.json'

You have to create a folder named "data" in the same directory, for the code to work. Hope this helps.

@arnabghose997 I still face the same problem after creating the 'data' folder in the same directory.

@PranjalShekhawat

Any idea how to resolve this error, please?

runfile('C:/Users/chhaj/OneDrive/Desktop/test4 tweet search.py', wdir='C:/Users/chhaj/OneDrive/Desktop')
Error on_data: [Errno 2] No such file or directory: 'None/stream_-.json'
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/chhaj/OneDrive/Desktop/test4 tweet search.py', wdir='C:/Users/chhaj/OneDrive/Desktop')

File "C:\Users\chhaj\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "C:\Users\chhaj\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/chhaj/OneDrive/Desktop/test4 tweet search.py", line 95, in
twitter_stream.filter(track=[args.query])

File "C:\Users\chhaj\Anaconda3\lib\site-packages\tweepy\streaming.py", line 453, in filter
self._start(is_async)

File "C:\Users\chhaj\Anaconda3\lib\site-packages\tweepy\streaming.py", line 368, in _start
self._run()

File "C:\Users\chhaj\Anaconda3\lib\site-packages\tweepy\streaming.py", line 269, in _run
self._read_loop(resp)

File "C:\Users\chhaj\Anaconda3\lib\site-packages\tweepy\streaming.py", line 331, in _read_loop
self._data(next_status_obj)

File "C:\Users\chhaj\Anaconda3\lib\site-packages\tweepy\streaming.py", line 303, in _data
if self.listener.on_data(data) is False:

File "C:/Users/chhaj/OneDrive/Desktop/test4 tweet search.py", line 50, in on_data
time.sleep(5)

KeyboardInterrupt

@valdassukevicius

Worked just fine from cmd with Python 3.8.5; I just needed to create a data sub-folder within the assignment.

@lognguyen

@pbajpai2 I don't know if you've fixed that one yet. If you use a different IDE/interpreter when editing the two files, that might be the problem. In my case, I used Anaconda, so I had to use the Anaconda Prompt to run it properly.
