@edsu /replies.py
Last active Apr 20, 2018

Try to get replies to a particular set of tweets, recursively.
#!/usr/bin/env python
"""
Twitter's API doesn't allow you to get replies to a particular tweet. Strange
but true. But you can use Twitter's Search API to search for tweets that are
directed at a particular user, and then search through the results to see if
any are replies to a given tweet. You probably are also interested in the
replies to any replies as well, so the process is recursive. The big caveat
here is that the search API only returns results for the last 7 days. So
you'll want to run this sooner rather than later.

replies.py will read a line oriented JSON file of tweets and look for replies
using the above heuristic. Any replies that are discovered will be written as
line oriented JSON to stdout:

    ./replies.py tweets.json > replies.json

It also writes a log to replies.log if you are curious what it is doing...which
can be handy since it will sleep for periods of time to work within the
Twitter API quotas.

PS. you'll need to:

    pip install python-twitter

and then set the following environment variables for it to work:

  - CONSUMER_KEY
  - CONSUMER_SECRET
  - ACCESS_TOKEN
  - ACCESS_TOKEN_SECRET
"""

import sys
import json
import time
import logging
import twitter
import urllib.parse

from os import environ as e

# sleep_on_rate_limit makes python-twitter wait out rate limits instead of raising
t = twitter.Api(
    consumer_key=e["CONSUMER_KEY"],
    consumer_secret=e["CONSUMER_SECRET"],
    access_token_key=e["ACCESS_TOKEN"],
    access_token_secret=e["ACCESS_TOKEN_SECRET"],
    sleep_on_rate_limit=True
)


def tweet_url(tweet):
    return "https://twitter.com/%s/status/%s" % (tweet.user.screen_name, tweet.id)


def get_tweets(filename):
    # one JSON-encoded tweet per line
    with open(filename) as fh:
        for line in fh:
            yield twitter.Status.NewFromJsonDict(json.loads(line))


def get_replies(tweet):
    user = tweet.user.screen_name
    tweet_id = tweet.id
    max_id = None
    logging.info("looking for replies to: %s" % tweet_url(tweet))
    while True:
        # raw_query is sent as the whole query string and, per the
        # python-twitter docs, overrides the other GetSearch arguments,
        # so the paging parameters have to be part of it
        params = {"q": "to:%s" % user, "since_id": tweet_id, "count": 100}
        if max_id is not None:
            params["max_id"] = max_id
        q = urllib.parse.urlencode(params)
        try:
            replies = t.GetSearch(raw_query=q)
        except twitter.error.TwitterError as err:
            logging.error("caught twitter api error: %s", err)
            time.sleep(60)
            continue
        for reply in replies:
            logging.info("examining: %s" % tweet_url(reply))
            if reply.in_reply_to_status_id == tweet_id:
                logging.info("found reply: %s" % tweet_url(reply))
                yield reply
                # recursive magic to also get the replies to this reply
                for reply_to_reply in get_replies(reply):
                    yield reply_to_reply
            # page backwards; max_id is inclusive, so step below this tweet
            max_id = reply.id - 1
        if len(replies) != 100:
            break


if __name__ == "__main__":
    logging.basicConfig(filename="replies.log", level=logging.INFO)
    tweets_file = sys.argv[1]
    for tweet in get_tweets(tweets_file):
        for reply in get_replies(tweet):
            print(reply.AsJsonString())

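
The matching heuristic described in the docstring can be tried offline, without credentials. The following is a minimal sketch, not the script itself: `find_replies`, `fake_results` and the plain-dict tweets are hypothetical stand-ins, and a single in-memory list replaces the per-user searches the real script runs against the Search API.

```python
def find_replies(tweet, search_results):
    """Yield replies to `tweet`, depth first, from an in-memory result list."""
    for candidate in search_results:
        if candidate["in_reply_to_status_id"] == tweet["id"]:
            yield candidate
            # a reply can itself have replies, so recurse
            yield from find_replies(candidate, search_results)


# hypothetical stand-in for what the searches might return
fake_results = [
    {"id": 2, "user": "bob", "in_reply_to_status_id": 1},
    {"id": 3, "user": "carol", "in_reply_to_status_id": 2},
    {"id": 4, "user": "dave", "in_reply_to_status_id": 99},
    {"id": 5, "user": "erin", "in_reply_to_status_id": 1},
]
root = {"id": 1, "user": "alice"}

reply_ids = [r["id"] for r in find_replies(root, fake_results)]
print(reply_ids)
```

Tweet 4 is directed at the same user but answers a different tweet, so it is skipped. That filtering step is the whole point of comparing `in_reply_to_status_id`.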
rwest202 Oct 28, 2017

Thanks!

Allen-Qiu Nov 26, 2017

Could you give an example for tweets_file?
I wonder what should be written in the file.

garinthengineer Feb 21, 2018

Same deal: how do you prepare the tweets file in the first place?
Please provide an example.

JuanSierra Mar 5, 2018

This line in tweets.json worked for me:

{"user":{"screen_name": "HumanoidHistory"},"id": 970447777053462528}
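
For more than a handful of tweets, a file in that shape could be generated rather than typed by hand. This is only a sketch under the assumption that the two fields shown above (`user.screen_name` and `id`) are all the script needs; `write_tweets_file` and `read_tweets_file` are hypothetical helper names, not part of replies.py or python-twitter.

```python
import json
import os
import tempfile


def write_tweets_file(path, tweets):
    """Write (screen_name, tweet_id) pairs as one JSON object per line."""
    with open(path, "w") as fh:
        for screen_name, tweet_id in tweets:
            record = {"user": {"screen_name": screen_name}, "id": tweet_id}
            fh.write(json.dumps(record) + "\n")


def read_tweets_file(path):
    """Read the file back, skipping blank lines."""
    with open(path) as fh:
        return [json.loads(line) for line in fh if line.strip()]


# written to a temporary directory here so the demo leaves nothing behind
path = os.path.join(tempfile.mkdtemp(), "tweets.json")
write_tweets_file(path, [("HumanoidHistory", 970447777053462528)])
loaded = read_tweets_file(path)
print(loaded[0]["user"]["screen_name"], loaded[0]["id"])
```

The resulting file would then be passed to the script as `./replies.py tweets.json > replies.json`.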

Robertoez Mar 7, 2018

How would this be done in reverse? That is, you have a certain reply and want to find the ID of the original tweet it was in reply to.
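
One possible approach, sketched under the assumption that the reply's JSON carries the same `in_reply_to_status_id` field the script compares against (and, optionally, `in_reply_to_screen_name`): read the parent ID straight off the reply. `parent_of` and the sample tweets are hypothetical and only illustrate the lookup.

```python
import json


def parent_of(tweet_json):
    """Return the parent tweet's id and author, or None if not a reply."""
    tweet = json.loads(tweet_json)
    parent_id = tweet.get("in_reply_to_status_id")
    if parent_id is None:
        return None
    return {"id": parent_id, "screen_name": tweet.get("in_reply_to_screen_name")}


# hypothetical sample lines in the same line-oriented JSON layout
reply_line = json.dumps({
    "id": 20,
    "user": {"screen_name": "bob"},
    "in_reply_to_status_id": 10,
    "in_reply_to_screen_name": "alice",
})
original_line = json.dumps({
    "id": 10,
    "user": {"screen_name": "alice"},
    "in_reply_to_status_id": None,
})

print(parent_of(reply_line))
print(parent_of(original_line))
```

Unlike finding replies, this direction needs no search at all, because each reply already points at its parent.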

lakshadvani Mar 8, 2018

I keep getting a KeyError when I put in my consumer_key. Any workarounds?

chandanprsharma Mar 10, 2018

I am using the Twitter4J implementation. Can you tell me how to get replies to tweets there?

zakigatez Mar 28, 2018

It doesn't work???

mvarnold Apr 20, 2018

@lakshadvani If you set the environment variables before the twitter.Api(...) call reads them (around line 45), it works for me. It looks like this:

from os import environ as e

e["consumer_key"] = "YOUR_CONSUMER_KEY"
e["consumer_secret"] = "YOUR_CONSUMER_SECRET"
e["access_token_key"] = "YOUR_ACCESS_TOKEN"
e["access_token_secret"] = "YOUR_ACCESS_TOKEN_SECRET"

t = twitter.Api(
    consumer_key=e["consumer_key"],
    consumer_secret=e["consumer_secret"],
    access_token_key=e["access_token_key"],
    access_token_secret=e["access_token_secret"],
    sleep_on_rate_limit=True
)
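
The KeyError itself most likely comes from indexing `os.environ` with a name that is not set: the script looks up `CONSUMER_KEY`, `CONSUMER_SECRET`, `ACCESS_TOKEN` and `ACCESS_TOKEN_SECRET`, and on most platforms those names are case-sensitive. A small check like the following sketch would report which ones are missing before the API object is built. `missing_credentials` and `load_credentials` are hypothetical helpers, not part of the script.

```python
import os

# the four names the script's docstring asks for
REQUIRED = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def missing_credentials(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


def load_credentials(env=os.environ):
    """Return the credentials as a dict, or exit with a readable message."""
    missing = missing_credentials(env)
    if missing:
        raise SystemExit("set these environment variables first: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


# demo against a plain dict instead of the real environment
example_env = {"CONSUMER_KEY": "abc", "consumer_secret": "wrong-case"}
print(missing_credentials(example_env))
```

Exporting the variables in the shell before running the script keeps the secrets out of the source file.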
