@edsu /replies.py
Last active Feb 13, 2019

Try to get replies to a particular set of tweets, recursively.
#!/usr/bin/env python

"""
Twitter's API doesn't allow you to get replies to a particular tweet. Strange
but true. But you can use Twitter's Search API to search for tweets that are
directed at a particular user, and then search through the results to see if
any are replies to a given tweet. You probably are also interested in the
replies to any replies as well, so the process is recursive. The big caveat
here is that the search API only returns results for the last 7 days. So
you'll want to run this sooner rather than later.

replies.py will read a line oriented JSON file of tweets and look for replies
using the above heuristic. Any replies that are discovered will be written as
line oriented JSON to stdout:

    ./replies.py tweets.json > replies.json

It also writes a log to replies.log if you are curious what it is doing...which
can be handy since it will sleep for periods of time to work within the
Twitter API quotas.

PS. you'll need to:

    pip install python-twitter

and then set the following environment variables for it to work:

  - CONSUMER_KEY
  - CONSUMER_SECRET
  - ACCESS_TOKEN
  - ACCESS_TOKEN_SECRET
"""

import sys
import json
import time
import logging
import twitter
import urllib.parse

from os import environ as e

t = twitter.Api(
    consumer_key=e["CONSUMER_KEY"],
    consumer_secret=e["CONSUMER_SECRET"],
    access_token_key=e["ACCESS_TOKEN"],
    access_token_secret=e["ACCESS_TOKEN_SECRET"],
    sleep_on_rate_limit=True
)

def tweet_url(t):
    return "https://twitter.com/%s/status/%s" % (t.user.screen_name, t.id)

def get_tweets(filename):
    for line in open(filename):
        yield twitter.Status.NewFromJsonDict(json.loads(line))

def get_replies(tweet):
    user = tweet.user.screen_name
    tweet_id = tweet.id
    max_id = None
    logging.info("looking for replies to: %s" % tweet_url(tweet))
    while True:
        q = urllib.parse.urlencode({"q": "to:%s" % user})
        try:
            replies = t.GetSearch(raw_query=q, since_id=tweet_id, max_id=max_id, count=100)
        except twitter.error.TwitterError as e:
            logging.error("caught twitter api error: %s", e)
            time.sleep(60)
            continue
        for reply in replies:
            logging.info("examining: %s" % tweet_url(reply))
            if reply.in_reply_to_status_id == tweet_id:
                logging.info("found reply: %s" % tweet_url(reply))
                yield reply
                # recursive magic to also get the replies to this reply
                for reply_to_reply in get_replies(reply):
                    yield reply_to_reply
            max_id = reply.id
        if len(replies) != 100:
            break

if __name__ == "__main__":
    logging.basicConfig(filename="replies.log", level=logging.INFO)
    tweets_file = sys.argv[1]
    for tweet in get_tweets(tweets_file):
        for reply in get_replies(tweet):
            print(reply.AsJsonString())

rwest202 commented Oct 28, 2017

Thanks!

Allen-Qiu commented Nov 26, 2017

Could you give an example of tweets_file?
I wonder what should be written in the file.

garinthengineer commented Feb 21, 2018

Same question: how do you prepare the tweets file in the first place?
Please provide an example.

JuanSierra commented Mar 5, 2018

This line in tweets.json worked for me:

{"user":{"screen_name": "HumanoidHistory"},"id": 970447777053462528}
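
For anyone else asking how to prepare the tweets file: a minimal sketch, assuming each input tweet is identified by its URL and that the two fields shown above (`user.screen_name` and `id`) are all replies.py needs. The helper name `tweet_line_from_url` and the URL pattern are illustrative assumptions, not part of the gist.

```python
import json
import re

# Hypothetical helper, not part of replies.py. The pattern assumes URLs of the
# form https://twitter.com/<screen_name>/status/<id>.
_STATUS_URL = re.compile(
    r"^https?://(?:www\.|mobile\.)?twitter\.com/([A-Za-z0-9_]+)/status/(\d+)"
)


def tweet_line_from_url(url):
    """Return one line of line-oriented JSON describing the tweet at `url`."""
    match = _STATUS_URL.match(url.strip())
    if match is None:
        raise ValueError("not a tweet URL: %r" % url)
    screen_name, tweet_id = match.groups()
    # Keep the ID as an integer so it compares equal to in_reply_to_status_id.
    return json.dumps({"user": {"screen_name": screen_name}, "id": int(tweet_id)})
```

Writing one such line per tweet into tweets.json should give a file in the same shape as the example above.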

Robertoez commented Mar 7, 2018

How would this be done in reverse? That is, you have a certain reply and want to find the ID of the original tweet it was replying to.
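
A rough sketch of the reverse direction, assuming tweets are available as plain dicts: a reply carries the ID of the tweet it answers in `in_reply_to_status_id` (the same field replies.py compares against), so walking upward only needs a way to fetch a tweet by ID. The names `original_id`, `walk_to_root`, and the `fetch` callable are hypothetical.

```python
def original_id(reply):
    """Return the ID of the tweet that `reply` answers, or None if it is not a reply."""
    return reply.get("in_reply_to_status_id")


def walk_to_root(reply, fetch):
    """Follow in_reply_to_status_id upward and return [reply, parent, ..., root].

    `fetch` is any callable mapping a tweet ID to a tweet dict, or None when
    the tweet cannot be retrieved (an assumption made for this sketch).
    """
    chain = [reply]
    seen = {reply["id"]}
    while True:
        parent_id = original_id(chain[-1])
        if parent_id is None or parent_id in seen:
            break
        parent = fetch(parent_id)
        if parent is None:  # deleted, protected, or otherwise unavailable
            break
        chain.append(parent)
        seen.add(parent_id)
    return chain
```

With python-twitter, `fetch` could probably be built on `t.GetStatus(tweet_id).AsDict()`, with a `try`/`except twitter.error.TwitterError` that returns None for tweets that no longer exist.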

lakshadvani commented Mar 8, 2018

I keep getting a KeyError when I put in my consumer_key. Any workarounds?

chandanprsharma commented Mar 10, 2018

I am using the Twitter4J implementation. Can you tell me how to get replies to tweets there?

zakigatez commented Mar 28, 2018

it doesn't work ???

mvarnold commented Apr 20, 2018

@lakshadvani If you define the environment variables in the script before they are read in the twitter.Api(...) call, it works for me. It looks like this:

    from os import environ as e

    e["consumer_key"] = "YOUR_CONSUMER_KEY"
    e["consumer_secret"] = "YOUR_CONSUMER_SECRET"
    e["access_token_key"] = "YOUR_ACCESS_TOKEN"
    e["access_token_secret"] = "YOUR_ACCESS_TOKEN_SECRET"

    t = twitter.Api(
        consumer_key=e["consumer_key"],
        consumer_secret=e["consumer_secret"],
        access_token_key=e["access_token_key"],
        access_token_secret=e["access_token_secret"],
        sleep_on_rate_limit=True
    )
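
An alternative that keeps secrets out of the source file: a small sketch that reads the four variables named in the script's docstring and reports which ones are missing, instead of failing with a bare KeyError. The function names `missing_credentials` and `load_credentials` are hypothetical.

```python
import os

# The variable names replies.py expects, taken from its docstring.
REQUIRED = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def missing_credentials(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


def load_credentials(env=os.environ):
    """Return keyword arguments for twitter.Api, or exit with a readable message."""
    missing = missing_credentials(env)
    if missing:
        raise SystemExit("set these environment variables first: " + ", ".join(missing))
    return {
        "consumer_key": env["CONSUMER_KEY"],
        "consumer_secret": env["CONSUMER_SECRET"],
        "access_token_key": env["ACCESS_TOKEN"],
        "access_token_secret": env["ACCESS_TOKEN_SECRET"],
    }
```

The client could then be created with `twitter.Api(sleep_on_rate_limit=True, **load_credentials())`, after exporting the variables in the shell that runs the script.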

linuxandchill commented May 7, 2018

Hi!
What kind of argument are you passing to the get_replies(tweet) function?
Is it supposed to be a string, a tweet ID, or something else?

mrmrn commented Jun 18, 2018

How can I get a user's replies to other tweets, along with the original tweets?
For example, assume user A replied to user B and user C.
I want to retrieve all of A's replies: B's tweet as the original with A's tweet as the reply, and C's tweet as the original with A's reply to it. In short, I want all of user A's replies paired with their original tweets.
Thanks

saverymax commented Jul 20, 2018

thanks for this, was going to start from scratch, but I always appreciate a template!

Michelle170 commented Jul 23, 2018

Thanks for this. It seems, though, that replies.log lists many replies that are not to this particular tweet ID but to this user's other tweets.

leeyamkeng commented Jul 23, 2018

Thank you for your code, but I can't get 100 replies per tweet with the following call, although count is set to 100:

    replies = t.GetSearch(raw_query=q, since_id=tweet_id, max_id=max_id, count=100)

I can only get the 15 most recent replies to the candidate. Is there any way to scrape 100 replies by tweet ID?

santiag080 commented Aug 21, 2018

The replies.log came out empty :/

arnolem commented Oct 7, 2018

The standard search API searches against a sampling of recent Tweets published in the past 7 days.
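
Given that window, it can help to check whether a tweet is still recent enough before searching for its replies. A minimal sketch, assuming the tweet ID is a Snowflake ID (true for tweets from late 2010 onward), whose upper bits encode milliseconds since Twitter's custom epoch. The function names are hypothetical, and the 7-day default simply mirrors the limit quoted above.

```python
from datetime import datetime, timedelta, timezone

# Snowflake epoch in milliseconds; only valid for Snowflake-style tweet IDs.
TWITTER_EPOCH_MS = 1288834974657


def tweet_time(tweet_id):
    """Approximate creation time (UTC) encoded in a Snowflake tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def in_search_window(tweet_id, now=None, days=7):
    """Return True if the tweet is at most `days` old relative to `now`."""
    now = now or datetime.now(timezone.utc)
    return now - tweet_time(tweet_id) <= timedelta(days=days)
```

Tweets that fail this check are unlikely to yield any replies through the standard search endpoint, so they can be skipped to save rate-limited requests.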

yoshi9696 commented Oct 18, 2018

I don't know what "tweets_file" is...
and I want the "replies.log" file.

MerleLiuKun commented Nov 23, 2018

I used this method, but the resulting reply count does not match what twitter.com shows: it is short by 1. Is max_id or since_id causing the error?

serdec commented Dec 17, 2018

The problem is that when you specify the raw_query parameter, the GetSearch function discards all the other parameters, as specified here: https://python-twitter.readthedocs.io/en/latest/_modules/twitter/api.html#Api.GetSearch
You need to specify all the parameters inside the raw query q, something like (hopefully better):

    q = "q=to%3A" + user + "&since_id=" + str(tweet_id) + "&max_id=" + str(max_id) + "&count=100"
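
One possible tidier form of that raw query, sketched with `urllib.parse.urlencode` (already imported by the script). It leaves out `max_id` on the first page, where the variable is still None and string concatenation would send the literal text "None". The helper name `build_raw_query` is hypothetical.

```python
from urllib.parse import urlencode


def build_raw_query(user, since_id=None, max_id=None, count=100):
    """Build a raw query string carrying every search parameter explicitly."""
    params = {"q": "to:%s" % user, "count": count}
    # Only include the ID bounds that are actually set.
    if since_id is not None:
        params["since_id"] = since_id
    if max_id is not None:
        params["max_id"] = max_id
    return urlencode(params)
```

The result would be passed as `t.GetSearch(raw_query=build_raw_query(user, tweet_id, max_id))`, with no other keyword arguments, since they are ignored alongside `raw_query`.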

Emekaborisama commented Jan 10, 2019

Please, I would love to know whether I should fill in the tweet link URL in the tweet_url(t) function.

MichaelCurrin commented Jan 12, 2019

I can confirm the point made by @serdec: using raw_query meant the other fields like max_id were ignored, so I was stuck on the first page.

I took out the raw_query key and replaced it with the term key and value. This works great.

    term = "to:%s" % user
    replies = t.GetSearch(
                term=term,
                since_id=tweet_id,
                max_id=max_id,
                count=100,
            )

MichaelCurrin commented Jan 12, 2019

There's a problem with breaking out of the while loop: it happens too soon and will miss the last page of results, which will probably have fewer than 100 tweets.

Also bear in mind that the API's max ID filter is inclusive, which means that the last tweet of page N will be at the start of page N+1. That means you double count, and it's hard to know when you have reached the last page.

So my implementation uses one less than the last ID as the max ID, so that reply is excluded from the next page. Then I check for zero tweets on a page and break from the while loop.

    ...
    page_index = 0
    while True:
        page_index += 1
        print(f"Page: {page_index}")

        try:
            replies = ...
        except twitter.error.TwitterError as e:
            ...

        if not replies:
            break        # <<<

        for reply in replies:
            ...

        max_id = reply.id - 1     # <<<

My other suggestion is that the recursive reply magic can be commented out if it's not needed, which also helps avoid getting rate limited too easily from frequent requests.
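
The paging scheme described above can be sketched on its own, independent of the Twitter client. This version assumes tweets are dicts with an `id` key and that `search` is any callable accepting `since_id`, `max_id`, and `count` and returning a list; both are assumptions for illustration. It also takes the smallest ID on the page rather than the last item, which avoids relying on result order.

```python
def iter_pages(search, since_id, page_size=100):
    """Yield pages of tweets, newest first, until an empty page comes back."""
    max_id = None
    while True:
        page = search(since_id=since_id, max_id=max_id, count=page_size)
        if not page:
            break  # an empty page is the only reliable end-of-results signal
        yield page
        # max_id is inclusive, so step just below the oldest tweet seen
        # to keep it from reappearing at the top of the next page.
        max_id = min(tweet["id"] for tweet in page) - 1
```

With python-twitter, `search` could be a small wrapper around `t.GetSearch(term=..., ...)` that converts each result with `AsDict()`.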

PAVITHRA-CP commented Jan 19, 2019

I have a file that looks like this:

['972651', '80080680482123777', '0.0']->['189397006', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['10678072', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['14569462', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['41634505', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['81232966', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['21282483', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['35165557', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['12735762', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['39076620', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['36841912', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['174692880', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['63007952', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['23500923', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['14287455', '80080680482123777', '1.8']
['972651', '80080680482123777', '0.0']->['166323176', '80080680482123777', '2.17']
['972651', '80080680482123777', '0.0']->['19543802', '80080680482123777', '2.68']
['972651', '80080680482123777', '0.0']->['25246700', '80080680482123777', '2.7']
['972651', '80080680482123777', '0.0']->['286219571', '80080680482123777', '2.85']
['972651', '80080680482123777', '0.0']->['22028700', '80080680482123777', '2.98']

The first value is the user ID and the second is the tweet ID. After the "->" symbol, the first value is the ID of the user who responded to that same tweet.

I want to retrieve the corresponding responses to the source tweet from particular users.

Can anyone help me?

Thanks in advance!
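
As a starting point, the file format shown above can be parsed into source and responder IDs with the standard library. This sketch assumes every line has exactly the shape in the sample; the meaning of the third value in each list is not stated, so it is kept under the neutral name `value`. The function name `parse_edge` is hypothetical.

```python
import ast


def parse_edge(line):
    """Parse "['user', 'tweet', 'x']->['user', 'tweet', 'y']" into a dict."""
    left, right = line.strip().split("->")
    # Each side is a Python-style list literal of strings.
    source_user, tweet_id, _ = ast.literal_eval(left)
    responder, _, value = ast.literal_eval(right)
    return {
        "source_user_id": int(source_user),
        "tweet_id": int(tweet_id),
        "responder_user_id": int(responder),
        "value": float(value),  # meaning not specified in the comment
    }
```

Note that replies.py searches by screen name rather than numeric user ID, and that the search approach only reaches tweets inside the window mentioned in the script's docstring, so older tweets would need a different data source.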

MerleLiuKun commented Jan 28, 2019

@PAVITHRA-CP It looks like you want to get the users' conversation that belongs to a given tweet. You can search for the target tweet, or search for the user you want (using the search/tweets endpoint), and just set since_id to the tweet ID. Maybe that gets you what you want.

fredwilliam commented Feb 13, 2019

Hi, I have a small task based on this, and I am willing to pay for assistance. Please reach me at fred.haule@gmail.com. Thanks for sharing. Great work!
