@luptilu
Created April 24, 2018 15:13
A Python Twitter scraper for the Syria conference 2018
import twitter, json, sys, csv #importing different modules needed for the execution of this code
# == OAuth Authentication ==
consumer_key="ADD YOUR OWN" #my consumer key
consumer_secret="ADD YOUR OWN" #my consumer secret
# Create an access token under the "Your access token" section
access_token="ADD YOUR OWN"#my access token key
access_token_secret="ADD YOUR OWN" #my access token secret
auth = twitter.oauth.OAuth(access_token, access_token_secret, consumer_key, consumer_secret) #defining the variable auth. it holds the OAuth credentials object built by the OAuth class of the twitter module and is used further down.
twitter_api = twitter.Twitter(auth=auth) #defining the variable twitter_api by creating a Twitter API client that uses these credentials
csvfile = open('syriaconf2018.csv', 'w') #opens a csv file named syriaconf2018.csv in write mode. w stands for writing: the file is created if it does not exist, and an existing file with the same name is erased.
csvwriter = csv.writer(csvfile, delimiter='|') #creates a writer object from the csv module. it writes each row as fields separated by |
q = "syriaconf2018" #defining the variable q. this is the keyword we search twitter for. used in the code further down.
# clean up our data so we can write unicode to CSV
def clean(val): #function "clean" with the input "val"
    cleaned = "" #"cleaned" is empty to start with. later, "cleaned" is returned.
    if val:
        val = val.replace('|', ' ') #replaces "|" with a space
        val = val.replace('\n', ' ') #replaces new lines with a space
        val = val.replace('\r', ' ') #replaces carriage returns with a space
        cleaned = val.encode('utf-8') #encodes the data in utf-8
    return cleaned #returns the cleaned value
print('Filtering the public timeline for keyword="%s"' % (q)) #prints out: Filtering the public timeline for keyword="syriaconf2018". %s is replaced by the value following %, and q was defined earlier as syriaconf2018.
twitter_stream = twitter.TwitterStream(auth=twitter_api.auth) #creating a client for the twitter streaming API that reuses the credentials from above
stream = twitter_stream.statuses.filter(track=q) #telling the Twitter streaming API to track the keyword "syriaconf2018".
for tweet in stream: #iterates over the tweets the Twitter streaming API delivers for the keyword "syriaconf2018"
    # print json.dumps(tweet)
    try: #the try statement is used to handle exceptions. first, the try clause is executed. if no exception occurs, the except clause is skipped and execution of the try statement is finished.
        if tweet['truncated']: #if the tweet is truncated,
            tweet_text = tweet['extended_tweet']['full_text'] #the full text is taken from the extended tweet
        else: #if the tweet is not truncated,
            tweet_text = tweet['text'] #the regular text field is used
        csvwriter.writerow([tweet['created_at'], #writes one row to the csv file, made up of the tweet creation date,
            clean(tweet['user']['screen_name']), #the user's screen name,
            clean(tweet_text), #the tweet itself,
            tweet['user']['created_at'], #the user creation date,
            tweet['user']['followers_count'], #the follower count,
            tweet['user']['friends_count'], #the count of people the user follows,
            tweet['user']['statuses_count'], #the number of statuses the user has written,
            clean(tweet['source']), #the utility used to post the tweet,
            clean(tweet['user']['location']), #the profile location of the user,
            tweet['user']['geo_enabled'], #the boolean value of whether the user has enabled geolocation,
            tweet['user']['lang'], #the user's language,
            clean(tweet['user']['time_zone']) #and the user's time zone
            ])
        print(tweet_text) #prints the tweet to the terminal
    except Exception as err: #if an exception occurs inside the try clause, the rest of the clause is skipped and this except clause is executed; execution then continues with the next tweet. "Exception" is the base class of all built-in, non-system-exiting exceptions. "err" is the variable that the caught exception object is bound to.
        print(err) #the caught exception is printed
        pass #"pass" does nothing. it is only needed where a statement is required syntactically, so it is redundant here.
print("done") #when the stream ends and the loop is finished, the code prints "done" to the terminal
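
The row-building logic above can be tried without Twitter credentials. The following is a minimal sketch, assuming Python 3 (the script above is Python 2), where the csv module writes Unicode text directly and the `.encode('utf-8')` step is no longer needed. The `tweet_text` helper, the `sample` record, and the in-memory `buffer` are hypothetical names introduced only for this illustration; they are not part of the original gist.

```python
import csv
import io


def clean(val):
    """Return val with the '|' delimiter and line breaks replaced by spaces."""
    if not val:
        return ""  # None or empty string becomes an empty field
    for ch in ("|", "\n", "\r"):
        val = val.replace(ch, " ")
    return val


def tweet_text(tweet):
    """Use the extended text of a truncated tweet, else the regular text."""
    if tweet.get("truncated") and "extended_tweet" in tweet:
        return tweet["extended_tweet"]["full_text"]
    return tweet["text"]


# Hypothetical sample record, made up for illustration (not real data).
sample = {
    "created_at": "Tue Apr 24 15:13:00 +0000 2018",
    "truncated": True,
    "text": "short version...",
    "extended_tweet": {"full_text": "long version | with a pipe\nand a newline"},
    "user": {"screen_name": "example_user", "location": None},
}

buffer = io.StringIO()  # stands in for the csv file on disk
writer = csv.writer(buffer, delimiter="|")
writer.writerow([
    sample["created_at"],
    clean(sample["user"]["screen_name"]),
    clean(tweet_text(sample)),
    clean(sample["user"]["location"]),
])
print(buffer.getvalue())
```

Because `clean` strips the delimiter from every free-text field, each row keeps a fixed number of `|` separators, which makes the file easy to split again later.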