
@HH0718
Last active November 18, 2022 19:36
A Reddit Pushshift API call to get a user's total comment count.

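import requests


def get_author_total_comments(**kwargs):
    r = requests.get("https://api.pushshift.io/reddit/comment/search/", params=kwargs)
    r.raise_for_status()  # Raise on HTTP errors before trying to parse the body.
    data = r.json()
    # size=0 asks for no comment bodies; metadata=True adds a `metadata` object holding the match count.
    return data['metadata']['total_results']


if __name__ == "__main__":
    total_results = get_author_total_comments(author="UnemployedTechie2021", size=0, metadata=True)
    print(total_results)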
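To see what this call sends and what it reads back, here is a stdlib-only sketch that runs without touching the network. `build_search_url` and `extract_total_results` are hypothetical helper names, and the sample response is made up; the only assumption about Pushshift is the shape the snippet already relies on (`metadata.total_results`). It also shows that a Python `True` goes into the query string as the literal text `True`; whether Pushshift reads that the same as `true` is carried over from the snippet, not verified here.

```python
from urllib.parse import urlencode

# Endpoint copied from the snippet above.
SEARCH_URL = "https://api.pushshift.io/reddit/comment/search/"


def build_search_url(**params) -> str:
    """Return the GET URL that a request with these search parameters would hit.

    requests also builds its query string with urlencode, so for flat dicts
    of strings, numbers and booleans this should match what is sent.
    """
    return f"{SEARCH_URL}?{urlencode(params)}"


def extract_total_results(payload: dict) -> int:
    """Read the match count from a decoded Pushshift JSON response."""
    try:
        return int(payload["metadata"]["total_results"])
    except KeyError as exc:
        # Most likely cause: the request did not include metadata=True.
        raise KeyError("response has no metadata.total_results") from exc


if __name__ == "__main__":
    print(build_search_url(author="UnemployedTechie2021", size=0, metadata=True))
    # Hypothetical response body with the shape the snippet expects.
    sample = {"data": [], "metadata": {"total_results": 1234}}
    print(extract_total_results(sample))
```

Splitting URL building and response parsing out of the network call makes both parts easy to check on their own.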

HH0718 commented Nov 18, 2022

Thanks.

I saw your other post, so here is my attempt at "refactoring" it, along with some feedback.

import requests
from pprint import pprint

DEBUG = False  # Set `True` if you want to print the get request response.


def get_reddit_user_comments(**kwargs) -> dict:
    """
    Visit https://github.com/pushshift/api for available parameters
    """
    URL = 'https://api.pushshift.io/reddit/comment/search'  # Base PushShift URL for comment search
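    response = requests.get(URL, params=kwargs)  # GET request at URL with the search parameters as the query string.
    response.raise_for_status()  # Raise on HTTP errors instead of trying to parse an error page.
    data = response.json()  # Decode the JSON body once and reuse it.
    if DEBUG:
        pprint(data)
    return data  # Returns the data in dictionary format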


# `if __name__ == "__main__":` makes the code below run only when this file is executed directly (e.g. `python3 thisfilename.py`), not when it is imported by another module.
if __name__ == "__main__":
    # after: only comments from the last 7 hours | metadata: include metadata about the request | limit: return 100 items (default is 25; max is 1000 or 2000) | fields: comma-separated list of the only fields to return
    params = {"after": "7h", "metadata": True, "limit": 100,
              "fields": "score"}  # Set search parameters for endpoint as a dictionary
    username = input("Enter your Reddit username: ")

    # If no username was entered, fall back to the default username and print a warning.
    if username:
        params["author"] = username
    else:
        params["author"] = "UnemployedTechie2021"
        print("WARNING: No username entered. Using default username.")

    # Call `get_reddit_user_comments()` and unpack the params dictionary, e.g. get_reddit_user_comments(author="UnemployedTechie2021", after="7h", metadata=True, limit=100, fields="score")
    results = get_reddit_user_comments(**params)
    print(f"{params.get('author')}'s total score: {sum(comment['score'] for comment in results['data'])}")
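A note on picking the default username: writing it as a one-line conditional expression followed by `, print(...)` is a trap, because the comma builds a tuple around the whole conditional, so the `print` runs even when a username was typed. The sketch below isolates that logic in a hypothetical helper, `choose_author`, so it can be checked without prompting or calling the API. The whitespace stripping is an addition of mine, and `"alice"` is made-up sample input; `demo_tuple_pitfall` (also hypothetical) shows which branches the one-liner style actually executes.

```python
from __future__ import annotations

DEFAULT_AUTHOR = "UnemployedTechie2021"  # default username used in the gist


def choose_author(username: str, default: str = DEFAULT_AUTHOR) -> tuple[str, bool]:
    """Return (author, used_default) for whatever was typed at the prompt.

    Hypothetical helper: input that is empty or only whitespace falls back
    to the default username.
    """
    username = username.strip()
    if username:
        return username, False
    return default, True


def demo_tuple_pitfall(username: str) -> list[str]:
    """Record which branches the one-liner style actually executes."""
    log: list[str] = []
    # Parsed as a tuple: (log.append("user") if username else log.append("default"), log.append("warning"))
    log.append("user") if username else log.append("default"), log.append("warning")
    return log


if __name__ == "__main__":
    for typed in ("alice", ""):
        author, used_default = choose_author(typed)
        if used_default:
            print("WARNING: No username entered. Using default username.")
        print(f"author={author!r}, one-liner ran: {demo_tuple_pitfall(typed)}")
```

In both cases the one-liner records `"warning"`, which is why a plain `if`/`else` is the safer way to write it.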

Notice that I removed pandas for now, since it is a large package.

Simply put, pandas is an expensive dependency to carry when you can get the same result with plain Python.

I'm sure you'll have a use for it in the future, but implement one function or feature at a time, as simply as possible. Then bring in pandas or whatever other packages you need for the work.
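To make the "same result without pandas" point concrete, here is a small sketch using only built-ins and the standard `statistics` module. It assumes each item in `results['data']` is a dict with a `score` key, as the `fields=score` request implies; `total_score` and `score_summary` are hypothetical names and the sample list is made up. Treating a missing `score` as 0 is my own choice, not something Pushshift specifies. For comparison, the pandas route would be roughly `pd.DataFrame(comments)["score"].sum()`.

```python
from __future__ import annotations

from statistics import mean


def total_score(comments: list[dict]) -> int:
    """Sum `score` across comments; items without a score count as 0."""
    return sum(comment.get("score", 0) for comment in comments)


def score_summary(comments: list[dict]) -> dict:
    """Return the count, total and mean score without pandas."""
    scores = [comment.get("score", 0) for comment in comments]
    return {
        "count": len(scores),
        "total": sum(scores),
        "mean": mean(scores) if scores else 0.0,  # mean() raises on an empty list
    }


if __name__ == "__main__":
    # Made-up sample shaped like `results["data"]` when fields=score is requested.
    sample = [{"score": 3}, {"score": 10}, {"score": -1}]
    print(total_score(sample))
    print(score_summary(sample))
```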
