
@nic-hartley
Last active January 17, 2018 21:29

General notes, first:

  • Each of these blocks runs repeatedly on its own thread.
    • There might be some delay (to let queues refill, or to keep from getting ratelimited by Reddit, or whatever) between each run; in theory, that shouldn't matter to the actual operation.
    • Some additional boilerplate is probably also required (checking if they're empty, etc.) but screw writing all that
  • Anything before .push or .pop is a queue
  • logging and metrics have been intentionally omitted; they can be added where necessary.
  • also, there are probably better ways to write this code, but I'm not good at Python
  • I'm only looking at Reddit interaction for this bit. Discord interaction could probably be added relatively easily, but that's too much to think about right now.
  • I'm not looking at first-time posts yet. It should just be a couple of extra lines of code, but... later.
  • I don't know exactly how the Reddit API works; for this document, I'm assuming that it gives you all of the information about a given comment, rather than just an ID that you then have to look up. However, it shouldn't be too difficult to adapt it if the latter is the case.
  • I intentionally didn't implement the class or helper functions; this is meant to be higher-level than that. When it comes down to actually writing the code, I'd be happy to :)
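The "each block runs repeatedly on its own thread, with a delay between runs" setup from the first bullet can be sketched roughly like this. The `worker` helper, the queue names, and the delay value are my assumptions, not anything from the eventual bot:

```python
import queue
import threading
import time

def worker(in_queue, process, delay=0.1, stop_event=None):
    # Generic worker loop: repeatedly pop one item and process it.
    # The get() timeout doubles as the between-run delay that keeps
    # the bot from hammering Reddit's rate limits.
    while stop_event is None or not stop_event.is_set():
        try:
            item = in_queue.get(timeout=delay)
        except queue.Empty:
            continue  # the "checking if they're empty" boilerplate
        process(item)

# Usage sketch: one thread per processing step, all sharing queues.
requests = queue.Queue()
handled = []
stop = threading.Event()
thread = threading.Thread(target=worker, args=(requests, handled.append, 0.05, stop))
thread.start()
requests.put("some transcription request")
time.sleep(0.3)  # give the worker a moment to pick the item up
stop.set()
thread.join()
```

Each of the blocks below would be the `process` function of one such worker.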

TR class

Basically, a class which represents a transcription request. This class allows us to query data about each transcription request without needing to go to Reddit every single time. That way, we can use API hits for only the things that are necessary, rather than using them to check if something has been claimed.

This mostly just collects a couple of posts/comments and holds them together.

  • Store information about / reference to the original post to be transcribed and the post on r/ToR about the request.
  • Record the "claim" and "done" comments
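A minimal sketch of what that class might look like. The field names beyond the ones mentioned above, the `register` helper, and the URL cache are guesses on my part:

```python
class TR:
    """One transcription request: the original post plus everything the
    bot knows about it, so we never spend an API hit just to check
    whether something has been claimed."""

    _by_url = {}  # cache of known requests, keyed by their r/ToR URL

    def __init__(self, post):
        self.post = post        # the original post to be transcribed
        self.r_tor_post = None  # the mirror post on r/ToR
        self.r_tor_url = None
        self.claim = None       # the comment that claimed this request
        self.done = None        # the comment that marked it done

    def valid(self):
        # hypothetical filter; the real check would look at the post itself
        return self.post is not None

    def register(self, r_tor_url):
        # remember the mirror URL so inbox replies can be matched back to us
        self.r_tor_url = r_tor_url
        TR._by_url[r_tor_url] = self

    @classmethod
    def get_by_url(cls, url):
        return cls._by_url.get(url)  # None for URLs we don't know about
```

`register` would be called right after posting the mirror in the "Moving them to r/ToR" step; `get_by_url` is what the inbox processor uses to match a reply back to its request.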

Collecting posts from subreddits and normalizing

In short, we want to get everything all in one place, all in one format.

This could be split into two separate workers (one to just query subreddits, one just to normalize them into a unified format), but the tasks are related enough that it feels kinda wasteful.

for subreddit in partners:
  for post in subreddit.new_posts():
    tr = TR(post)
    if tr.valid():
      requests.push(tr)

Moving them to r/ToR

...and adding a little extra info.

tr = requests.pop()
post = r_tor.post_url('{} | {} | "{}"'.format(tr.subreddit, tr.type, tr.title), tr.url)
post.set_flair(Flair.UNCLAIMED)
to_clear.push(tr)
tr.r_tor_url = post.url
post.comment(Comments.CLAIM_HERE)
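For what it's worth, the r/ToR post title is meant to come out as `subreddit | type | "title"`; in real Python that format call would look like this (the subreddit/type/title values here are made up):

```python
# Build the r/ToR mirror post's title from the request's metadata.
title = '{} | {} | "{}"'.format("pics", "image", "A cool sunset")
# -> 'pics | image | "A cool sunset"'
```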

Processing inbox contents

Process replies to ToRBot's posts and comments. Farms out the actual work of claiming, etc. to other workers.

Note that this, done-process, claim-process, unclaim-process, etc. could all be condensed into one step. I've separated them just because otherwise this one gets to be like double the length of the others, but that's not strictly necessary.

for item in unread_inbox_items:
  url = item.parent_post_url
  tr = TR.get_by_url(url)
  if not tr:
    continue # can this happen? what does it mean?
             # (because in theory, we should only be getting replies to our
             #  posts, which _should_ all be on r/ToR posts)
  if "unclaim" in item.text: # must come before "claim", since "claim" is a substring of "unclaim"
    unclaims.push([item, tr])
  elif "claim" in item.text:
    claims.push([item, tr])
  elif "done" in item.text:
    dones.push([item, tr])
  else: # did I miss something the bot has to do?
    item.reply(Comments.NO_ACTION_IN_COMMENT)

Action processing

Process the various actions a user can take on a given TR. These are all, by their nature, very similar. However, they're all different enough that I can't see an easy, simple way to unify them, so I don't think it's worth it.

Process claims

claim_comment, tr = claims.pop()
if tr.claim:
  claim_comment.reply(Comments.ALREADY_CLAIMED) # aka Comments.PIXIED
elif tr.done:
  claim_comment.reply(Comments.ALREADY_DONE)
else:
  tr.claim = claim_comment
  tr.r_tor_post.flair = Flair.IN_PROGRESS
  claim_notifications.push(tr)

Process dones

done_comment, tr = dones.pop()
if not tr.claim:
  done_comment.reply(Comments.NO_DONE_WITHOUT_CLAIM)
elif tr.claim.author != done_comment.author: #or, possibly, if they're a mod
  done_comment.reply(Comments.NOT_YOUR_CLAIM)
else:
  tr.done = done_comment
  tr.r_tor_post.flair = Flair.COMPLETE
  to_clear.push(tr)
  done_comment.reply(Comments.DONED)
  increment_flair(done_comment.author)

Process unclaims

unclaim_comment, tr = unclaims.pop()
if not tr.claim:
  unclaim_comment.reply(Comments.NO_UNCLAIM_WITHOUT_CLAIM)
elif tr.claim.author != unclaim_comment.author:
  unclaim_comment.reply(Comments.NOT_YOUR_CLAIM)
else:
  tr.claim = None
  unclaim_comment.reply(Comments.UNCLAIMED)

Process late notifications

while claim_notifications:
  top = claim_notifications.pop()
  if top.claim.time > datetime.now() - timedelta(hours = 6):
    # claimed less than 6 hours ago; everything behind it in the queue
    # is newer still, so put it back and stop
    claim_notifications.unpop(top) # could also be done with peek/pop, whatever
    break
  elif not top.done:
    top.claim.reply(Comments.DID_YOU_FORGET)
  # else no-op
@nic-hartley (author):
My pseudo-PRAW here is a little inconsistent, but I hope it gets the message across anyway.
