@anands
Last active January 30, 2017 15:59

Reddit's "hot" ranking algorithm can be simplified as:

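from math import log10

def hot(up, down, postingDate):
    # postingDate is a datetime; votes and time are combined on a log10 scale
    score = up - down
    sign = (score > 0) - (score < 0)  # -1, 0, or 1
    voteComponent = sign * log10(max(1, abs(score)))
    timeComponent = postingDate.timestamp() / (12.5 * 3600)  # 12.5 hrs in seconds
    return voteComponent + timeComponent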
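
As a quick illustration, here is a minimal runnable sketch of the simplified formula applied to three stories. The story titles, vote counts, and dates are made up for the example, and the constant name `AGING_FACTOR_SECONDS` is an assumption, not something taken from Reddit's code.

```python
from datetime import datetime, timedelta, timezone
from math import log10

AGING_FACTOR_SECONDS = 12.5 * 3600  # 12.5 hrs, as in the simplified formula

def hot(up, down, posting_date):
    score = up - down
    sign = (score > 0) - (score < 0)  # -1, 0, or 1
    vote_component = sign * log10(max(1, abs(score)))
    time_component = posting_date.timestamp() / AGING_FACTOR_SECONDS
    return vote_component + time_component

# Hypothetical stories: (title, upvotes, downvotes, posting date)
base = datetime(2017, 1, 30, 12, 0, tzinfo=timezone.utc)
stories = [
    ("old but popular", 1000, 0, base - timedelta(hours=40)),
    ("recent, modest", 10, 0, base - timedelta(hours=5)),
    ("brand new", 1, 0, base),
]

# The score never references the current time, so it can be stored with the
# story and only recomputed when the story's votes change.
ranked = sorted(stories, key=lambda s: hot(s[1], s[2], s[3]), reverse=True)
for title, *_ in ranked:
    print(title)
```

Relative to the newest story, the three scores are 3 - 40/12.5 = -0.2, 1 - 5/12.5 = 0.6, and 0, so the recent story ranks first and the 40-hour-old story with 1000 votes ranks last.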

This function has several important properties:

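  1. The function is defined in terms of a date, not an age. A story's score therefore changes only when its votes change, not as time passes, which makes the function easy to cache.
  2. Interactions are weighted logarithmically: every tenfold increase in net votes adds one point. A story with 1000 upvotes gets a vote component of 3, only one point more than a story with 100 upvotes.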
  3. Story age contributes linearly while votes contribute only logarithmically, so age dominates ranking in the long term. This helps to get old stories off the front page.
  4. Aging is managed by an aging factor. Reddit uses 12.5 hrs, but this factor should be tuned to balance the frequency of new content against the volume of interactions. The factor determines the weighting between age and votes. In the case of Reddit, a 1:10 vote ratio between two stories has the same effect as 12.5 hrs of age.
  5. Whether a new story can enter in the top spot depends on the frequency of new content. E.g. to immediately displace a top story with 100 upvotes, a new story with no votes yet would have to be posted at least 25 hrs later (log10(100) = 2 points, and each point is worth 12.5 hrs). This can be changed by choosing a different aging factor.
  6. If we control the content, we can choose an appropriate posting frequency to make sure that new stories enter in the top 10. Assuming that all top 10 stories uniformly get 100 votes per hour, we could post a story every 5 hrs, and each new story would enter at position #10. In practice, decay overnight means we can post more frequently during the day.
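
The arithmetic behind properties 4 to 6 can be checked with a short sketch. The helper names `head_start_hours` and `break_even_age` are hypothetical, and the model (a constant vote rate, and a new story that starts with no votes) is the simplifying assumption stated in property 6, not a description of real traffic.

```python
from math import log, log10

def head_start_hours(votes_a, votes_b, aging_factor_hrs=12.5):
    """Hours of posting-time head start that cancel the vote gap between two stories."""
    gap = log10(max(1, votes_a)) - log10(max(1, votes_b))
    return gap * aging_factor_hrs

def break_even_age(votes_per_hour, aging_factor_hrs=12.5, tol=1e-6):
    """Age in hours at which a story gaining votes at a constant rate falls
    to the score of a brand-new story with no votes (found by bisection)."""
    def lead(age):
        # Score of the older story minus the score of a story posted right now.
        return log10(max(1, votes_per_hour * age)) - age / aging_factor_hrs

    lo = aging_factor_hrs / log(10)  # the lead peaks here; assumed positive at this age
    hi = 1000.0 * aging_factor_hrs   # far beyond any realistic crossing point
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lead(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(head_start_hours(10, 1))    # property 4: a 1:10 vote ratio
print(head_start_hours(100, 1))   # property 5: a top story with 100 upvotes
age = break_even_age(100)         # property 6: 100 votes per hour
print(age, age / 10, age / 9)
```

Under these assumptions the first two calls give 12.5 and 25 hours. The break-even age comes out at roughly 46 hours, so a new story lands at position #10 when exactly nine younger stories are still ahead of it, which corresponds to a posting interval between about 4.6 and 5.1 hours, consistent with the 5 hrs quoted above.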