Reddit's "hot" ranking algorithm can be simplified as:
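from math import log10

def hot(up, down, postingDate):
    # postingDate is a datetime; .timestamp() gives seconds since the Unix epoch
    score = up - down
    sign = (score > 0) - (score < 0)  # -1, 0 or 1
    voteComponent = sign * log10(max(1, abs(score)))
    timeComponent = postingDate.timestamp() / (12.5 * 3600)  # 12.5 hrs in seconds
    return voteComponent + timeComponent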
This function has several important properties:
- The function is defined in terms of the posting date, not the story's age. A story's score therefore changes only when its votes change, not as time passes, which makes the score easy to cache.
- Votes are weighted logarithmically. A story with 1000 net upvotes gets a vote component of 3, against 2 for a story with 100 net upvotes: ten times the votes adds only one point.
- The posting date contributes linearly, while votes contribute only logarithmically, so age dominates the ranking in the long term. This helps to get old stories off the front page.
- Aging is controlled by an aging factor, which sets the weighting between age and votes. Reddit uses 12.5 hrs, but the factor should be tuned to balance the frequency of new content against the volume of votes. With Reddit's value, a tenfold difference in net votes between two stories has the same effect as 12.5 hrs of age.
- Whether a new story can enter in the top spot depends on the frequency of new content. E.g. a new story starts with a vote component of 0, so to immediately displace a top story with 100 net upvotes (vote component 2), it would have to be posted more than 2 × 12.5 = 25 hrs later. This can be changed by choosing a different aging factor.
- If we control the content, we can choose an appropriate posting frequency to make sure that new stories enter in the top 10. Assuming that all top 10 stories uniformly get 100 votes per hour, we could post a story every 5 hrs and each new story would enter at position #10. In practice, decay overnight means we can post more frequently during the day.
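
The arithmetic behind these properties can be checked with a small sketch. This is not Reddit's actual code: here `hot` takes the posting time in hours since an arbitrary epoch to keep the numbers readable, and `AGING_HOURS`, `entry_position` and the steady "100 votes per hour since posting" model are assumptions made for illustration.

```python
from math import log10

AGING_HOURS = 12.5  # aging factor quoted in the text


def hot(up, down, posted_hours):
    """Simplified hot score. posted_hours is the posting time in hours
    since an arbitrary epoch (an assumption, to keep the arithmetic simple)."""
    score = up - down
    sign = (score > 0) - (score < 0)  # -1, 0 or 1
    return sign * log10(max(1, abs(score))) + posted_hours / AGING_HOURS


# Logarithmic votes: 1000 vs 100 net upvotes gives 3 vs 2 points.
ratio = hot(1000, 0, 0) / hot(100, 0, 0)  # → 1.5

# A tenfold vote advantage is cancelled by posting 12.5 hours later.
older_popular = hot(1000, 0, 0)
newer_less_popular = hot(100, 0, 12.5)

# A fresh story with no votes ties a 100-upvote story posted 25 hours earlier.
fresh = hot(0, 0, 25)
top = hot(100, 0, 0)


def entry_position(interval_hours, votes_per_hour=100, n_older=20):
    """Rank (1 = top) at which a brand-new story with no votes enters,
    assuming older stories were posted every interval_hours and each has
    gained votes_per_hour net upvotes per hour since it was posted."""
    now = 1000.0  # arbitrary "current time" in hours
    new_score = hot(0, 0, now)
    older_scores = [
        hot(votes_per_hour * k * interval_hours, 0, now - k * interval_hours)
        for k in range(1, n_older + 1)
    ]
    # Every older story that still scores higher pushes the new one down a rank.
    return 1 + sum(s > new_score for s in older_scores)


print(entry_position(5))  # → 10
```

Under this model the nine stories posted 5 to 45 hours earlier still outrank the newcomer, while anything about 46 hours old or older has already been overtaken. Passing a different `interval_hours` or changing `AGING_HOURS` is a quick way to explore other posting frequencies.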