There are:
- 100M users/authors
- 20M papers
- 2M topics
The typical user has:
- 1k followers
- 2k followees
- 10 topics that they follow
- 30 papers uploaded
The typical topic has:
- 1M followers
- 1M papers
The typical paper has:
- 5 topics tagged
- 1 author
—
- Structure tables in database
- Structure news feed page for user
—
| 1 | Structure tables in database
— ———————————————————
——— | Users / Authors | ——— { userId, followers: [ userId ], following: [ userId ], topics: [ topicId ], uploadedPapers: [ paperId ] }
— — — | Papers — — — { paperId, authorId, topicsFollowed: [ topicId ], uploadTimestamp }
— — — | Topics —— — { topicId, papers: [ paperId ], followers: [ userId ] }
—
| 2 | Structure news feed page for user
— ———————————————————
| Current User
| Papers
- filter by topic
- filter by users followed by current user
- sorted by most recent
- limit 20
OPTIMIZATION
- Create indices on userId, paperId, and topicId
- create composite indices (topicId, paperId)
QUERIES
Papers of followed authors - SELECT uploadedPapers FROM user WHERE userId IN following Papers in followed topics - SELECT papers FROM topics WHERE topicId in topicsFollowed
Combine previous two queries: SELECT paperId FROM Paper WHERE authorId IN (SELECT followeeId FROM UserFollowees WHERE userId = ?) UNION SELECT paperId FROM Paper WHERE paperId IN (SELECT paperId FROM TopicPapers WHERE topicId IN (SELECT topicId FROM UserTopics WHERE userId = ?)) ORDER BY uploadTimestamp DESC LIMIT 20
UPLOAD A Paper
- Notify followers
- Notify followers for each topic
Use a set
to avoid duplicates