My Google Summer of Code 2020 Summary
This is the report of my Google Summer of Code project with the Wikimedia Foundation, working on the Wiki Education Dashboard with my mentor, Sage Ross. The past few months have been amazing: I have learnt a great deal along the way and have had a good taste of how real-world applications work. I would like to thank my mentor for always being there when I needed help, and for making everything seem so simple and interesting; almost all my learnings in this journey have been due to him.
Wiki Education Dashboard is a learning tool built around editing different types of wiki articles. It lets instructors organize a range of courses and campaigns, grade various course materials, and track the edits made by students, which in turn enriches the wikis with lots of knowledge.
Let's dive deep into my project now.
Error Tracking System
I have implemented a system for detecting and tracking errors in the course update process, a lengthy job that runs every few minutes for each course. Wiki Education Dashboard pulls a lot of data about articles, revisions, etc., which can easily take up to a few minutes and can run into API errors, unexpectedly formatted data, connection errors, and so on. It is very important to keep track of these errors so that we know where the bottlenecks of the system are. At first I tried to save all of the error data in the database, but after experimenting with a few techniques I found a nicer way to implement this, one that eliminates the need for a separate error records table in the database and at the same time reuses an error logging system that we already have.
I implemented this by building an architecture that sends data to Sentry about the errors occurring while fetching data during course update jobs. Each error is assigned a UUID and the course slug and is tracked remotely in Sentry, while some recent errors are also kept in the course flags data.
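To make the idea concrete, here is a minimal sketch in Python. It is not the Dashboard's actual code: the class name, the `report` callable (a stand-in for a real Sentry client), the `max_recent` cap, and the choice of one UUID per update run are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CourseUpdateErrorTracker:
    """Tags errors from one course update and forwards them to a reporter."""

    course_slug: str
    # Hypothetical stand-in for a Sentry client: receives the error and its tags.
    report: Callable[[Exception, Dict[str, str]], None]
    max_recent: int = 5  # assumed cap on the errors kept in the course flags
    # Assumption: one UUID per update run, shared by all of its errors.
    update_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    error_count: int = 0
    recent_errors: List[str] = field(default_factory=list)

    def track(self, error: Exception) -> None:
        """Send the error to the remote tracker, tagged with UUID and slug."""
        tags = {"update_uuid": self.update_uuid, "course": self.course_slug}
        self.report(error, tags)
        self.error_count += 1
        # Keep only the names of the most recent errors locally.
        updated = self.recent_errors + [type(error).__name__]
        self.recent_errors = updated[-self.max_recent:]

    def flags(self) -> Dict[str, object]:
        """The small record stored with the course instead of a full error table."""
        return {
            "update_uuid": self.update_uuid,
            "error_count": self.error_count,
            "recent_errors": list(self.recent_errors),
        }


# Usage: collect reports in a list instead of calling a real error service.
sent = []
tracker = CourseUpdateErrorTracker(
    "Example_School/Example_Course",
    lambda err, tags: sent.append((err, tags)),
)
tracker.track(TimeoutError("API request timed out"))
```

The point of the design is that the full error details live in the remote tracker, while the course itself only stores a small record that is enough to look those details up again.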
My work for this can be found at:
- Final PR merged into production: PR #4076
- Techniques experimented: PR #4027 PR #4039
- Medium blogs: Blog Link 1 Blog Link 2
Surfacing Errors to Users
Surfacing the tracked course update errors in the UI was another important requirement: it keeps users informed about the errors that occur, so that they are not left confused.
The data surfaced included the most recent relevant course updates, a summary of all the updates that have ever happened in a course, whether a course will be updated in the future and, if so, until when, special one-time updates, and possible answers to some common issues.
My work for this can be found at:
- Final PRs merged into production: PR #4080 PR #4113
- Medium blogs: Blog Link 1 Blog Link 2
Orphan Lock Removal Procedure
Orphan locks are locks left behind by unique course update jobs that ended abruptly. They are generally caused by an abrupt system shutdown, the system running out of memory, etc.: the job dies because of the system failure, but its lock stays behind. Removing these locks was a crucial requirement for the application, because an orphan lock stops a course's updates forever by signalling to the next update that a job is already running, even though no job actually is. We were seeing issues with some courses not updating, and these were caused by orphan locks preventing further unique course update jobs from running.
I implemented a system that looks for courses which could possibly have orphan locks and removes those locks. For each such course it computes the expected digest (what the orphan lock should be equal to), searches the relevant queues and jobs for that digest, and removes the lock when no matching job is found. This procedure runs just before a course batch update is about to start.
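The procedure can be sketched roughly as follows, again in Python and only as an illustration. The digest format, the `CourseUpdateJob` name, and the use of plain sets for the lock store and for the digests of queued or running jobs are all hypothetical; the real lock digests come from whatever the job system generates.

```python
import hashlib
from typing import Iterable, List, Set


def expected_digest(job_class: str, course_id: int) -> str:
    """Hypothetical digest: a hash of the job class and its argument."""
    payload = f"{job_class}:{course_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def remove_orphan_locks(
    course_ids: Iterable[int],
    locks: Set[str],
    live_job_digests: Set[str],
    job_class: str = "CourseUpdateJob",  # hypothetical job name
) -> List[int]:
    """Drop locks that no queued or running job accounts for.

    Returns the ids of the courses whose orphan lock was removed.
    """
    cleaned = []
    for course_id in course_ids:
        digest = expected_digest(job_class, course_id)
        # A lock is an orphan if it exists but no live job carries its digest.
        if digest in locks and digest not in live_job_digests:
            locks.discard(digest)
            cleaned.append(course_id)
    return cleaned


# Usage: courses 1 and 2 hold locks, but only course 2 has a live job.
locks = {expected_digest("CourseUpdateJob", 1), expected_digest("CourseUpdateJob", 2)}
live = {expected_digest("CourseUpdateJob", 2)}
cleaned_courses = remove_orphan_locks([1, 2, 3], locks, live)
```

Here only the lock for course 1 is removed: course 2 still has a live job behind its lock, and course 3 never had a lock.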
My work for this can be found at:
Miscellaneous
- Integrating the updating of article status and categories fully into the course update process, and changing the implementation somewhat so that they run slightly more efficiently: PR #4139 PR #4120
- Adding a new feature in the activity tab to show course-specific revisions as well: PR #4002 PR #4103
- Improving the UI and request fetching of the activity tab: PR #4114
- Minor issues: PR #4068 PR #4122
- Medium blog: Blog Link 1 Blog Link 2
List of All My GSoC Blogs
- My GSoC Selection & Community Bonding Experience
- The Starting: Exploring Error Tracking Techniques
- Tracking Errors by Storing only 2 Variables (via Sentry)
- GSoC First Phase Completed! 🎉 🎊
- Background Jobs & Orphan Locks
- GSoC Second Phase Completed! 👨‍💻 🚀
- Coding The Summer Away! 👨‍💻 🌞