My Google Summer of Code 2020 Summary

This is the report of my Google Summer of Code 2020 project with the Wikimedia Foundation, working on the Wiki Education Dashboard with my mentor, Sage Ross. The past few months have been amazing: I have learnt a great deal along the way and got a good taste of how real-world applications work. I would like to thank my mentor for always being there when I needed help and for making everything seem so simple and interesting; almost all of my learning on this journey has been thanks to him.

Wiki Education Dashboard is a learning tool built around editing wiki articles. It lets instructors organize courses and campaigns, grade course materials, and track the edits made by students, which in turn enriches the wikis with new knowledge.

Let's dive into my project now.

Error Tracking System

I implemented a system for detecting and tracking errors in the course update process, a lengthy job that runs every few minutes for each course. Wiki Education Dashboard pulls a lot of data about articles, revisions, and so on, which can easily take up to a few minutes and can run into API errors, unexpectedly formatted data, connection errors, etc. Keeping track of these errors is important because it shows us where the bottlenecks of the system are. At first I tried to save all of the error data in the database, but after experimenting with a few techniques I found a nicer approach: it removes the need for a separate error records table and reuses an error logging system that we already have.

I implemented this with an architecture that sends data to Sentry about the errors occurring while data is fetched during course update jobs. Each error is assigned a UUID and the course slug, so it can be tracked remotely in Sentry, and some of the most recent errors are also kept in the course flags data.
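The idea above can be illustrated with a minimal Python sketch. This is not the Dashboard's actual code (the Dashboard is a Ruby on Rails application); the class name `CourseUpdateErrorTracker`, the `report` callable standing in for a Sentry client, the tag names, and the `recent_update_errors` flag key are all assumptions made for illustration.

```python
import uuid
from collections import deque
from typing import Callable, Deque, Dict, List


class CourseUpdateErrorTracker:
    """Tags each error with a UUID and course slug, reports it remotely,
    and keeps a short history of recent errors for the course flags."""

    def __init__(
        self,
        course_slug: str,
        report: Callable[[Exception, Dict[str, str]], None],
        max_recent: int = 5,
    ) -> None:
        self.course_slug = course_slug
        # Hypothetical stand-in for a call to an error service such as Sentry.
        self.report = report
        # A bounded deque drops the oldest entry once max_recent is reached.
        self.recent: Deque[Dict[str, str]] = deque(maxlen=max_recent)

    def track(self, error: Exception) -> str:
        """Report one error and remember it locally; returns its UUID."""
        error_id = str(uuid.uuid4())
        tags = {"error_id": error_id, "course": self.course_slug}
        self.report(error, tags)
        self.recent.append(
            {"id": error_id, "type": type(error).__name__, "message": str(error)}
        )
        return error_id

    def flags(self) -> Dict[str, List[Dict[str, str]]]:
        """Data that could be merged into a course's flags (assumed shape)."""
        return {"recent_update_errors": list(self.recent)}


# Usage: collect "sent" reports in a list instead of contacting a real service.
sent = []
tracker = CourseUpdateErrorTracker(
    "School/Course_(Term)", lambda err, tags: sent.append((err, tags)), max_recent=2
)
for attempt in range(3):
    try:
        raise ConnectionError(f"timeout {attempt}")
    except ConnectionError as err:
        tracker.track(err)
```

Every error is reported remotely, while only a bounded number of recent ones stay with the course, which is why no dedicated error table is needed in this sketch.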

My work for this can be found at:

Surfacing Errors to Users

Surfacing the tracked course update errors in the UI was another important requirement: it keeps users informed about the errors that occur, so that they are not left confused.

The data surfaced includes the most recent relevant course updates, a summary of all the updates that have ever happened in a course, whether the course will be updated in the future and, if so, until when, any special one-time updates, and possible answers to some common issues.

My work for this can be found at:

Orphan Lock Removal Procedure

Orphan locks are locks left behind by unique course update jobs that ended abruptly, generally because of a sudden system shutdown, the system running out of memory, etc. The job dies, but the lock that represented it remains. Removing these locks was a crucial requirement for the application, because an orphan lock signals to the next update that a job is already running (even though it is not), which stops updates for that course indefinitely. We had been seeing issues where some courses were not updating, and these were caused by orphan locks preventing further unique course update jobs from running.

I implemented a procedure that looks for courses which could possibly have orphan locks, computes the expected digest for each (what the orphan lock should be equal to), searches for that digest in the relevant queues and jobs, and removes the lock if no such job exists. This procedure runs just before a course batch update is about to start.
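The procedure can be sketched roughly as follows in Python. The digest scheme, the function names, and the use of plain sets for the lock store and the digests of queued or running jobs are assumptions for illustration only; the real application relies on its job library's own digests and queues.

```python
import hashlib
from typing import Iterable, List, Set


def expected_digest(course_id: int) -> str:
    """Hypothetical digest: a hash of the job name plus its course argument."""
    payload = f"course_update:{course_id}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def remove_orphan_locks(
    course_ids: Iterable[int], locks: Set[str], active_digests: Set[str]
) -> List[int]:
    """Delete locks whose digest matches no queued or running job.

    `locks` is mutated in place; the ids of the courses whose orphan
    locks were removed are returned.
    """
    cleaned = []
    for course_id in course_ids:
        digest = expected_digest(course_id)
        # A lock with no corresponding job is an orphan and is safe to drop.
        if digest in locks and digest not in active_digests:
            locks.discard(digest)
            cleaned.append(course_id)
    return cleaned


# Usage: course 1 has an orphan lock, course 2 has a real running job,
# and course 3 has no lock at all.
locks = {expected_digest(1), expected_digest(2)}
active = {expected_digest(2)}
cleaned = remove_orphan_locks([1, 2, 3], locks, active)
```

Running a check like this right before a batch update means a course blocked by a stale lock can be picked up again in that same batch.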

My work for this can be found at:

Miscellaneous

  • Fully integrating article status and category updates into the course update process, and adjusting the implementation so that they run slightly more efficiently: PR #4139 PR #4120
  • Adding a feature to the activity tab to show course-specific revisions as well: PR #4002 PR #4103
  • Improving the UI and request fetching of the activity tab: PR #4114
  • Minor issues: PR #4068 PR #4122
  • Medium blog: Blog Link 1 Blog Link 2

List of All My GSoC Blogs

Additional Links
