@dylanvee
Created February 14, 2013 19:44
Sometimes you learn from Codecademy; sometimes you learn from real-life experience. Yesterday, Codecademy was offline for almost 24 hours. We want to apologize and explain what happened.
What happened?
At 1:09 AM EST on Wednesday 2/13, we took Codecademy down to deal with a database migration issue. We tweeted about it at 1:30. We then spent the next 24 hours working behind the scenes to resolve the issue.
How did it happen?
Like most web applications, Codecademy stores its information — everything from your submissions in exercises to new accounts — in a database. We use a technology called MongoDB (by our friends at 10gen) to do this. Our databases have hundreds of millions of items in them and are growing larger by the second. We've been working to change the configuration of our databases so that we can migrate our data to new database structures, laying a solid foundation for future developments and features.
Around 1:30 PM yesterday we became aware of an issue: one of our local environments was not set up correctly, and was causing a database malfunction. The whole team dropped what they were working on and focused on restoring the site as quickly as possible.
Soon after, we decided to keep the site down until we were 100% sure we could restore the majority of the data. Working with our external service providers, we matched up the data they had with the data we had internally from backups. We verified the integrity of certain data and began piecing our databases back together before bringing the site back online.
After pulling together backups of submissions from Amazon S3, course progress from our backup, and emails from our email database and userfox, we tested things internally to verify we hadn't lost much. Then, finally, we brought the site back up.
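For readers curious what "matching up" data from several backups can look like in code, here is a minimal Python sketch. It is not the script we ran: the function name `reconcile`, the record fields (`email`, `progress`), and the idea of ranking sources by trust are all assumptions made for illustration. It merges per-user records from an ordered list of sources, lets earlier (more trusted) sources win on conflicts, and reports users who are still missing a required field.

```python
from typing import Dict, List, Tuple

# A record is a plain dict of field name -> value (hypothetical shape).
Record = Dict[str, object]


def reconcile(
    sources: List[Dict[str, Record]],
    required: Tuple[str, ...],
) -> Tuple[Dict[str, Record], List[str]]:
    """Merge per-user records from several backups.

    `sources` is ordered from most trusted to least trusted. A field found
    in an earlier source is never overwritten by a later one.
    Returns the merged records and a sorted list of user ids that are
    still missing at least one required field.
    """
    merged: Dict[str, Record] = {}
    for source in sources:
        for user_id, record in source.items():
            target = merged.setdefault(user_id, {})
            for field, value in record.items():
                # Skip empty values so a later source can still fill the gap.
                if value is not None:
                    target.setdefault(field, value)

    incomplete = sorted(
        user_id
        for user_id, record in merged.items()
        if any(field not in record for field in required)
    )
    return merged, incomplete


if __name__ == "__main__":
    # Made-up sample data: a primary backup and a secondary export.
    primary = {"u1": {"email": "a@example.com", "progress": None}}
    secondary = {
        "u1": {"email": "old@example.com", "progress": 3},
        "u2": {"progress": 1},
    }
    merged, incomplete = reconcile([primary, secondary], ("email", "progress"))
    print(merged)
    print(incomplete)
```

The list of incomplete users is the useful part in a recovery: it tells you who needs follow-up, such as a link to set their account up again, instead of silently restoring partial data.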
What does this mean for me?
We were able to minimize the number of people affected by pulling from our backups, but we were not able to restore everything.
If your progress is missing, please wait — we will be gradually restoring all course progress over the next few days. You can continue where you left off on your course.
For some users, there are types of data that we were not able to retrieve:
Anyone who created a new account from 2/6 - 2/13
Profile pictures will be missing and connection to social media accounts will need to be restored. We will be sending you a link to get set up again.
Anyone who created or joined a new group from 2/6 - 2/13
A small number of users and beta testers have access to our groups feature. If you created a group in the past week you will need to recreate it. If you joined a group in the past week you will need to rejoin.
What now?
This event underscored the importance of frequent backups. Based on what we have learned, we have created a new plan to recover better from potential database migration issues in the future.
We will also be extending everyone's streaks today. We know that you are committed to learning to code. We are committed to helping you get there. If our site goes down, your learning does too. This is the least we can do.
I wrote this post so that you can be confident that we take downtime very seriously, that we learn from each and every event, and that we are committed to building the strongest community for learning how to program. Thank you to all our users for your patience.