Last active
December 17, 2015 11:39
What Happens When Firefox Crashes? A talk at The Fifth Elephant, Bengaluru, India, July 2013
- A sense of scale
- A Firefox crash is 150K.
- 50 crashes/second. 3M/day.
- 110 TB in HDFS
- 500 GB in PostgreSQL
- 120 boxes. Real hardware, not VMs.
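A back-of-envelope check of the figures above, as a rough sketch. It assumes "150K" means about 150 kilobytes per crash and takes 1 GB as 10^9 bytes; the constant and function names are illustrative, not from the talk. Note that a sustained 50 crashes/second would exceed 3M/day, so the per-second figure may describe a peak rather than an average; that reading is an assumption.

```python
# Figures quoted in the outline; the interpretation of "150K" as bytes is assumed.
CRASH_SIZE_BYTES = 150_000
CRASHES_PER_DAY = 3_000_000
QUOTED_CRASHES_PER_SECOND = 50
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(crashes_per_day):
    """Average crashes per second implied by a daily total."""
    return crashes_per_day / SECONDS_PER_DAY


def daily_volume_gb(crashes_per_day, crash_size_bytes):
    """Raw incoming data per day, in gigabytes (10^9 bytes)."""
    return crashes_per_day * crash_size_bytes / 1e9


if __name__ == "__main__":
    print(f"average rate: {average_rate(CRASHES_PER_DAY):.1f} crashes/second")
    print(f"daily volume: {daily_volume_gb(CRASHES_PER_DAY, CRASH_SIZE_BYTES):.0f} GB")
    print(f"50/second sustained: {QUOTED_CRASHES_PER_SECOND * SECONDS_PER_DAY:,} crashes/day")
```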
- Challenges
- Dictum: “Never lose a crash.”
- Upside-down user story: few users, much data
- Hither and thither
- Spec always changes, so configman, our in-house component architecture, ties it all together.
- Sysadmins and developers can each use whatever config mechanism they want
- Env vars
- INI
- JSON
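The idea of letting each person pick a config mechanism can be illustrated with a minimal sketch that layers the three sources listed above. This is not configman's API; it uses only the Python standard library, and the `app` section name, the `APP_` prefix, and the function names are hypothetical. Later sources override earlier ones.

```python
import configparser
import json


def from_ini(text, section="app"):
    """Read one section of an INI document into a dict (values stay strings)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[section]) if parser.has_section(section) else {}


def from_json(text):
    """Read a flat JSON object into a dict."""
    return dict(json.loads(text))


def from_env(environ, prefix="APP_"):
    """Pick up prefixed environment variables, e.g. APP_HOST becomes 'host'."""
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def merge(*sources):
    """Combine sources; keys in later sources win."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


if __name__ == "__main__":
    ini_settings = from_ini("[app]\nhost = db1\nport = 5432\n")
    json_settings = from_json('{"port": "6543"}')
    env_settings = from_env({"APP_HOST": "db2", "HOME": "/root"})
    print(merge(ini_settings, json_settings, env_settings))
```

In practice the environment mapping would be `os.environ`; a plain dict is passed here so the example stays deterministic.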
- A taste of the front end
- Pretty graphs, nice reports.
- Inside the browser
- The stack-walking, non-memory-moving magic of breakpad
- What we collect
- The journey through the back end
- POST to collectors
- Dice rolls for a representative sample
- Rules for homing in on interesting events
- FS as buffer
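The combination of dice rolls and rules at the collector could look roughly like the sketch below. It is a guess at the general technique (first matching rule sets a selection probability), not the project's actual rule set; the field names `comments` and `channel`, the probabilities, and `should_select` are all hypothetical.

```python
import random

# Each rule is (predicate, probability). The first rule whose predicate
# matches decides the probability with which a crash is selected.
RULES = [
    (lambda crash: bool(crash.get("comments")), 1.0),     # "interesting": user left a comment
    (lambda crash: crash.get("channel") == "beta", 1.0),  # "interesting": pre-release build
    (lambda crash: True, 0.10),                           # everything else: 10% sample
]


def should_select(crash, rules=RULES, rng=random.random):
    """Return True if this crash is picked by the first matching rule."""
    for predicate, probability in rules:
        if predicate(crash):
            # rng() is in [0, 1), so probability 1.0 always selects.
            return rng() < probability
    return False


if __name__ == "__main__":
    incoming = [
        {"channel": "release"},
        {"channel": "beta"},
        {"channel": "release", "comments": "crashed while opening a tab"},
    ]
    for crash in incoming:
        print(crash, should_select(crash))
```

Passing `rng` as a parameter keeps the sampling testable; a fixed function can stand in for the dice.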
- Crashmovers
- RabbitMQ
- Soft realtime: priority and normal queues
- HBase
- Historically a point of unreliability
- Downtime for upgrades, but not for long
- Processors
- Mini-dump stackwalk restores debug symbols.
- Postgres
- Stored procs
- Reporting
- Previously a half-baked queue
- Elasticsearch
- Faceting
- Cron jobs
- Calculate aggregates
- Top crashes by signature
- Crashes/100ADU/build
- Detect duplicates
- Match crashes against Bugzilla bugs
- Process new build from FTP server
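Two of the aggregates named above can be sketched in a few lines. This is an in-memory illustration only, not the stored procedures or cron jobs themselves. It assumes ADU means active daily users and that each crash record carries `signature` and `build` fields; those field names, the sample values, and the function names are hypothetical.

```python
from collections import Counter


def top_signatures(crashes, n=3):
    """Most frequent crash signatures, as (signature, count) pairs."""
    return Counter(crash["signature"] for crash in crashes).most_common(n)


def crashes_per_100_adu(crashes, adu_by_build):
    """Crash count per build, normalized to 100 active daily users."""
    counts = Counter(crash["build"] for crash in crashes)
    return {
        build: 100.0 * counts.get(build, 0) / adu
        for build, adu in adu_by_build.items()
        if adu > 0  # skip builds with no recorded users
    }


if __name__ == "__main__":
    crashes = [
        {"signature": "sig_a", "build": "build_1"},
        {"signature": "sig_a", "build": "build_1"},
        {"signature": "sig_b", "build": "build_2"},
    ]
    print(top_signatures(crashes, n=1))
    print(crashes_per_100_adu(crashes, {"build_1": 400, "build_2": 50}))
```

Normalizing by ADU lets a build with few users be compared against one with many, which raw crash counts alone would not allow.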
- Web services (“middleware”)
- CrashStats
- Systems engineering
- A tale of lying NICs, TCP stack bugs, and regular HBase failures
- Processor bug and our consequent reindexing
- The front end in more depth
- Python/Django. Previously, PHP.
- The crazy test-driven way we did the bug-for-bug port
- Use cases
- Most common crashes for a version, platform, etc.
- New crashes
- Correlations
- Sending advice back to users
- Deployment and release management
- Future challenges
- Continuous deployment
- We expect a 40% increase in crashes this year due to shipping 1M Firefox OS phones.
- JavaScript crash catching