@erikrose
Last active December 17, 2015 11:39
What Happens When Firefox Crashes? A talk at The Fifth Elephant, Bengaluru, India, July 2013
- A sense of scale
- A Firefox crash report is about 150 KB.
- 50 crashes/second. 3M/day.
- 110 TB in HDFS
- 500 GB in PostgreSQL
- 120 boxes. Real hardware, not VMs.
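A quick sanity check on those figures, as a sketch: it assumes "150K" means about 150 KB per crash and uses decimal units. The arithmetic suggests 50 crashes/second is a peak rather than an average, since 3M/day works out to roughly 35 per second.

```python
# Back-of-the-envelope check of the scale figures.
# Assumption: "150K" means about 150 KB (decimal) per crash.
CRASH_SIZE_BYTES = 150 * 1000
CRASHES_PER_SECOND = 50
CRASHES_PER_DAY = 3_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Bandwidth needed at 50 crashes/second.
peak_bytes_per_second = CRASH_SIZE_BYTES * CRASHES_PER_SECOND
# Average rate implied by 3M crashes/day.
average_crashes_per_second = CRASHES_PER_DAY / SECONDS_PER_DAY
# Raw crash volume arriving per day.
daily_bytes = CRASH_SIZE_BYTES * CRASHES_PER_DAY

print(f"{peak_bytes_per_second / 1e6:.1f} MB/s at 50 crashes/s")  # 7.5 MB/s at 50 crashes/s
print(f"{average_crashes_per_second:.1f} crashes/s on average")   # 34.7 crashes/s on average
print(f"{daily_bytes / 1e9:.0f} GB/day of raw crashes")          # 450 GB/day of raw crashes
```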
- Challenges
- Dictum: “Never lose a crash.”
- Upside-down user story: few users, much data
- Hither and thither
- The spec always changes, so configman, our in-house component architecture, ties it all together.
- Sysadmins and developers can each use whichever config mechanism they prefer:
- Env vars
- INI
- JSON
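configman's own API isn't shown in the talk, so this is a hypothetical stand-in that only illustrates the idea of building one set of settings from several sources. The precedence order (defaults, then INI, then JSON, then environment variables), the SOCORRO_ prefix, and the "section.key" naming are all assumptions.

```python
import configparser
import json
import os


def load_config(defaults, ini_text=None, json_text=None, environ=None, prefix="SOCORRO_"):
    """Merge settings from several sources into one flat dict; later sources win.

    Hypothetical precedence (lowest to highest): defaults, INI, JSON, environment.
    All values are kept as strings, since INI files and env vars only carry strings.
    """
    config = dict(defaults)

    if ini_text:
        parser = configparser.ConfigParser()
        parser.read_string(ini_text)
        # Flatten "[section] key = value" into "section.key".
        for section in parser.sections():
            for key, value in parser.items(section):
                config[f"{section}.{key}"] = value

    if json_text:
        # Expects one level of nesting: {"section": {"key": value}}.
        for section, values in json.loads(json_text).items():
            for key, value in values.items():
                config[f"{section}.{key}"] = str(value)

    environ = os.environ if environ is None else environ
    for name, value in environ.items():
        if name.startswith(prefix):
            # SOCORRO_RABBITMQ__HOST -> "rabbitmq.host"
            config[name[len(prefix):].lower().replace("__", ".")] = value

    return config


if __name__ == "__main__":
    settings = load_config(
        defaults={"rabbitmq.host": "localhost", "rabbitmq.port": "5672"},
        ini_text="[rabbitmq]\nhost = mq.example.internal\n",
        json_text='{"rabbitmq": {"port": "5673"}}',
        environ={"SOCORRO_RABBITMQ__HOST": "mq2.example.internal"},
    )
    print(settings)  # {'rabbitmq.host': 'mq2.example.internal', 'rabbitmq.port': '5673'}
```

The point of the design is that a sysadmin can override one value with an environment variable without touching the files developers maintain.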
- A taste of the front end
- Pretty graphs, nice reports.
- Inside the browser
- The stack-walking, non-memory-moving magic of Breakpad
- What we collect
- The journey through the back end
- POST to collectors
- Dice rolls for a representative sample
- Rules for homing in on interesting events
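The collector's dice rolls and rules can be sketched as a first-match rule list where each rule carries an acceptance probability. The fields (ReleaseChannel, Comments), the rates, and the function name are hypothetical. Given the "never lose a crash" dictum, the sketch only decides whether a crash gets processed, not whether it gets stored.

```python
import random

# Hypothetical throttle rules: (predicate, probability of processing).
# The first matching rule wins; fields and rates are illustrative only.
RULES = [
    (lambda crash: crash.get("ReleaseChannel") in ("nightly", "aurora"), 1.0),
    (lambda crash: bool(crash.get("Comments")), 1.0),  # user left a comment
    (lambda crash: True, 0.10),  # everything else: a 10% representative sample
]


def should_process(crash, rules=RULES, rng=random):
    """Roll the dice for one crash using the first rule that matches it."""
    for predicate, probability in rules:
        if predicate(crash):
            # rng.random() is in [0, 1), so probability 1.0 always accepts.
            return rng.random() < probability
    return False


if __name__ == "__main__":
    rng = random.Random(2013)  # seeded so runs are repeatable
    kept = sum(should_process({"ReleaseChannel": "release"}, rng=rng) for _ in range(10_000))
    print(f"kept {kept} of 10000 release-channel crashes")
```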
- FS as buffer
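One way to use the filesystem as a buffer, sketched under assumptions: each crash becomes a .dump file plus a .json metadata file in a spool directory, each written under a temporary name and renamed into place so a crashmover never picks up a half-written crash. The directory layout and function names are hypothetical.

```python
import json
import os
import tempfile


def _write_atomically(directory, final_name, payload):
    """Write bytes under a temporary name, then rename into place."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
    # Same directory, so the rename is atomic on POSIX filesystems.
    os.replace(tmp_path, os.path.join(directory, final_name))


def spool_crash(spool_dir, crash_id, metadata, dump_bytes):
    """Buffer one crash on local disk for a mover to pick up later.

    The dump is written first and the metadata last, so the presence of a
    .json file means the whole crash is on disk.
    """
    os.makedirs(spool_dir, exist_ok=True)
    _write_atomically(spool_dir, crash_id + ".dump", dump_bytes)
    _write_atomically(spool_dir, crash_id + ".json", json.dumps(metadata).encode("utf-8"))


def pending_crashes(spool_dir):
    """IDs of fully spooled crashes, ready for a mover."""
    return sorted(
        name[: -len(".json")]
        for name in os.listdir(spool_dir)
        if name.endswith(".json") and not name.startswith(".tmp-")
    )
```

If the collector dies mid-write, all that is left behind is a stray .tmp- file, never a crash that looks complete but isn't.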
- Crashmovers
- RabbitMQ
- Soft real-time: priority and normal queues
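The priority/normal split amounts to "always drain the priority queue first." This in-memory sketch uses two deques in place of RabbitMQ queues and does not use any RabbitMQ client API. What goes into the priority queue (for instance, a crash someone is waiting to look at) is an assumption here.

```python
from collections import deque


class TwoLevelQueue:
    """Hand out priority jobs before normal ones (in-memory stand-in, not RabbitMQ)."""

    def __init__(self):
        self.priority = deque()
        self.normal = deque()

    def put(self, crash_id, priority=False):
        (self.priority if priority else self.normal).append(crash_id)

    def get(self):
        """Next crash ID to process, or None if both queues are empty."""
        if self.priority:
            return self.priority.popleft()
        if self.normal:
            return self.normal.popleft()
        return None


if __name__ == "__main__":
    q = TwoLevelQueue()
    q.put("crash-1")
    q.put("crash-2")
    q.put("crash-urgent", priority=True)
    print([q.get() for _ in range(4)])  # ['crash-urgent', 'crash-1', 'crash-2', None]
```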
- HBase
- Historically a point of unreliability
- Downtime for upgrades—but not for long
- Processors
- minidump_stackwalk applies debug symbols to the raw stacks.
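Symbolication boils down to mapping raw addresses to the function that contains them. This toy version does a binary search over a sorted symbol table for a single module; the table and function names are made up, and it is not how minidump_stackwalk is implemented internally.

```python
import bisect

# Hypothetical symbol table for one module: (start offset, function name),
# sorted by offset. Real tools read this from debug symbol files.
SYMBOLS = [
    (0x1000, "main"),
    (0x1200, "run_event_loop"),
    (0x1800, "run_script"),
    (0x2400, "abort"),
]
STARTS = [start for start, _ in SYMBOLS]


def symbolicate(offset):
    """Map a module offset to 'function+0xdelta', or a bare hex offset if unknown.

    Function sizes are ignored, so anything past the last start maps to it.
    """
    i = bisect.bisect_right(STARTS, offset) - 1
    if i < 0:
        return hex(offset)
    start, name = SYMBOLS[i]
    return f"{name}+{hex(offset - start)}"


if __name__ == "__main__":
    raw_stack = [0x2410, 0x1834, 0x1204, 0x0800]
    print([symbolicate(addr) for addr in raw_stack])
    # ['abort+0x10', 'run_script+0x34', 'run_event_loop+0x4', '0x800']
```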
- Postgres
- Stored procedures
- Reporting
- Previously a half-baked queue
- Elasticsearch
- Faceting
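A facet is a count of values for one field across the documents matching a query, such as "which platforms does this signature hit?" Elasticsearch computes these server-side (terms facets, later replaced by aggregations); this local sketch with collections.Counter only shows what the numbers mean. The sample documents are invented.

```python
from collections import Counter


def facet(docs, field, query=lambda doc: True, size=10):
    """The `size` most common values of `field` among docs matching `query`."""
    counts = Counter(doc[field] for doc in docs if field in doc and query(doc))
    return counts.most_common(size)


CRASHES = [
    {"signature": "sig-A", "platform": "Windows", "version": "22.0"},
    {"signature": "sig-A", "platform": "Mac OS X", "version": "22.0"},
    {"signature": "sig-B", "platform": "Windows", "version": "22.0"},
    {"signature": "sig-A", "platform": "Windows", "version": "21.0"},
]


def is_sig_a(crash):
    return crash["signature"] == "sig-A"


if __name__ == "__main__":
    print(facet(CRASHES, "platform", is_sig_a))  # [('Windows', 2), ('Mac OS X', 1)]
    print(facet(CRASHES, "version", is_sig_a))   # [('22.0', 2), ('21.0', 1)]
```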
- Cron jobs
- Calculate aggregates
- Top crashes by signature
- Crashes per 100 ADU (active daily users) per build
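The two aggregates, sketched over an in-memory list of crashes. The rate for a build is 100 × crashes / ADU, taken over the same period the ADU figure covers. The field names (signature, build_id) and the shape of the ADU input are assumptions.

```python
from collections import Counter


def top_crashes(crashes, n=3):
    """Most frequent crash signatures, as (signature, count) pairs."""
    return Counter(c["signature"] for c in crashes).most_common(n)


def crashes_per_100_adu(crashes, adu_by_build):
    """Crash rate per build: 100 * crashes / ADU.

    `adu_by_build` maps a build ID to that build's active daily users.
    Builds with zero ADU are skipped to avoid dividing by zero.
    """
    counts = Counter(c["build_id"] for c in crashes)
    return {
        build: 100.0 * counts[build] / adu
        for build, adu in adu_by_build.items()
        if adu > 0
    }
```

For example, 3 crashes against 200 ADU is a rate of 1.5, which ranks above a build with fewer crashes but far fewer users only if that build's rate is lower; normalizing by ADU is what makes builds comparable.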
- Detect duplicates
- Match crashes against Bugzilla bugs
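Duplicate detection and bug matching, sketched under loud assumptions: a duplicate is taken to be the same crash submitted more than once, spotted by hashing a few identifying fields (the field list is hypothetical), and bug matching assumes a signature-to-bug-numbers mapping has already been pulled from Bugzilla.

```python
import hashlib

# Hypothetical fields assumed to identify a resubmitted crash.
DUP_FIELDS = ("client_crash_date", "install_time", "uptime", "build_id", "signature")


def fingerprint(crash):
    """Stable hash of the identifying fields (missing fields count as empty)."""
    joined = "|".join(str(crash.get(field, "")) for field in DUP_FIELDS)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def find_duplicates(crashes):
    """Map each duplicate's ID to the ID of the first crash with the same fingerprint."""
    first_seen = {}
    duplicates = {}
    for crash in crashes:
        key = fingerprint(crash)
        if key in first_seen:
            duplicates[crash["id"]] = first_seen[key]
        else:
            first_seen[key] = crash["id"]
    return duplicates


def bugs_for_crashes(crashes, bugs_by_signature):
    """Attach known bug numbers to each crash ID, matched by signature."""
    return {c["id"]: sorted(bugs_by_signature.get(c["signature"], ())) for c in crashes}
```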
- Process new builds from the FTP server
- Web services (“middleware”)
- CrashStats
- Systems engineering
- A tale of lying NICs, TCP stack bugs, and recurring HBase failures
- Processor bug and our consequent reindexing
- The front end in more depth
- Python/Django. Previously, PHP.
- The crazy test-driven way we did the bug-for-bug port
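A bug-for-bug port can be driven by golden tests: record what the old PHP site produced for a set of inputs, quirks and all, and require the new Python code to match exactly. The product_label function and its cases are hypothetical, including the trailing-space quirk.

```python
import unittest

# Hypothetical golden cases: inputs paired with what the old site produced.
GOLDEN_CASES = [
    (("Firefox", "22.0"), "Firefox 22.0"),
    (("firefox", "22.0"), "firefox 22.0"),  # old code never normalized case
    (("Firefox", ""), "Firefox "),          # old code left a trailing space
]


def product_label(product, version):
    """New implementation; deliberately reproduces the old quirks."""
    return f"{product} {version}"


class BugForBugTest(unittest.TestCase):
    def test_matches_old_output(self):
        for (product, version), expected in GOLDEN_CASES:
            with self.subTest(product=product, version=version):
                self.assertEqual(product_label(product, version), expected)
```

Run it with python -m unittest against the file holding the block. Fixing a quirk then becomes a deliberate, separate change to the golden data rather than an accident of the port.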
- Use cases
- Most common crashes for a version, platform, etc.
- New crashes
- Correlations
- Sending advice back to users
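Spotting new crashes can be as simple as a set difference: signatures seen in the current version that never showed up in the previous one, ranked by count. The data shape and the min_count parameter are assumptions.

```python
from collections import Counter


def new_signatures(current_crashes, previous_crashes, min_count=1):
    """Signatures absent from the previous version, most frequent first."""
    seen_before = {c["signature"] for c in previous_crashes}
    counts = Counter(
        c["signature"] for c in current_crashes if c["signature"] not in seen_before
    )
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]
```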
- Deployment and release management
- Future challenges
- Continuous deployment
- We expect a 40% increase in crashes this year due to shipping 1M Firefox OS phones.
- JavaScript crash catching
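What a 40% increase would mean for the figures quoted at the top, as rough arithmetic: it assumes the growth applies evenly to both the daily count and the per-second rate, and again reads "150K" as about 150 KB per crash.

```python
# Rough projection of a 40% increase over the figures quoted at the top.
# Assumptions: growth applies evenly to the daily count and the per-second
# rate, and "150K" means about 150 KB per crash.
GROWTH_PERCENT = 40
CRASHES_PER_DAY = 3_000_000
CRASHES_PER_SECOND = 50
CRASH_SIZE_BYTES = 150 * 1000

# Integer arithmetic keeps the results exact.
projected_per_day = CRASHES_PER_DAY * (100 + GROWTH_PERCENT) // 100
projected_per_second = CRASHES_PER_SECOND * (100 + GROWTH_PERCENT) // 100
projected_bytes_per_day = projected_per_day * CRASH_SIZE_BYTES

print(f"{projected_per_day:,} crashes/day")                          # 4,200,000 crashes/day
print(f"{projected_per_second} crashes/s")                           # 70 crashes/s
print(f"{projected_bytes_per_day / 1e9:.0f} GB/day of raw crashes")  # 630 GB/day of raw crashes
```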