@scale Conference 2017 - San Jose - Summary of attended Talks

DEV TOOLS & OPS: The journey of turning a CI system into a universal platform (Facebook)

Blogpost: https://code.facebook.com/posts/222017798302928/jupiter-a-high-performance-job-matching-service/

Video: https://www.facebook.com/atscaleevents/videos/vl.1672297376137659/1960309840908777/?type=1

Sandcastle CI platform (will be open sourced this year)

  • different APIs for VCS / remote control / ...
  • manages installation of needed dependencies (Android stuff, Libs, ...)
  • maintains machine health
  • all teams at FB use Sandcastle CI
  • Sandcastle manages fleet of machines
  • machines use a CoW filesystem (Btrfs) for recreating the code repo on local machines (since tests can modify or even delete the repo they work on)
  • Sandcastle CI uses Jupiter (a high-performance job-matching service) for scalability
  • every job runs in an isolated environment (Worker)
  • written in C++ and accessible via Thrift
  • Workers declare what they can do or what they have (RAM, Kernel Version, Source code repositories, models of smartphones the worker is connected to)
  • Similarly, each Job encodes the capabilities it requires
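
The capability-matching idea can be sketched roughly as follows. This is an illustrative Python sketch, not Jupiter's actual C++/Thrift interface; the Worker/Job classes and the capability schema are invented for the example.

```python
# Illustrative sketch: match a job to a worker by comparing the capabilities
# the worker declares against the capabilities the job requires.
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    capabilities: dict = field(default_factory=dict)  # e.g. {"ram_gb": 64, "kernel": "4.9", "repos": {"www"}}


@dataclass
class Job:
    name: str
    requirements: dict = field(default_factory=dict)  # e.g. {"ram_gb": 32, "repos": {"www"}}


def satisfies(worker: Worker, job: Job) -> bool:
    """A worker can run a job if every requirement is met by a declared capability."""
    for key, required in job.requirements.items():
        have = worker.capabilities.get(key)
        if have is None:
            return False
        if isinstance(required, (int, float)) and have < required:
            return False                       # numeric requirement, e.g. minimum RAM
        if isinstance(required, set) and not required <= set(have):
            return False                       # set requirement, e.g. needed source repos
        if isinstance(required, str) and have != required:
            return False                       # exact match, e.g. kernel version
    return True


def match(job: Job, fleet: list[Worker]) -> Worker | None:
    """Pick the first worker in the fleet that satisfies the job's requirements."""
    return next((w for w in fleet if satisfies(w, job)), None)
```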

DATA: LogDevice: A file-structured log system (Facebook)

Blogpost: https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/

Video: https://www.facebook.com/atscaleevents/videos/vl.1446382685438491/1960287690910992/?type=1

LogDevice is designed for a variety of logging workloads at Facebook and will be open sourced this year. Examples are:

  • Scribe, which provides a fire-and-forget write API with delivery latency expectations on the order of seconds
  • TAO (distributed data store for the Social Graph), where FB keeps secondary indexes for their Graph. Strict record ordering per log is important, and the expected end-to-end latency is on the order of 10ms
  • Machine Learning Pipelines, which use LogDevice for delivering identical event streams to multiple ML model training services

LogDevice is designed specifically for logs, which are record-oriented, append-only, and trimmable:

  • record-oriented: data is written to log as indivisible records, rather than individual bytes
  • append-only: since logs are naturally append-only, there's no support for modifying existing records
  • trimmable: logs usually live for a long time (days/months/years) before they are deleted. Primary space reclamation mechanism is trimming (dropping oldest records based on a time/size-retention policy)
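
A minimal sketch of that log abstraction; the class and method names are illustrative, not LogDevice's API.

```python
# Records are appended as indivisible units, never modified in place, and
# reclaimed only by trimming the oldest entries.
class Log:
    def __init__(self):
        self._records = {}      # sequence number -> record payload
        self._next_seq = 1
        self._trim_point = 0    # everything <= trim_point has been dropped

    def append(self, payload: bytes) -> int:
        """Append an indivisible record and return its sequence number."""
        seq = self._next_seq
        self._records[seq] = payload
        self._next_seq += 1
        return seq

    def read(self, from_seq: int, count: int) -> list[tuple[int, bytes]]:
        """Read records in sequence order, starting at from_seq."""
        start = max(from_seq, self._trim_point + 1)
        return [(s, self._records[s]) for s in range(start, self._next_seq)][:count]

    def trim(self, up_to_seq: int) -> None:
        """Reclaim space by dropping all records up to and including up_to_seq."""
        for s in range(self._trim_point + 1, up_to_seq + 1):
            self._records.pop(s, None)
        self._trim_point = max(self._trim_point, up_to_seq)
```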

LogDevice is tunable for different performance / availability / latency requirements and uses non-deterministic placement of records to improve write availability and better tolerate temporary spikes:

  • For each log in a LogDevice cluster there's a sequencer object whose sole job is to issue monotonically increasing sequence numbers as records are appended to that log.
  • sequencer may run wherever it is convenient: on a storage node, or on a node reserved for sequencing and append execution that does no storage.
  • once a record is stamped with a sequence number, copies (typically 3) may potentially be placed on any node in the cluster
  • A client that wishes to read a particular log contacts all storage nodes that are permitted to store records of that log
  • LogDevice client lib performs reordering and de-duplication of records
  • such a placement and delivery scheme is beneficial for write availability and dealing with spikes during writes, but not when performing many point reads
  • however, most of the time clients read records sequentially, so this trade-off works out in practice
  • The reasoning as to why a client would contact all nodes during a read is that it seems very likely that every contacted node might have some records to deliver
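
A rough sketch of the write and read path described above; the sequencer, the placement, and the client-side merge are simplified stand-ins, and all names are made up rather than taken from LogDevice.

```python
# Per-log sequencer stamps each record, copies go to an arbitrary subset of
# storage nodes, and the reading client merges, re-orders, and de-duplicates.
import itertools
import random


class Sequencer:
    """One per log: its sole job is to issue monotonically increasing sequence numbers."""
    def __init__(self):
        self._counter = itertools.count(1)

    def next_seq(self) -> int:
        return next(self._counter)


def append(record: bytes, sequencer: Sequencer, nodes: list[dict], copies: int = 3) -> int:
    """Stamp the record, then place copies on any `copies` nodes in the cluster."""
    seq = sequencer.next_seq()
    for node in random.sample(nodes, copies):   # non-deterministic placement
        node.setdefault(seq, record)            # each node is just {seq: record} here
    return seq


def read(nodes: list[dict], from_seq: int) -> list[tuple[int, bytes]]:
    """Contact all storage nodes, then re-order and de-duplicate on the client side."""
    merged = {}
    for node in nodes:                          # every node may hold some of the records
        for seq, record in node.items():
            if seq >= from_seq:
                merged.setdefault(seq, record)  # drop duplicate copies
    return sorted(merged.items())               # deliver in sequence order
```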

LogDevice uses ZooKeeper internally as the epoch store, which supplies the epoch component of record sequence numbers
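
A hedged sketch of how an epoch store fits into sequence numbering, with an in-memory dict standing in for ZooKeeper; the (epoch, offset) layout follows the blog post, everything else is illustrative.

```python
# When a new sequencer takes over a log, it durably bumps the epoch first, so
# the sequence numbers it issues can never collide with a previous sequencer's.
epoch_store = {}   # log_id -> last issued epoch; ZooKeeper plays this role in LogDevice


def next_epoch(log_id: int) -> int:
    """Durably advance and return the epoch for a log (one epoch-store update per takeover)."""
    epoch_store[log_id] = epoch_store.get(log_id, 0) + 1
    return epoch_store[log_id]


class EpochSequencer:
    def __init__(self, log_id: int):
        self.epoch = next_epoch(log_id)   # fetched once at sequencer start-up
        self.offset = 0

    def next_lsn(self) -> tuple[int, int]:
        """Sequence numbers sort first by epoch, then by offset within the epoch."""
        self.offset += 1
        return (self.epoch, self.offset)
```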

Local log storage is based on LogsDB (which in turn is based on RocksDB)

  • write-optimized data store designed to keep the number of disk seeks small and controlled
  • write and read IO patterns on the storage device are mostly sequential
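
One way to picture that layout, assuming records are keyed by (log id, sequence number) so that writes from many logs can land in arrival order while a single log stays contiguous for reads; this is a toy stand-in for an LSM store like RocksDB, not LogsDB's actual schema.

```python
# Keys pack (log_id, seq) so that reading one log back is a range scan over
# adjacent keys, i.e. mostly sequential IO on the storage device.
import struct

store = {}   # key bytes -> record; stand-in for an LSM tree such as RocksDB


def key(log_id: int, seq: int) -> bytes:
    """Pack (log_id, seq) big-endian so that byte order equals numeric order."""
    return struct.pack(">QQ", log_id, seq)


def put(log_id: int, seq: int, record: bytes) -> None:
    store[key(log_id, seq)] = record    # in a real LSM this appends to memtable + WAL


def scan(log_id: int, from_seq: int = 0):
    """Sequential read of one log: a range scan between two adjacent keys."""
    lo, hi = key(log_id, from_seq), key(log_id + 1, 0)
    for k in sorted(store):             # an LSM keeps keys sorted; here we sort on demand
        if lo <= k < hi:
            yield struct.unpack(">QQ", k)[1], store[k]
```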

HOT TOPICS: Archer, a distributed computing platform for media processing (Netflix)

Video: https://www.facebook.com/atscaleevents/videos/1960331847573243/

No concrete notes, but Archer will be open sourced this year.

DEV TOOLS & OPS: Keeping 2 billion lines of code moving forward (Google)

Related Paper: https://research.google.com/pubs/pub45424.html

Video: https://www.facebook.com/atscaleevents/videos/1960315840908177/

  • uses a homegrown version-control system (called Piper) to host one large codebase

  • used by 95% of Google's software developers worldwide

  • some numbers:
      • one billion files
      • 2 billion lines of code
      • 9 million source files
      • history of approximately 35 million commits
      • 86 TB of data
      • 40,000 commits per day (16K by developers / 24K by automated systems)

  • the repository serves 500K - 800K queries per second daily (most from distributed build-and-test systems)

  • background info on Piper:
      • stores a single large repo and is implemented on top of Spanner (originally Bigtable)
      • distributed across 10 Google data centers around the world, relying on the Paxos algorithm for consistency guarantees across replicas
      • most devs access Piper through a system called Clients in the Cloud (CitC), which consists of a cloud-based storage backend and a Linux-only FUSE file system
      • CitC supports code browsing and normal Unix tools with no need to clone or sync state locally
      • CitC workspaces typically consume only a small amount of storage while presenting a seamless view of the entire Piper codebase to the developer
      • essentially, CitC stores all in-progress work in the cloud, making it accessible to everyone (humans / code tools)

  • Trunk-based development
      • Google performs trunk-based development (development on branches is unusual and not well supported)
      • branches are mostly used for releases; a release is typically a snapshot of head
      • when new features are developed, both new and old code paths commonly exist simultaneously, controlled through the use of conditional flags -> no dev branches are needed and features can easily be turned on/off
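
A minimal sketch of the conditional-flag idea: both code paths ship on trunk, and a flag rather than a branch picks between them. The flag names and lookup mechanism are invented for illustration, not Google's tooling.

```python
# Old and new code paths coexist on trunk; flipping the flag switches behaviour
# without any dev branch, merge, or redeploy of a different version.
FLAGS = {"new_search_ranking": False}   # flipped via config once the feature is ready


def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)


def old_ranking(query: str) -> list[str]:
    return sorted(query.split())


def new_ranking(query: str) -> list[str]:
    return sorted(query.split(), key=len)


def search(query: str) -> list[str]:
    if flag_enabled("new_search_ranking"):
        return new_ranking(query)   # new code path, dark until the flag is turned on
    return old_ranking(query)       # old code path, deleted once the rollout completes
```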

Engineering Productivity Research

  • large-scale log analysis to determine what each dev is doing all day
      • e.g. clicking in code review / clicking in code search / back to code review / new code review / ...
      • graphs also show context switching throughout the day
      • main goal: identify unproductive time usage and inefficient patterns
  • this is only possible because of having one single source code repository
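
A toy example of the kind of analysis this enables, assuming a per-developer stream of (timestamp, tool) events; the event data and tool names are made up.

```python
# Count how often each developer switches context between tools during a day.
from collections import defaultdict

events = [
    ("alice", "09:01", "code_review"),
    ("alice", "09:05", "code_search"),
    ("alice", "09:09", "code_review"),
    ("alice", "09:30", "editor"),
]

switches = defaultdict(int)
last_tool = {}
for dev, _ts, tool in sorted(events, key=lambda e: (e[0], e[1])):
    if dev in last_tool and last_tool[dev] != tool:
        switches[dev] += 1          # a context switch: the tool changed
    last_tool[dev] = tool

print(dict(switches))               # {'alice': 3}
```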

DEV TOOLS & OPS: Rapid release at massive scale (Facebook)

Blogpost: https://code.facebook.com/posts/270314900139291/rapid-release-at-massive-scale/

Video: https://www.facebook.com/atscaleevents/videos/1960318917574536/

Previous Release process (for almost 10 years):

  • using master/release branch strategy
  • pushing 3x a day to production
  • engineers would request cherry-picks that should go from master into release
  • 500-700 cherry-picks per day
  • new release branch was cut 1x a week, picking up everything that wasn't cherry-picked yet
  • scaled well for many years
  • main problem: amount of manual effort needed to coordinate/deliver large releases every week wasn't sustainable anymore

New Release process:

  • quasi-continuous push from master
  • the new process/system can push tens to hundreds of diffs every few hours
  • each release is rolled out to 100% in a tiered fashion (tier1: Employees / tier2: 2% production / tier3: 100% production) over a few hours
  • release can be stopped at any tier if issues are found
  • Tier1: diffs that have passed a series of automated internal tests and land in master are pushed out to Facebook employees. Push-blocking alerts are issued if regressions are being introduced
  • Tier2: changes are pushed to 2% of production. Signal and monitor alerts are collected to detect potential regressions
  • Tier3: changes are pushed to 100% of production. Flytrap tool aggregates user reports and alerts if anomalies are found
  • Many changes use a Gatekeeper system, which allows mobile and web code releases to be rolled out independently from new features. If a problem is found, the gatekeeper is switched off rather than reverting to a previous version or fixing forward.
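
A rough sketch of the tiered rollout plus gatekeeper kill switch described above; the tier names, the health check, and the Gatekeeper object are illustrative stand-ins, not Facebook's tooling.

```python
# Push through tiers, stop at the first tier that raises alerts, and disable
# the feature via the gatekeeper instead of reverting or hotfixing code.
TIERS = ["Facebook employees", "2% of production", "100% of production"]


class Gatekeeper:
    """Feature kill switch: flip the flag off instead of shipping a new version."""
    def __init__(self):
        self.features = {}

    def enable(self, feature: str) -> None:
        self.features[feature] = True

    def kill(self, feature: str) -> None:
        self.features[feature] = False


def healthy(tier: str) -> bool:
    """Stand-in for push-blocking alerts, monitoring signal, and Flytrap reports."""
    return True


def release(build: str, feature: str, gatekeeper: Gatekeeper) -> None:
    for tier in TIERS:
        print(f"pushing {build} to {tier}")
        if not healthy(tier):
            gatekeeper.kill(feature)          # stop the rollout at this tier
            print(f"regression detected at {tier}; {feature} disabled")
            return
    gatekeeper.enable(feature)                # reached 100% of production
```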

Advantages of new Release process:

  • eliminates the need for hotfixes
  • allows better support for a global engineering team
  • provides a forcing function to develop the next generation of tools, automation, and processes necessary to allow the company to scale
  • makes the user experience better, faster

Graph Processing System

Unrelated to @Scale, but a blog post comparing state-of-the-art graph processing systems: https://code.facebook.com/posts/319004238457019/a-comparison-of-state-of-the-art-graph-processing-systems/
