@scale Conference 2017 - San Jose - Summary of attended Talks

DEV TOOLS & OPS: The journey of turning a CI system into a universal platform (Facebook)

Blogpost: https://code.facebook.com/posts/222017798302928/jupiter-a-high-performance-job-matching-service/

Video: https://www.facebook.com/atscaleevents/videos/vl.1672297376137659/1960309840908777/?type=1

Sandcastle CI platform (will be open sourced this year)

  • different APIs for VCS / remote control / ...
  • manages installation of needed dependencies (Android stuff, Libs, ...)
  • maintains machine health
  • all teams at FB use Sandcastle CI
  • Sandcastle manages fleet of machines
  • machines use a CoW filesystem (Btrfs) for recreating the code repo on local machines (since tests can modify or even delete the repo they work on)
  • Sandcastle CI uses Jupiter (a high-performance job-matching service) for scalability
  • every job runs in an isolated environment (Worker)
  • written in C++ and accessible via Thrift
  • Workers declare what they can do or what they have (RAM, Kernel Version, Source code repositories, models of smartphones the worker is connected to)
  • Similarly, each Job encodes the capabilities it requires
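
The capability-matching idea can be sketched roughly as follows. This is an illustrative Python sketch, not Jupiter's actual C++/Thrift interface; the Worker/Job classes and the capability schema are invented for the example.

```python
# Illustrative sketch: match a job to a worker by comparing the capabilities
# the worker declares against the capabilities the job requires.
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    capabilities: dict = field(default_factory=dict)  # e.g. {"ram_gb": 64, "kernel": "4.9", "repos": {"www"}}


@dataclass
class Job:
    name: str
    requirements: dict = field(default_factory=dict)  # e.g. {"ram_gb": 32, "repos": {"www"}}


def satisfies(worker: Worker, job: Job) -> bool:
    """A worker can run a job if every requirement is met by a declared capability."""
    for key, required in job.requirements.items():
        have = worker.capabilities.get(key)
        if have is None:
            return False
        if isinstance(required, (int, float)) and have < required:
            return False                       # numeric requirement, e.g. minimum RAM
        if isinstance(required, set) and not required <= set(have):
            return False                       # set requirement, e.g. needed source repos
        if isinstance(required, str) and have != required:
            return False                       # exact match, e.g. kernel version
    return True


def match(job: Job, fleet: list[Worker]) -> Worker | None:
    """Pick the first worker in the fleet that satisfies the job's requirements."""
    return next((w for w in fleet if satisfies(w, job)), None)
```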

DATA: LogDevice: A file-structured log system (Facebook)

Blogpost: https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/

Video: https://www.facebook.com/atscaleevents/videos/vl.1446382685438491/1960287690910992/?type=1

LogDevice is designed for a variety of logging workloads at Facebook and will be open sourced this year. Examples are:

  • Scribe, which provides a fire-and-forget write API with delivery latency expectations on the order of seconds
  • TAO (distributed data store for the Social Graph), where FB keeps secondary indexes for their Graph. Strict record ordering per log is important, and the expected end-to-end latency is on the order of 10ms
  • Machine Learning Pipelines, which use LogDevice for delivering identical event streams to multiple ML model training services

LogDevice is designed specifically for logs, which are record-oriented, append-only, and trimmable:

  • record-oriented: data is written to log as indivisible records, rather than individual bytes
  • append-only: since logs are naturally append-only, there's no support for modifying existing records
  • trimmable: logs usually live for a long time (days/months/years) before they are deleted. Primary space reclamation mechanism is trimming (dropping oldest records based on a time/size-retention policy)
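
A minimal sketch of that log abstraction; the class and method names are illustrative, not LogDevice's API.

```python
# Records are appended as indivisible units, never modified in place, and
# reclaimed only by trimming the oldest entries.
class Log:
    def __init__(self):
        self._records = {}      # sequence number -> record payload
        self._next_seq = 1
        self._trim_point = 0    # everything <= trim_point has been dropped

    def append(self, payload: bytes) -> int:
        """Append an indivisible record and return its sequence number."""
        seq = self._next_seq
        self._records[seq] = payload
        self._next_seq += 1
        return seq

    def read(self, from_seq: int, count: int) -> list[tuple[int, bytes]]:
        """Read records in sequence order, starting at from_seq."""
        start = max(from_seq, self._trim_point + 1)
        return [(s, self._records[s]) for s in range(start, self._next_seq)][:count]

    def trim(self, up_to_seq: int) -> None:
        """Reclaim space by dropping all records up to and including up_to_seq."""
        for s in range(self._trim_point + 1, up_to_seq + 1):
            self._records.pop(s, None)
        self._trim_point = max(self._trim_point, up_to_seq)
```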

LogDevice is tunable for different performance / availability / latency requirements and uses non-deterministic placement of records to improve write availability and better tolerate temporary spikes:

  • For each log in a LogDevice cluster there's a sequencer object whose sole job is to issue monotonically increasing sequence numbers as records are appended to that log.
  • sequencer may run wherever it is convenient: on a storage node, or on a node reserved for sequencing and append execution that does no storage.
  • once a record is stamped with a sequence number, copies (typically 3) may potentially be placed on any node in the cluster
  • A client that wishes to read a particular log contacts all storage nodes that are permitted to store records of that log
  • LogDevice client lib performs reordering and de-duplication of records
  • such a placement and delivery scheme is beneficial for write availability and dealing with spikes during writes, but not when performing many point reads
  • however, most of the time clients read records sequentially, so this trade-off works out in practice
  • The reasoning as to why a client would contact all nodes during a read is that it seems very likely that every contacted node might have some records to deliver
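
A rough sketch of the write and read path described above; the sequencer, the placement, and the client-side merge are simplified stand-ins, and all names are made up rather than taken from LogDevice.

```python
# Per-log sequencer stamps each record, copies go to an arbitrary subset of
# storage nodes, and the reading client merges, re-orders, and de-duplicates.
import itertools
import random


class Sequencer:
    """One per log: its sole job is to issue monotonically increasing sequence numbers."""
    def __init__(self):
        self._counter = itertools.count(1)

    def next_seq(self) -> int:
        return next(self._counter)


def append(record: bytes, sequencer: Sequencer, nodes: list[dict], copies: int = 3) -> int:
    """Stamp the record, then place copies on any `copies` nodes in the cluster."""
    seq = sequencer.next_seq()
    for node in random.sample(nodes, copies):   # non-deterministic placement
        node.setdefault(seq, record)            # each node is just {seq: record} here
    return seq


def read(nodes: list[dict], from_seq: int) -> list[tuple[int, bytes]]:
    """Contact all storage nodes, then re-order and de-duplicate on the client side."""
    merged = {}
    for node in nodes:                          # every node may hold some of the records
        for seq, record in node.items():
            if seq >= from_seq:
                merged.setdefault(seq, record)  # drop duplicate copies
    return sorted(merged.items())               # deliver in sequence order
```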

LogDevice uses ZooKeeper internally as the epoch store, which supplies the epoch component of record sequence numbers
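
A hedged sketch of how an epoch store fits into sequence numbering, with an in-memory dict standing in for ZooKeeper; the (epoch, offset) layout follows the blog post, everything else is illustrative.

```python
# When a new sequencer takes over a log, it durably bumps the epoch first, so
# the sequence numbers it issues can never collide with a previous sequencer's.
epoch_store = {}   # log_id -> last issued epoch; ZooKeeper plays this role in LogDevice


def next_epoch(log_id: int) -> int:
    """Durably advance and return the epoch for a log (one epoch-store update per takeover)."""
    epoch_store[log_id] = epoch_store.get(log_id, 0) + 1
    return epoch_store[log_id]


class EpochSequencer:
    def __init__(self, log_id: int):
        self.epoch = next_epoch(log_id)   # fetched once at sequencer start-up
        self.offset = 0

    def next_lsn(self) -> tuple[int, int]:
        """Sequence numbers sort first by epoch, then by offset within the epoch."""
        self.offset += 1
        return (self.epoch, self.offset)
```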

Local log storage is based on LogsDB (which in turn is based on RocksDB)

  • write-optimized data store designed to keep the number of disk seeks small and controlled
  • write and read IO patterns on the storage device are mostly sequential
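
One way to picture that layout, assuming records are keyed by (log id, sequence number) so that writes from many logs can land in arrival order while a single log stays contiguous for reads; this is a toy stand-in for an LSM store like RocksDB, not LogsDB's actual schema.

```python
# Keys pack (log_id, seq) so that reading one log back is a range scan over
# adjacent keys, i.e. mostly sequential IO on the storage device.
import struct

store = {}   # key bytes -> record; stand-in for an LSM tree such as RocksDB


def key(log_id: int, seq: int) -> bytes:
    """Pack (log_id, seq) big-endian so that byte order equals numeric order."""
    return struct.pack(">QQ", log_id, seq)


def put(log_id: int, seq: int, record: bytes) -> None:
    store[key(log_id, seq)] = record    # in a real LSM this appends to memtable + WAL


def scan(log_id: int, from_seq: int = 0):
    """Sequential read of one log: a range scan between two adjacent keys."""
    lo, hi = key(log_id, from_seq), key(log_id + 1, 0)
    for k in sorted(store):             # an LSM keeps keys sorted; here we sort on demand
        if lo <= k < hi:
            yield struct.unpack(">QQ", k)[1], store[k]
```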

HOT TOPICS: Archer, a distributed computing platform for media processing (Netflix)

Video: https://www.facebook.com/atscaleevents/videos/1960331847573243/

No concrete notes, but Archer will be open sourced this year.

DEV TOOLS & OPS: Keeping 2 billion lines of code moving forward (Google)

Related Paper: https://research.google.com/pubs/pub45424.html

Video: https://www.facebook.com/atscaleevents/videos/1960315840908177/

  • uses a homegrown version-control system (called Piper) to host one large codebase

  • used by 95% of Google's software developers worldwide

  • some numbers:
      • one billion files
      • 2 billion lines of code
      • 9 million source files
      • history of approximately 35 million commits
      • 86 TB of data
      • 40,000 commits per day (16K by developers / 24K by automated systems)

  • the repository serves 500K - 800K queries per second daily (most from distributed build-and-test systems)

  • background info on Piper:
      • stores a single large repo and is implemented on top of Spanner (originally Bigtable)
      • distributed across 10 Google data centers around the world, relying on the Paxos algorithm for consistency guarantees across replicas
      • most devs access Piper through a system called Clients in the Cloud (CitC), which consists of a cloud-based storage backend and a Linux-only FUSE file system
      • CitC supports code browsing and normal Unix tools with no need to clone or sync state locally
      • CitC workspaces typically consume only a small amount of storage while presenting a seamless view of the entire Piper codebase to the developer
      • essentially, CitC stores all in-progress work in the cloud, making it accessible to everyone (humans / code tools)

  • Trunk-based development
      • Google performs trunk-based development (development on branches is unusual and not well supported)
      • branches are mostly used for releases; a release is typically a snapshot of head
      • when new features are developed, both new and old code paths commonly exist simultaneously, controlled through the use of conditional flags -> no dev branches are needed and features can easily be turned on/off
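
A minimal sketch of the conditional-flag idea: both code paths ship on trunk, and a flag rather than a branch picks between them. The flag names and lookup mechanism are invented for illustration, not Google's tooling.

```python
# Old and new code paths coexist on trunk; flipping the flag switches behaviour
# without any dev branch, merge, or redeploy of a different version.
FLAGS = {"new_search_ranking": False}   # flipped via config once the feature is ready


def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)


def old_ranking(query: str) -> list[str]:
    return sorted(query.split())


def new_ranking(query: str) -> list[str]:
    return sorted(query.split(), key=len)


def search(query: str) -> list[str]:
    if flag_enabled("new_search_ranking"):
        return new_ranking(query)   # new code path, dark until the flag is turned on
    return old_ranking(query)       # old code path, deleted once the rollout completes
```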

Engineering Productivity Research

  • large-scale log analysis to determine what each dev is doing all day
      • e.g. clicking in code review / clicking in code search / back to code review / new code review / ...
      • graphs also show context switching throughout the day
      • main goal: identify unproductive time usage and inefficient patterns
  • this is only possible because of having one single source code repository
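
A toy example of the kind of analysis this enables, assuming a per-developer stream of (timestamp, tool) events; the event data and tool names are made up.

```python
# Count how often each developer switches context between tools during a day.
from collections import defaultdict

events = [
    ("alice", "09:01", "code_review"),
    ("alice", "09:05", "code_search"),
    ("alice", "09:09", "code_review"),
    ("alice", "09:30", "editor"),
]

switches = defaultdict(int)
last_tool = {}
for dev, _ts, tool in sorted(events, key=lambda e: (e[0], e[1])):
    if dev in last_tool and last_tool[dev] != tool:
        switches[dev] += 1          # a context switch: the tool changed
    last_tool[dev] = tool

print(dict(switches))               # {'alice': 3}
```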

DEV TOOLS & OPS: Rapid release at massive scale (Facebook)

Blogpost: https://code.facebook.com/posts/270314900139291/rapid-release-at-massive-scale/

Video: https://www.facebook.com/atscaleevents/videos/1960318917574536/

Previous Release process (for almost 10 years):

  • using master/release branch strategy
  • pushing 3x a day to production
  • engineers would request cherry-picks that should go from master into release
  • 500-700 cherry-picks per day
  • new release branch was cut 1x a week, picking up everything that wasn't cherry-picked yet
  • scaled well for many years
  • main problem: amount of manual effort needed to coordinate/deliver large releases every week wasn't sustainable anymore

New Release process:

  • quasi-continuous push from master
  • the new process/system can push tens to hundreds of diffs every few hours
  • each release is rolled out to 100% in a tiered fashion (tier1: Employees / tier2: 2% production / tier3: 100% production) over a few hours
  • release can be stopped at any tier if issues are found
  • Tier1: diffs that have passed a series of automated internal tests and land in master are pushed out to Facebook employees. Push-blocking alerts are issued if regressions are being introduced
  • Tier2: changes are pushed to 2% of production. Signal and monitor alerts are collected to detect potential regressions
  • Tier3: changes are pushed to 100% of production. Flytrap tool aggregates user reports and alerts if anomalies are found
  • Many changes use a Gatekeeper system, which allows mobile and web code releases to be rolled out independently from new features. If a problem is found, the gatekeeper is switched off rather than reverting to a previous version or fixing forward.
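
A rough sketch of the tiered rollout plus gatekeeper kill switch described above; the tier names, the health check, and the Gatekeeper object are illustrative stand-ins, not Facebook's tooling.

```python
# Push through tiers, stop at the first tier that raises alerts, and disable
# the feature via the gatekeeper instead of reverting or hotfixing code.
TIERS = ["Facebook employees", "2% of production", "100% of production"]


class Gatekeeper:
    """Feature kill switch: flip the flag off instead of shipping a new version."""
    def __init__(self):
        self.features = {}

    def enable(self, feature: str) -> None:
        self.features[feature] = True

    def kill(self, feature: str) -> None:
        self.features[feature] = False


def healthy(tier: str) -> bool:
    """Stand-in for push-blocking alerts, monitoring signal, and Flytrap reports."""
    return True


def release(build: str, feature: str, gatekeeper: Gatekeeper) -> None:
    for tier in TIERS:
        print(f"pushing {build} to {tier}")
        if not healthy(tier):
            gatekeeper.kill(feature)          # stop the rollout at this tier
            print(f"regression detected at {tier}; {feature} disabled")
            return
    gatekeeper.enable(feature)                # reached 100% of production
```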

Advantages of new Release process:

  • eliminates the need for hotfixes
  • allows better support for a global engineering team
  • provides a forcing function to develop the next generation of tools, automation, and processes necessary to allow the company to scale
  • makes the user experience better, faster

Graph Processing System

Unrelated to @Scale, but a blog post comparing state-of-the-art graph processing systems: https://code.facebook.com/posts/319004238457019/a-comparison-of-state-of-the-art-graph-processing-systems/
