- Machine Language for Beginners
- Making Games for the Atari 2600 - Steven Hugg
- The Go Programming Language
- Learn You a Haskell for Great Good!
- Building Microservices
- AWS Well-Architected Framework
- Elastic Load Balancing User Guide
- Auto Scaling User Guide
- Amazon Virtual Private Cloud User Guide
- On designing and deploying internet scale services, Hamilton ’07
- Large-scale cluster management at Google with Borg
- Hierarchical delta debugging
- Minimising faulty executions
- DBSherlock
- Runtime metric meets developer
- Slicer: Auto-sharding for datacenter applications
- The Morning Paper on Operability
- Cloud Lock-in and Change Agility
- Don't Build Private Clouds
- How do I build a global transit network on AWS?
- Introduction to AWS Security
- Amazon Web Services: Overview of Security Processes
- AWS Security Best Practices
- Introduction to AWS Security Processes
- AWS Best Practices for DDoS Resiliency
- The Methbot Operation - recommended by Ian McCunn
- Dapper
- Mystery Machine
- Canopy: an end-to-end performance tracing and analysis system
- lprof
- Pivot tracing
- Anomaly Detection Using AWS IoT and AWS Lambda
- Use the built-in Amazon SageMaker Random Cut Forest algorithm for anomaly detection
- Distributed Systems Tracing with Zipkin
- Monitoring in the time of Cloud Native
- Monitoring and Observability
- Backup, Archive, and Restore Approaches Using AWS - looks like this is superseded by the whitepaper below.
- Backup and Recovery Approaches Using AWS
- Using Amazon Web Services for Disaster Recovery
- Failure sketching
- Simplifying and isolating failure-inducing input
- Simple testing can prevent most critical failures: an analysis of production failures in distributed data-intensive systems
- In Search of an Understandable Consensus Algorithm - Original Raft consensus algorithm paper
- Just say NO to Paxos overhead: replacing consensus with network ordering
- Automating Failure Testing Research at Internet Scale
- ORCHESTRA: A Fault Injection Environment for Distributed Systems
- FATE and DESTINI: A Framework for Cloud Recovery Testing
- How complex systems fail
- Introducing Chaos Engineering
- Principles of Chaos Engineering
- Chaos Engineering Upgraded
- Chaos Engineering, by Basiri et al., IEEE Software, May/June 2016
- The Netflix Simian Army
- Resilience Engineering: Learning to Embrace Failure - Amazon and Google on chaos engineering
- Inside Azure Search: Chaos Engineering
- Facebook Turned Off Entire Data Center to Test Resiliency
- Building and Testing Resilient Ruby on Rails Applications
- Too big to test: Breaking a production brokerage platform without causing financial devastation
- Netflix Christmas Eve outage
- Postmortem of database outage of January 31
- FIT : Failure Injection Testing
- Systematic resilience testing of microservices with Gremlin
- Proxies for resilience and fault tolerance in distributed SOA
- Making the Netflix API More Resilient
- Fault Tolerance in a High Volume, Distributed System
- Application Resilience in a Service-oriented Architecture
- How Netflix Leverages Multiple Regions to Increase Availability: Isthmus and Active-Active Case Study
- This Is My Architecture: Netflix Multi-Regional Resiliency and Route 53
- Performance and Fault Tolerance for the Netflix API
- Application Resilience Engineering & Operations at Netflix
- How We Use Zuul At Netflix
- Announcing Zuul: Edge Service in the Cloud
- Zuul 2 : The Netflix Journey to Asynchronous, Non-Blocking Systems
- Understanding Modern Service Discovery with Docker
- Consul Service Discovery with Docker
- Automatic Docker Service Announcement with Registrator
- The Phoenix Project - recommended by Paul Jenson, Jon Holland
- Site Reliability Engineering, by Beyer et al. (O'Reilly)
- Big Data, by Marz and Warren (Manning)
- Cassandra: The Definitive Guide
- Kafka: The Definitive Guide
- MapReduce: Simplified Data Processing on Large Clusters
- The Google File System
- Bigtable: A Distributed Storage System for Structured Data
- Dynamo: Amazon’s Highly Available Key-value Store
- The Chubby lock service for loosely-coupled distributed systems
- Chukwa: A large-scale monitoring system
- Cassandra – A Decentralized Structured Storage System
- HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
- S4: Distributed Stream Computing Platform
- Dremel: Interactive Analysis of Web-Scale Datasets
- Large-scale Incremental Processing Using Distributed Transactions and Notifications
- Pregel: A System for Large-Scale Graph Processing
- Spanner: Google’s Globally-Distributed Database
- Shark: Fast Data Analysis Using Coarse-grained Distributed Memory
- The PageRank Citation Ranking: Bringing Order to the Web
- A Relational Model of Data for Large Shared Data Banks
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services
- Finding a needle in Haystack: Facebook’s photo storage
- Spark: Cluster Computing with Working Sets
- The Unified Logging Infrastructure for Data Analytics at Twitter
- F1: A Distributed SQL Database That Scales
- Scalable Progressive Analytics on Big Data in the Cloud
- Big data: The next frontier for innovation, competition, and productivity
- The Promise and Peril of Big Data
- TDWI Checklist Report: Big Data Analytics
- Gorilla
- BTrDB: Optimizing Storage System Design for Timeseries Processing
- Chronix: Long term storage and retrieval technology for anomaly detection in operational data
- Druid: A Real-time Analytical Data Store
- Operationalizing Spark Streaming (Part 1)
- Dockerizing MySQL at Uber Engineering
- Engineering Intelligence Through Data Visualization at Uber
- Beringei: A high-performance time series storage engine - based on Gorilla
- The Elements of Statistical Learning
- Time Series Analysis and Its Applications
- Outlier Analysis
- Random Forests
- Bayesian Estimation of Causal Direction in Acyclic Structural Equation Models with Individual-specific Confounder Variables and Non-Gaussian Distributions
- Probabilistic Reasoning for Streaming Anomaly Detection
- Probabilistic Models for Anomaly Detection in Remote Sensor Data Streams
- Anomaly detection with Apache MXNet
- Time Series Anomaly Detection Algorithms
- Introducing practical and robust anomaly detection in a time series
- Forecasting Time Series Data with Multiple Seasonal Periods
- At Airbnb, Data Science Belongs Everywhere: Insights from Five Years of Hypergrowth
- Building for Trust: Insights from our efforts to distill the fuel for the sharing economy
- Anomaly Detection for Airbnb’s Payment Platform
- Unboxing the Random Forest Classifier: The Threshold Distributions
- Confidence Splitting Criterions Can Improve Precision And Recall in Random Forest Classifiers
- Overcoming Missing Values In A Random Forest Classifier
- Awesome - Most Cited Deep Learning Papers
- The amazing power of word vectors
- Machine Learning Top 10 Articles for the Past Month
- Deep Learning (Goodfellow et al.)
- Reinforcement Learning (Sutton and Barto)
- Deep Learning in Neural Networks: An Overview
- How the backpropagation algorithm works
- Life 3.0
- Superintelligence: Paths, Dangers, Strategies
- TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
- A Few Useful Things to Know about Machine Learning
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- ImageNet Classification with Deep Convolutional Neural Networks
- Going deeper with convolutions
- Long Short Term Memory Networks for Anomaly Detection in Time Series
- LSTM: A Search Space Odyssey
- Map-Reduce for Machine Learning on Multicore
- MLbase: A Distributed Machine-learning System
- TensorFlow: A system for large-scale machine learning
- Theano: A Python framework for fast computation of mathematical expressions
- Strategic attentive writer for learning macro-actions
- Evolving Neural Networks through Augmenting Topologies
- Playing Atari with Deep Reinforcement Learning
- Efficient BackProp
- Adam: A Method for Stochastic Optimization
- Generative Adversarial Nets
- Face2Face: Real-time Face Capture and Reenactment of RGB Videos
- PathNet: Evolution Channels Gradient Descent in Super Neural Networks
- On the Number of Linear Regions of Deep Neural Networks
- Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
- The Loss Surfaces of Multilayer Networks
- Distilling the Knowledge in a Neural Network
- Mask R-CNN
- Overcoming catastrophic forgetting in neural networks
- DeepCoder: Learning to write programs
- Large-scale evolution of image classifiers
- NIPS 2016 Tutorial: Generative Adversarial Networks
- Tacotron: Towards End-to-End Speech Synthesis
- The Arcade Learning Environment: An Evaluation Platform for General Agents
- Human-level control through deep reinforcement learning
- Hidden Technical Debt in Machine Learning Systems
- A Visual Introduction to Machine Learning
- Essentials of Machine Learning Algorithms (with Python and R Codes)
- Architecting a Machine Learning System for Risk
- Designing Machine Learning Models: A Tale of Precision and Recall
- How Airbnb uses machine learning to detect host preferences
- Anomaly Detection for Time Series Data with Deep Learning
- Deep learning
- Neural Networks, Manifolds, and Topology
- Deep Learning, NLP, and Representations
- A Quick Introduction to Neural Networks
- Calculus on Computational Graphs: Backpropagation
- Neural Networks, Types, and Functional Programming
- Understanding LSTM Networks
- Attention and Augmented Recurrent Neural Networks
- An Intuitive Explanation of Convolutional Neural Networks
- Conv Nets: A Modular Perspective
- Understanding Convolutions
- Groups & Group Convolutions
- Deconvolution and Checkerboard Artifacts
- Visualizing Representations: Deep Learning and Human Beings
- Ideas on interpreting machine learning
- DeepMind: Elastic Weight Consolidation or how to fix catastrophic forgetting
- How to implement Sentiment Analysis using word embedding and Convolutional Neural Networks on Keras
- Les 12 secteurs d'activité que le machine learning va faire exploser (in French: "The 12 industries that machine learning is going to blow up")
- The Great A.I. Awakening
- Vanilla Recurrent Neural Networks
- Caption this, with TensorFlow
- Evaluating boosted decision trees for billions of users
- Why Momentum Really Works
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning
- Back-propagation, an introduction
- Generative Adversarial Networks (GANs), Some Open Questions
- Generalization and Equilibrium in Generative Adversarial Networks (GANs)
- Recommending music on Spotify with Deep Learning
- GAN by Example using Keras on Tensorflow Backend
- Tensorflow demystified
- Understanding Hinton's Capsule Networks: Part I
- Artificial Intelligence, Automation, and the Economy
- The Machine Stops - recommended by Ragz
- Learning Computer Architecture with Raspberry Pi
- Functional Design for 3D Printing
- Our Mathematical Universe
- Large-scale model of mammalian thalamocortical systems
- Reconstruction and Simulation of Neocortical Microcircuitry
- Long-Term Optical Access to an Estimated One Million Neurons in the Live Mouse Cortex
- Spike-timing dependent plasticity
- Cortical development and remapping through spike timing-dependent plasticity
- The spike timing dependence of plasticity
- Dreams, endocannabinoids and itinerant dynamics in neural networks: elaborating Crick-Mitchison's unlearning hypothesis
- Neural correlates of maintaining one’s political beliefs in the face of counterevidence
- The stability-plasticity dilemma: investigating the continuum from catastrophic forgetting to age-limited learning effects
- The Millionaire Next Door
- Adventures Among Ants: A Global Safari with a Cast of Trillions - recommended by Ragz
- Down Under: Travels in a Sunburned Country by Bill Bryson
- Memories, Guesses, and Apologies - recommended by Adrian Cockcroft