@randylayman
Last active April 14, 2016 17:57
OReilly Software Architecture 2016 Notes

#OReilly Software Architecture 2016
April 11-13, 2016
###Venue
Hilton Midtown, Manhattan, NY

##Day 1 (April 11)
####Microservices Tutorial - Cassandra Shum (ThoughtWorks), Maria Gomez (ThoughtWorks)
The instructors provided a VirtualBox machine that contained scripts we ran to start and stop virtual machines. Converting monoliths to microservices meant uncommenting one invocation and invoking another. The session was mostly "do this, do that" with not a lot of discussion.

The session used the ThoughtWorks GoCD tool for managing the build pipelines. The interface seems pretty straightforward, but the speed of picking up changes from GitHub pushes wasn't great. That might be from running in a smallish virtual machine.

Did an exercise about bounded contexts to design the model for an airline, including systems for payments and rewards programs. The point of the exercise was to not model a user, because the business didn't talk about a user/customer/account. Instead, model the rewards, the payments, and the commonality into an application (because that's what a user did to create the rewards program).

The session included a deeper dive into Consumer-Driven Contracts (CDC). The concept is that consumers of an API provide test cases to the producer repository. These test cases assert that the behavior the particular consumer expects remains working. They showed an open source project, Pact (https://github.com/realestate-com-au/pact), that can capture the test cases on the client side and produce an artifact for the server repository, where it is replayed to verify the contract is fulfilled.
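The CDC idea can be boiled down to a few lines. Here is a minimal sketch in plain Python, not Pact's actual API; the endpoint and payload are made up. The consumer records the interaction it depends on, and the provider's build replays it:

```python
# Contract the consumer team checks into the provider's repository.
# The flight endpoint and fields here are hypothetical.
contract = {
    "request": {"method": "GET", "path": "/flights/42"},
    "expected_response": {"status": 200, "body": {"id": 42, "origin": "JFK"}},
}

def provider_handle(method, path):
    # Stand-in for the provider's real request handler.
    if method == "GET" and path == "/flights/42":
        return {"status": 200, "body": {"id": 42, "origin": "JFK"}}
    return {"status": 404, "body": None}

def verify(contract):
    # Replay the recorded interaction against the provider implementation.
    req = contract["request"]
    actual = provider_handle(req["method"], req["path"])
    return actual == contract["expected_response"]

print(verify(contract))  # True while the provider honors the contract
```

Pact automates the two halves of this: recording the interaction in the consumer's tests and replaying the resulting artifact in the provider's build.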

###The Zen of Architecture - Juval Löwy (IDesign Inc.)
The speaker for this session had a very large ego and a very strong opinion of his topic. I think for all the bluster and banter there is a kernel of truth in what he said: when designing a system, be aware of where volatility (likelihood of change) is, and make sure there is a box around it so things can be changed out with no impact to the larger system. This was a bit of a tease for his training classes, and the last ten minutes were a sales pitch for attending.

What follows are my notes during the session, largely copying text from slides.

  • Avoid functional decomposition

    • leads to duplicating behaviors across service boundaries
    • discourages reuse
    • couples services to order and current use cases
    • clients stitch services together, bloating the clients
    • a simple client call might invoke a chain, but each step in the chain costs more than the one below it because of message passing
  • Domain Decomposition

    • functional decomposition in disguise
  • Functional design is not testable

    • Can't do regression testing because of unusual conditions (???)

Chunky is the opposite of chatty (fewer large messages versus more small messages)

###The IDesign Method

  • Decompose based on volatility

    • Identify areas of potential change (can be functional but not domain functional)
      • Encapsulate in services
      • Milestones based on integration, not features
    • Implement behavior as interaction between services or subsystems
  • Universal principle of good design - encapsulate change
    • insulate, do not resonate with change
    • functional decomposition maximizes the impact of change

  • features aren't in services. Features are the integrations/interactions between the services

  • volatility is often not self-evident

  • Dunning-Kruger effect

  • Axes of volatility

    • At the same customer over time
    • At the same time across different customers
    • axes should be independent
      • the same axis probably means functional decomposition
  • Decomposition and business

    • avoid encapsulating changes to the nature of the business
      • hint - very rare indeed
      • hint - each done poorly
      • the speculating design trap
  • Decomposition and Longevity

    • Estimate when change happens with simple heuristic: ability of organization/customer/market to instigate or absorb a change is constant due to nature of the business
  • Layered approach

    • typical layers
      • Client (Presentation)
        • User or another system
        • advocate single point of entry to system
      • Business
        • managers encapsulate sequence in use cases and workflow
        • each manager is collection of related use cases
        • engines encapsulate business rules and activities
        • a manager may use zero or more engines
        • engines may be shared between managers
        • engine == activity (how to do something)
      • resource access
        • encapsulate resource access
        • may call resource stored in other services
        • can be shared across engines and managers
        • expose the lowest possible business context around resources - business verbs vs CRUD/IO
      • resources
        • physical resources
      • utilities
        • common infrastructure to all services
  • A cohesive usage of manager, engine, and resources is a service to the world

  • Strive for the minimal set of interacting services that satisfies the use case

  • Features are always and everywhere aspects of integration not implementation

  • Observation - volatility decreases top-down, reusability increases top-down

    • Managers are almost expendable
    • Deviation may indicate functional decomposition (or unripe decomposition)
  • Observation (about interaction diagrams)

    • functional decomposition yields forks or staircases
    • volatility looks like a stick figure (maybe a bit like a tree)
  • http://www.idesign.net/Download/OSA

##Day 2 (April 12)
###Keynote #1 blah blah Microservices blah blah - Jonas Bonér (Lightbend)

  • synchronous is strong coupling (in time)

Attributes of microservices:
  • isolation
    • most important trait
    • bulkheads against failures (prevents cascading failures)
    • provide ability to have resilience
  • autonomous. Work on their own
  • single responsibility pattern. Do one thing and do it well
  • exclusive state, including persistence
  • ability to move services
    • addressable via stable and static address
    • provides location transparency

Microservices are building blocks of systems

Systems need to exploit reality

  • reality is not consistent
  • information has latency

Minimize coupling and communication

What about transactions?

  • within services are fine with strong consistency
  • Between services implement Saga Pattern
    • For each transaction, also create a reversing transaction within the microservice. How to coordinate them was left unspecified.
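The Saga idea above can be sketched in a few lines of Python; the booking/payment step names are hypothetical. Run forward steps, and on failure run the compensations for the completed steps in reverse order:

```python
def run_saga(steps, state):
    # steps: (action, compensation) pairs; both mutate the shared state.
    done = []
    try:
        for action, compensation in steps:
            action(state)
            done.append(compensation)
    except Exception:
        # Undo completed steps in reverse order.
        for compensation in reversed(done):
            compensation(state)
        return False
    return True

def reserve(s): s["seats_reserved"] = 1
def cancel_reservation(s): s["seats_reserved"] = 0
def charge(s): raise RuntimeError("payment declined")
def refund(s): s["charged"] = 0

state = {"seats_reserved": 0, "charged": 0}
ok = run_saga([(reserve, cancel_reservation), (charge, refund)], state)
print(ok)                       # False: the charge step failed
print(state["seats_reserved"])  # 0: the reservation was reversed
```

The open question from the talk remains: in a distributed setting something still has to coordinate which compensations ran, which is where the unspecified complexity lives.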

###Keynote #2 Evolution of Evolutionary Architecture - Rebecca Parsons (ThoughtWorks)
Evolutionary architecture is the ability to evolve over time. Emergent design, design patterns, and CI are the safety net that makes it possible. Continuous delivery enabled creative thinking:

  • lower risks for production deployments
  • deploy when functionality dictates, removes risk from the consideration

Principles of Evolutionary Architecture

  • Evolvability belongs at the same level as reliability, scalability, etc.
  • Evolve in technical, data, security, and architecture dimensions

###Keynote #3 Conversational Commerce - Stewart Nickolas (IBM)
Demo of a system from IBM using voice on Apple TV, Amazon Echo, and Siri to work with Watson, showing how it could help with a large consumer purchase, such as remodeling a kitchen.

###Keynote #4 What I learned about architecture while running marathons - Ted Malaska (Cloudera)
States of meditation allow problems to be solved (TV, running, etc.)

  • if stuck > 30 minutes, go for a run
  • most challenges solved while running

Efficiency of motion

  • SQL isn't necessarily right
  • horsepower isn't the best answer; efficiency is
  • do more with less

###Session #1 Don't build a Death Star of Security - David Strauss (Pantheon)

  • Defense in depth

  • Don't overly obsess about the edge

  • For this session, assume the 1st level is bypassed

  • Authentication Security

    • Common 2nd level attack
    • Capture of username/password in flight
  • Password hashing

    • Best: Password + Salt (Random, stored with database) + Pepper (maybe static, separate from storage, possibly in source)
    • Use HMAC SHA512 if speed is an issue
    • Use Password stretching like PBKDF2 (100 rounds) to make it take more time. Also scrypt/bcrypt
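The salt + pepper + stretching recipe can be sketched with only Python's stdlib; the pepper value and the 100,000-iteration count below are illustrative, not the speaker's numbers:

```python
import hashlib, hmac, os

PEPPER = b"app-wide-secret"  # hypothetical: kept in config/source, never in the DB

def hash_password(password, salt=None):
    # Random salt, stored alongside the hash in the database.
    salt = salt if salt is not None else os.urandom(16)
    # Apply the pepper via HMAC-SHA512 before stretching.
    peppered = hmac.new(PEPPER, password.encode(), hashlib.sha512).digest()
    # PBKDF2 stretching; the iteration count is illustrative.
    digest = hashlib.pbkdf2_hmac("sha512", peppered, salt, 100_000)
    return salt, digest

def check_password(password, salt, digest):
    return hmac.compare_digest(hash_password(password, salt)[1], digest)

salt, digest = hash_password("hunter2")
print(check_password("hunter2", salt, digest))  # True
print(check_password("wrong", salt, digest))    # False
```

`hmac.compare_digest` avoids timing leaks on comparison; scrypt/bcrypt would replace the `pbkdf2_hmac` call.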
  • Use different complexity based on length

    • Based on recommendation from Stanford(?)
    • 8-11 characters, require Aa1$
    • 12-15 characters, require Aa1
    • 16-19 characters, require Aa
    • 20+ characters, require a
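The length-based schedule above as a small policy function; the thresholds mirror the slide, while the function name and rule encoding are my own:

```python
import re

RULES = [  # (min_length, required character-class patterns)
    (20, [r"[a-z]"]),                                   # 20+: just a
    (16, [r"[A-Z]", r"[a-z]"]),                         # 16-19: Aa
    (12, [r"[A-Z]", r"[a-z]", r"[0-9]"]),               # 12-15: Aa1
    (8,  [r"[A-Z]", r"[a-z]", r"[0-9]", r"[^A-Za-z0-9]"]),  # 8-11: Aa1$
]

def acceptable(password):
    for min_len, patterns in RULES:
        if len(password) >= min_len:
            return all(re.search(p, password) for p in patterns)
    return False  # shorter than 8 characters

print(acceptable("Tr0ub4dor&3"))                   # True: 11 chars with Aa1$
print(acceptable("correct horse battery staple"))  # True: 28 chars, lowercase ok
print(acceptable("short"))                         # False: under 8
```

Checking rules from the longest threshold down means a long password only has to satisfy the relaxed requirement.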
  • Recommend multifactor authentication. Almost table stakes

  • Federated Authentication, such as OpenID Connect

  • Authentication before application

    • Makes it impossible to exploit application vulnerabilities without credentials
    • Wrap app in something like Apache with SAML
  • Confused Deputy Problem

    • Internal system too trusting
  • Challenge: Ambient Authority and Confused Deputies

    • Virtue of who you are gets you authority
    • Pattern: Capability-Based Security
    • Pattern: Mandatory Access Control, like selinux
    • Anti-pattern: Mandatory Access Control as an afterthought. Complex rules makes it difficult to debug
    • Better: Boundaries first, container style. Docker on CentOS works well with selinux
  • Staying Hands Off Sensitive Data

    • Pattern: Delegated Handling of Sensitive Data (think payment gateways)
    • Pattern: Black Hole API (write only API to receive sensitive data so receiver doesn't persist it)
  • Managing App Data (mobile, laptops, etc)

    • Pattern: Key management
    • Pattern: Anonymize Data
    • Pattern: Physical security and device encryption
    • Pattern: Smart cards and hardware tokens (i.e. U2F)
  • Preventing a breach from spreading

    • Pattern: System isolation (firewall or more)
    • Challenge: Shared secrets (i.e. passwords)
    • Anti-pattern: Security through obscurity
    • Pattern: Public Key Infrastructure.
      • Encrypt all connections, no trust
      • CloudFlare has an OSS CA written in Go for issuing certs for new hosts
      • MySQL can support PKI/certs for logins. Can we use this for QA access to staging database?
      • JWT with asymmetric mode. Doesn't require private secret on verifier
  • Mitigate TLS overhead with persistent connections and session caching

###Let's not rewrite it all - Michelle Brush (Cerner Corporation)
Purpose, not modernization, drives change.

Types of Rewrites:

  1. Code
  2. Design
  3. API/Frameworks
  4. Architecture
  5. User Interface
  6. Language

Change gets more difficult as you go down the list. Oftentimes change bleeds through the layers.

Rule 1: Minimize the scope. Define and stick to a small set of objectives.

Rule 2: Create a technical vision. Make everyone understand it. Enforce it (i.e. build checks for disallowed components/libraries)

Rule 3: Reduce Complexity. Look for ways to reduce scope/effort

Rule 4: Work in thin slices.

  • Driver for which slice is what is most likely to change.
  • Plan to build adapters and abstractions
  • Accept you will have throwaway work
  • Risk -- won't finish migration.
    • Reason - people don't understand/not bought into reason for change

Rule 5: Build your tests first

  • High level integration/system tests
  • Try to not mess with interfaces
  • Don't fix every bug, since fixing one changes the interface. Document and fix afterwards (or fix in the base and rebuild interface assumptions/tests)

Rule 6: Embrace Operations Early. Document operational risks as you go.

Rule 7: Invest in learning.

  • Create materials as you go
  • Document guiding principles
  • Create learning materials regularly

Rule 8: Make doing the right thing easy

  • Make doing the wrong thing hard

###Consistent hashing, shuffle sharding, and copysets: Practical tools for controlling failure - Wes Chow (Chartbeat)

Controlling (limiting) failure through sharding (applies to load balancing)

Pattern: Hashing mod

Pattern: Hashing nodes

  • Hash node name, plot to line
  • Hash keys; a key is stored in the next highest node

Pattern: Consistent Hashing

  • For every node, generate M virtual nodes. Repeat the Hashing Nodes approach
  • Adding nodes results in about 1/N data getting shifted to new node
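A minimal consistent-hashing sketch with virtual nodes (the M here is the knob the speaker set to 160); the ~1/N data-movement claim can be checked directly. Node names and counts are arbitrary:

```python
import bisect, hashlib

def h(s):
    # Deterministic integer hash for ring placement.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=160):
        # Each node contributes `vnodes` points on the ring.
        self.ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def lookup(self, key):
        # Binary search for the next point clockwise: O(log n) per lookup.
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["a", "b", "c"])
before = {k: ring.lookup(k) for k in map(str, range(1000))}
bigger = Ring(["a", "b", "c", "d"])
moved = sum(before[k] != bigger.lookup(k) for k in before)
print(moved / 1000)  # roughly 1/4 of keys move to the new node
```

The virtual nodes are what make the load even; with one point per node, adding a node would steal keys from only a single neighbor.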

Pattern: Node Sets/Rendezvous Hashing

  • For all servers, hash server+key. Server that produced the lowest hash stores the key
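Rendezvous hashing fits in one function, following the "lowest hash wins" rule from the notes. A useful property falls out for free: removing a server only remaps the keys that server owned. This is a sketch with arbitrary server names:

```python
import hashlib

def owner(servers, key):
    # Hash server+key for every server; the lowest hash wins. O(n) per lookup.
    return min(servers, key=lambda s: hashlib.md5(f"{s}:{key}".encode()).hexdigest())

servers = ["a", "b", "c"]
print(owner(servers, "user-42") in servers)  # True

# Keys not owned by "c" keep the same owner after "c" is removed,
# because the winning (lowest) hash among the survivors is unchanged.
survivor_keys = [k for k in map(str, range(100)) if owner(servers, k) != "c"]
assert all(owner(["a", "b"], k) == owner(servers, k) for k in survivor_keys)
```

No ring to build, so there is zero startup cost; the trade-off versus consistent hashing is the linear scan per lookup.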

Consistent Hash versus Rendezvous Hash

Consistent hashing does lots of work at startup to generate the virtual nodes (the speaker uses 160 and couldn't explain why when asked); lookup is O(ln(n)). Rendezvous has no work at startup; lookup is O(n).

Poison Pills

  • One request that does bad things (expensive query, buggy handler, unhandled input)
  • Results in crash. If retries, can take out pods
  • Hash requests into sets of nodes. The more sets = smaller blast radius

Shuffle Sharding

  • For each customer, assign to a set of servers. Change servers for each set. Can get pretty low blast radius with medium numbers
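A sketch of shuffle sharding: seed a PRNG with the customer id so each customer gets a stable, small, effectively random subset of servers. The customer names and shard size below are arbitrary:

```python
import random

def shard_for(customer, servers, shard_size=2):
    # Seeding with the customer id makes the assignment deterministic:
    # the same customer always lands on the same small set of servers.
    rng = random.Random(customer)
    return sorted(rng.sample(servers, shard_size))

servers = [f"server-{i}" for i in range(8)]
print(shard_for("acme", servers) == shard_for("acme", servers))  # True: stable
print(len(shard_for("acme", servers)))  # 2
```

With 8 servers and shards of 2 there are 28 possible shards, so a poison pill from one customer takes out at most its own 2 servers, and few other customers share exactly that pair.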

###It Probably Works - Tyler McMullen (Fastly)
Probabilistic Algorithms:

  • Element of random
  • Used to do things more efficiently than otherwise possible
  • Not guessing
  • Provably bound error rates

Join-Shortest-Queue

  • Request latency is log-normal distribution for almost all systems
  • Balls and Bins Problem
  • As load goes up, random load balancing gets worse
  • Join-Shortest-Queue is Better
  • Distributed Random is the same as single-server Random
  • Distributed Shortest Queue is difficult
    • Naive is almost same as Random
    • Oscillations from nodes making same decision
  • Better choice is "Power of 2"
    • Pick 2 random servers, use lowest of 2
    • Gives exponential improvement in variance
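A quick simulation of that claim, under simplifying assumptions (uniform arrivals, no departures): compare pure random placement against picking the less-loaded of two random servers.

```python
import random

random.seed(1)
N, REQUESTS = 10, 10_000
rand_q = [0] * N   # queue depths under pure random assignment
p2_q = [0] * N     # queue depths under power-of-two-choices

for _ in range(REQUESTS):
    rand_q[random.randrange(N)] += 1          # random: pick any server
    a, b = random.sample(range(N), 2)          # power of 2: sample two servers,
    p2_q[a if p2_q[a] <= p2_q[b] else b] += 1  # send to the shorter queue

# Spread (max - min) is far smaller with power of two choices.
print(max(p2_q) - min(p2_q) < max(rand_q) - min(rand_q))  # True
```

The point of the talk: you get most of the benefit of true shortest-queue without global coordination, because sampling two servers is enough to break the herd behavior.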

Count Distinct

  • ex. count unique # of IPs across fleet
  • naive solution doesn't scale well
  • distributed is hard for data transmission
  • HyperLogLog generates estimate
    • Extension of LogLog
    • Hash input to a bit string. We expect the max run of leading zero bits to be approximately log2(unique items)
    • Improve accuracy by using more buckets and taking mean
    • Union of HyperLogLog is max of every bucket for distributed systems
    • Error rate is ~ 2%
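A bare-bones HyperLogLog-style estimator to make those bullets concrete: bucket by the low hash bits, keep the longest leading-zero run per bucket, and combine with a harmonic mean. The constants follow the published algorithm, but this is an illustration, not a tuned implementation:

```python
import hashlib

M = 256  # number of buckets (a power of two)

def rho(x, bits):
    # 1-based position of the first 1-bit in a bits-wide integer.
    return bits - x.bit_length() + 1

def hll_count(items):
    maxima = [0] * M
    for item in items:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        bucket = h & (M - 1)          # low 8 bits choose the bucket
        maxima[bucket] = max(maxima[bucket], rho(h >> 8, 56))
    alpha = 0.7213 / (1 + 1.079 / M)  # bias correction for large M
    return alpha * M * M / sum(2.0 ** -m for m in maxima)

est = hll_count(range(10_000))
print(round(est))  # close to the true count of 10,000
```

The distributed-union bullet corresponds to taking the element-wise max of two `maxima` arrays, which is why fleets can merge sketches cheaply.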

Reliable Broadcast

  • Every message gets to all clients
  • ex. CDN purge across network
  • Not the same as Atomic Broadcast
  • Gossip Protocols

Good Systems are Boring

###App Security and Microservices - Sam Newman (ThoughtWorks)
App security is like hand washing: it doesn't require a specialist for most parts.

Microservices increase the attack surface but reduce the scope of failure.

Prevention    ==>    Detection
  |                       /\
  \/                      |
Recovery      <==     Response

Try to examine realistic and holistic threat models. A good example is Attack Trees by Bruce Schneier.

  • Use HTTPS within the network
    • LetsEncrypt makes this easy with automation
    • Lemur might help with client certs for servers. From Netflix
  • Confused Deputy problem
    • Solution: Nested SAML assertions - painful
    • OpenID Connect. OpenAM supports it
  • Encrypt at Rest
  • Patch every week
    • Patch verification is hard
    • Same tools for docker
    • Single OS can help
    • Weekly probably good enough

Detection
  • Service like Qualys. CVEs have fingerprints to look for in logs
  • Aggregate logs and store
  • ModSecurity
  • Polyglot = more stuff to track. More things to break? Yes, but solvable with great automation

Response

  • How you speak to your users
  • Denials can make it worse
  • Good example is the Tylenol response in physical stores in the '80s. Clear, open, quick. See the Wikipedia article
  • Be empathic
  • Game day exercises should include comms

Recovery

  • Side bar - use time limited API Keys with AWS
  • Backups
    • Burn it all down and rebuild it
    • Harder with microservices

##Day 3 (April 13)
###Keynote #1 Evolving toward microservices: How HomeDepot.com made the transition - Christopher Grant (Home Depot)

  • Evolutionary Architecture
    • Have a vision
    • Understand your (business) objectives
    • Choose what matters
    • Implement in stages
  • Decompose and Defer Boundaries
    • Understand products and domains
    • Look for the hard things
    • Review data models
  • Architect for change
    • Prepare for change
    • Be proactive, not reactive
    • Delay until the last responsible moment
    • Expect but minimize future work
  • Implement Safety
    • Automate early and often
    • Consider in place and greenfield
    • Utilize feature switches and traffic throttles
    • Encourage separation and independence

###Keynote #2 Going cloud native: It takes a platform - Chip Childers (Cloud Foundry Foundation)
Why does cloud native matter?

  • Disruption, platform economics

Simple patterns. Highly automated. Scaled with ease.

Industrializing the craft. Doing artisanal at scale.

Focus on Takt Time (yes, Takt)

  • Desired time between units of production output
  • Time between two features reaching production

Emergent engineering principles

  • Learned the hard way
  • Starting to understand
  • 12 Factors Website from Heroku provides good information
    • Declarative formats for automation
    • Clean contract with OS
    • Suitable for deployments on modern cloud platforms
    • Minimize divergence dev -> test -> prod
  • You're going to need a platform
    • Simple platform is a ticketing system (UI, more sophisticated with APIs)
    • Platforms make promises (software optional)
    • Constraints are the contracts that allow platforms to keep promises
    • The right constraints free us to be creative where it matters

Here is my source code
Run it on the cloud for me
I do not care how

(Cloud Foundry haiku)

###Keynote #3 From static to future-proof: Enterprise architectures in the age of the customer - Thomas Cozzolino (Salesforce)
To future proof, 3 demands:

  • Metadata (really data driven)
  • Composability (UI components, integration/workflow)
  • A bigger universe (multi-disciplinary/a lens for diversity)

###Keynote #4 Let's make the pain visible - Janelle Klein (New Iron)

  • Stages of escalating project risk
    • Deferring problems
    • Painful releases
    • Thrashing
    • Project meltdown
  • What's wrong with the current strategy?
    • Book: Good Strategy Bad Strategy
    • We don't have a strategy
  • Obstacle #1: Managers and developers are speaking different languages
  • Obstacle #2: We spend tons of time working on improvements that don't actually make improvements
  • Root: Lack of Visibility
  • Need to optimize for Idea Flow, not Code
  • The hard part is identifying the problem
  • Idea Flow is a universal definition for effective practice
  • OpenMastery
    • Take Responsibility
    • Learn how to get there
    • Then teach the industry to succeed

Idea Flow is a book

###Session #1: The architect's clue bucket - Ruth Malan (Bredemeyer Consulting)
Note: This was a really bad session. The speaker threw up bunches of slides with quotes that were read to the attendees and didn't provide much of a narrative to connect the quotes together.

  • Architects have structural oversight
  • Makes tradeoffs
  • Thinks about why
  • Technical Debt ties systems to their past

Architect SCARS - Grady Booch

  • Separation of Concerns
  • crisp, resilient Abstractions
  • balanced distribution of Responsibilities
  • strive to Simplify

How to kill a system -- Strangler Application by Martin Fowler

Twitter account @papers_we_love

Rule of Three - Gerald Weinberg

Book Empathic Technical Leadership by Alex Harms

###Session #2 Apache Kafka and the stream data platform - Jay Kreps (Confluent)
Went through the history while at LinkedIn.

Philosophy - everything is an event. Tried ActiveMQ and RabbitMQ but had problems with throughput, persistence, partitioning, and ordering. Kafka Connectors do the hard work of clients (config management, distributed connections, ...)

###Session #3 Microservices: What's missing... - Adrian Cockcroft (Battery Ventures)
Java world - Spring Cloud. Netflix and Pivotal are now part of that ecosystem. Hailo and SoundCloud are working on tools in the Go world.

Talk covers How to avoid fragile microservices (bold is each section)

Failure Injection Testing

  • Simian Army. Trust with Verification
    • Chaos Monkey verifies stateless servers
    • Chaos Gorilla verifies server failover properly (run once a quarter as a Game Day exercise)
    • Chaos Kong verifies routing around region failures (run once a month)
    • Evident.io has a stronger security monkey if you want to pay for it

Version Aware Routing

  • Immediately and safely introduce a new version
  • Clients routed to right server
  • Change 1 thing at a time, client OR server
  • Eventually remove old versions (from code base)
  • Version # ==> Interface.Feature.BugFix
  • Deployment for types of changes:
    • Bug fix. Canary test and remove old version
    • Feature. Canary.
      • If backwards compatible, remove old versions
      • If not, run side-by-side then upgrade clients
    • Interface. Run side-by-side then upgrade clients.

Protocols

  • Measure serialization, transmission, and deserialization costs
  • Public APIs use JSON
  • Private (internal) APIs: consider Thrift, ProtoBuf, gRPC, Avro, SBE (SBE is the fastest by far but has limitations)

Interfaces

  • Build reference implementation for client
  • Use client in stress test harnesses
  • Each service should have distinct object model to reduce coupling
  • Version dependency interfaces with strong version pinning

Timeouts and Retries

  • Connection versus Request timeouts
    • Question: How to handle ultimate failure?
    • Use persistent connections
    • Connection timeout should be slightly larger than network latency
    • Request timeout should be based on logic of request
    • Use cascading time budget
      • Edge has large timeout
      • Each service deeper is smaller
      • With small systems, set statically
      • With large/dynamic systems, set dynamically and pass in headers. Reduce at each step. Fail when can't fulfill within time
    • Instrument retries
    • Never retry on same connection
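The cascading budget can be sketched as follows: the edge sets a deadline, each hop checks the remaining budget, shrinks it, and passes it downstream in a header. The header name and margin below are hypothetical:

```python
import time

HOP_MARGIN = 0.050  # seconds reserved per hop for its own work and the network

class BudgetExceeded(Exception):
    pass

def call_downstream(headers, work):
    deadline = float(headers["x-deadline"])
    remaining = deadline - time.monotonic()
    if remaining <= HOP_MARGIN:
        # Fail fast instead of starting work that cannot finish in time.
        raise BudgetExceeded("not enough budget left to attempt the call")
    # Shrink the budget before handing it to the next hop.
    child_headers = dict(headers, **{"x-deadline": str(deadline - HOP_MARGIN)})
    return work(child_headers, timeout=remaining - HOP_MARGIN)

def leaf(headers, timeout):
    # Stand-in for a real downstream request made with the given timeout.
    return "ok"

edge_headers = {"x-deadline": str(time.monotonic() + 1.0)}  # 1s total budget
print(call_downstream(edge_headers, leaf))  # ok
```

This matches the bullets above: static budgets for small systems, and for large ones the deadline rides in a header and shrinks at each step until a hop fails fast.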

Manage Inconsistency

  • Manage in the app because it's the only place it can be completely handled

Denormalized Data Models

  • Many databases => Inconsistencies exist
  • Antipattern: Shared schema between services
  • Build custom cross-data source check/repair process

Cloud Native Monitoring

  • High rate of change, per minute doesn't cut it
  • Ephemeral Configurations
  • Want data per second
  • SaaS solutions do well (DataDog on his list)

Managing Scale

  • Flow - AppDynamics, New Relic
  • Doesn't scale to 100s of services
  • GetGuestimate.com
    • Monte Carlo simulator/modeler for response time
  • Latency doesn't follow a normal distribution

Look at go-kit/kit/metrics for metric tracking in Go

###Session 4: Designing a reactive data platform: Challenges, patterns, and antipatterns - Alex Silva (Pluralsight)

Responsive   ==>    Elastic
  /\                  ||
  ||                  \/
Message Driven =>  Resilient
  • Responsiveness
    • Goal is constant time response under varying load
  • Elasticity
    • Scalability on demand
    • Required for reactive
    • Needs:
      • Asynchronous
      • Share nothing (single responsibility pattern)
      • Divide and Conquer
      • Location Transparency
  • Resiliency
    • The ability to return to its original shape
    • In the face of problems, still works @ normal performance
  • Message Driven
    • Not events
    • Messages go to someone/something
    • Events go to a bucket/topic. Facts/past.
  • Their platform
    • Akka, Kafka, Spark
  • Pattern: Decentralize processing of key messages
  • Pattern: Design incremental communication protocol
  • Pattern: Hide an elastic pool of resources behind its owner (see Akka router)