@lost-theory
Last active January 16, 2021 17:23
FutureStack 14 notes

Best talks day 1

  • Taming the Modern Datacenter, Mitchell Hashimoto, Hashicorp
    • look at the future of CM systems using Terraform, moving up a level from managing per-machine resources, enabling Infrastructure as Code for the modern distributed data center (mix of IaaS like AWS/DO, PaaS like Heroku, and SaaS for things like DNS and email); I'm a bit skeptical all the complexity of Terraform is significantly better than a minimal layer of glue code to wire up different APIs together.. still interesting though
  • Towards a Data Driven Product, Techniques and Tools at GitHub, JD Maturen, GitHub
    • covered the basics of data science & analytics, gave some insights into how GitHub does it
  • Understanding Developers in a Post Agile Environment, Ward Cunningham (inventor of wiki), New Relic
    • some interesting ideas on how to become a master engineer, "leveraged activities" vs. normal activities, why writing documentation & pairing are leveraged activities, also showed off xpdx.org, where he's documenting this stuff

Best talks day 2

  • Computer Science: America's Untapped Opportunity, Hadi Partovi, Code.org
    • talk on getting kids interested in programming and why this is important, they want to reach 100 million students with this year's Hour of Code (Dec 8-14)
  • Performance Hackathons: Trulia's Obsession With Speed and Scale, Chris Sessions & Louis Bennett, Trulia
    • interesting idea for getting Dev & Ops to work together on improving performance, not only good for improving your service, but also good for your culture; includes good practical steps on how they run their hackathons, good Q&A
  • Data Driven Monitoring, Daniel Schauenberg, Etsy
    • very similar to the Monitorama 2014 talk, solid technical info on how Etsy runs their monitoring (technologies & techniques they use), good Q&A
  • The Next Thing You Will Be Doing with Cloud, panel discussion between John Engates (CTO Rackspace) and Daniel Sturman (Google Cloud)
    • most interesting part was how the companies differentiated themselves from each other, Rackspace emphasized how much work they do with clients (working with them to scale & helping with architecture), while Google emphasized how they've built the most advanced cloud technology in-house and are now opening it up to the world
  • New Kids on The Block: Three Startups at the Forefront of Disruption, panel between BloomThat/Fastly/Preact
    • I liked this one because the three companies were all at different stages, different scales, different arch. (AWS vs. bare metal vs. hybrid), and gave a lot of good info on how & why they picked their technology stack and how it's working for them
  • Dataclysm: Who We Are (When We Think No One's Looking), Christian Rudder, OkCupid
    • no useful technical info, but very entertaining talk on what happens behind the scenes when you run a dating site

Recurring themes / big takeaways

  • New Relic - improvements: redesign launched this week, Browser got some upgrades (waterfall view like Chrome/Firebug), Insights looks pretty impressive for doing analytics, Mobile now includes crash reporting (could replace Crashlytics)
  • New Relic - new product: Synthetics, might be useful for automated end-to-end testing of critical functionality on prod
  • the words "data" and "data nerd" have lost all meaning
  • find & practice "leveraged activities" to become a better engineer, e.g. write documentation and pair with others, find what other successful engineers are doing differently vs. their peers
  • running weekly focused "hackathons" (~2 hours) on a specific area (e.g. performance) can be a good strategy for getting work done on neglected areas and increasing collaboration between teams
  • there's a lot of talk of moving to SaaS and cloud (New Relic, AWS, etc.), but there were a few people who were doing fine running things in-house (e.g. Etsy's in-house monitoring stack, Fastly running on bare metal, etc.)
  • multiple talks emphasized understanding your data and using sound math / statistics / probability / data science principles, no matter what tool or stack you use (Excel vs. SQL vs. Hive vs. Hadoop vs. Insights vs. other SaaS stuff)
  • check out / volunteer for this year's Hour of Code
  • many mentions of SoA, and two different speakers (Brockman @ Stripe, Miller @ New Relic) mentioned that SoA is currently the best tool we have for scaling (not only on the technology side, but on the people side too)

Day 1

Intro

  • Cheezy Music Videotainment Commercial for New Relic
  • Inspirational video about data nerds and how awesome they are
  • the beginning of many references to "data" & "data nerds" which will continue for the next 2 days straight

Keynote - The Impact of Software Analytics

  • Lew Cirne, Chief "Data Nerd" (CEO & Founder) @ New Relic

  • 300k people have tried New Relic

  • one year ago, at the last FutureStack, someone in the company emailed me a brief message: "WH needs help"

  • this was the biggest APM problem in the history of APM

  • healthcare.gov

  • we were strapped due to the conference, but thought it was our duty to chip in

  • worked with the "war room dream team" to fix healthcare.gov, leaders of industry, many from companies in this room

  • Mikey Dickerson, lead SRE, formerly of Google

  • new relic shined

  • in 2 days we were deployed across the environment and started providing value

  • providing hard data on what was needed to fix the site

  • for people like us, NR is a no-brainer to run everywhere

  • but in DC, they don't think like we do

  • "if i hear 1 more person tell me we can't use NR, i'll punch him in the face" - mikey dickerson (wired magazine article last month)

  • dec 1 of last year the WH asked for a performance report of healthcare.gov

  • "technical monitoring instruments for real-time analysis" is govt-speak for new relic

  • what did the report show? 8 sec rsp time -> 1 sec, 6% error rate -> less than 1%

  • so our software helped them make the healthcare.gov software better

  • so that's a case where NR helped deliver data fast

  • but what about a broader sense?

  • why does what we do matter?

  • the importance of time, time is the most precious asset

  • people say "time is money", but you can't create time like you can create money

  • on average people spend 6 hours per day in front of software, in front of a screen

  • half of our waking day

  • how many of those hours are mobile vs. web?

  • how many of those hours are you happy vs. meh vs. frustrated?

  • it's raining, I need to get somewhere, so I bring up uber and get a ride in a few minutes

  • it made my life better, it made me happy, it saved me time, it saved me from getting wet

  • meh = replying to emails, etc.

  • frustrated = slowness, crashes, 6 second load times

  • why are these so frustrating?

  • is this just a first world problem?

  • well.. every time this happens, you are being robbed of your time

  • life is too short for bad software

  • so our mission is to make your time in front of software as enjoyable as possible, less frustrating

  • moments matter

  • that's why we're here today, to make people happy, to build better products

  • and it's better for business too

  • I want to highlight some companies that understand this

  • "game changers"

  • first, http://www.venuenext.com/

  • I went to the 49ers game last week, new stadium

  • and it was awesome to know that if I forgot my wallet, forgot my ticket, forgot my parking pass, but I still had my phone, I would have been fine

  • there was no standing in line

  • no fumbling for tickets or money or parking passes

  • the 49ers are now in the business of data & software

  • VenueNext is an offshoot of the 49ers organization, developing this software & providing this service to other live venues

Game Changers Interview - VenueNext

  • John Paul, Founder and COO, VenueNext

How did you start?

  • when the 49ers wanted to build a new stadium, they wanted it to represent the area
  • we're in the heart of silicon valley
  • first, does this software exist? no
  • ok, let's build a team and make this
  • we launched 3 weeks ago

What's the feedback?

  • people love it

How are you measuring that?

  • oh with New Relic of course
  • it provides the base monitoring for our service
  • on our 3rd game in the new stadium, 1 in 3 people are using the app, pretty amazing
  • 61% of season ticket holders
  • in-seat delivery is probably the biggest feature, and some people said it wasn't possible
  • we measure & improve every game, iron out bugs
  • for example, 1st game: 20 minutes delivery avg, 2nd game: 10 minutes, 3rd game: 6 minutes 22 seconds

What's next?

  • expanding to other venues, concerts, etc.

Game Changers Interview - LendingClub

  • John MacIlwaine, CTO, LendingClub

  • no physical footprint like other banks

  • technology powered

  • affordable credit directly from investors

  • $5 billion in loans since 2007

  • disrupting the banking industry

You've been in the financial industry for a while, what did you see in LendingClub?

  • now is the time to disrupt the banking industry
  • who here enjoys going into the bank and talking to bankers?
  • we provide more affordable credit and better experience for everything through technology

How does it work? my savings account earns 0.25% in interest, credit cards charge 16% interest, how do you work?

  • less people, more efficient, more savings = better rates for borrowers
  • credit product vs. investor product
  • we pass the savings to both sides

Banking has 100s of years of history, how do you compete?

  • in the end it's just data
  • we look at thousands of pieces of data for each deal
  • fraud is also a big deal, and also ties into data
  • e.g. IP address & geolocation vs. information listed on loan

How are you using New Relic?

  • first started using NR 1.5 years ago, eye opener to see entire stack, all the bottlenecks
  • scalability, performance, customer experience
  • it's involved in every step of our development

What's Next for New Relic?

  • so, we have multiple products for software analytics

  • APM, Mobile, Servers, Browser, Plugins, Insights

  • i'm so excited about this

  • we are full stack software analytics

  • similar to how apple controls the hardware & software, we control & manage the full stack for you

  • we collect the data

  • securely store it in our cloud

  • 200 billion metrics per day

  • 2.5 trillion events every month

  • but this is only useful to you if it's in a thoughtful, easy-to-use, beautiful interface

  • so what are we announcing today?

Mobile:

  • mobile is the new storefront, I don't walk into stores as often anymore, I use mobile apps
  • so to be successful, you need to be mobile and monitoring what your users are doing there, what their experience is
  • mobile is the face of your business
  • for example: The Yellow Pages, transitioned from physical books to an app
  • we're monitoring 3500 apps, collectively installed a billion times
  • one complaint we heard was that users were using two tools, new relic and a crash reporting service
  • crash reporting is now part of new relic mobile as of today, with just one SDK
  • we wanted to do it right, it's completely drop-in, there's a workflow for reporting, reproduction, & resolution
  • slice and dice crashes by device, etc.
  • see impact
  • super critical stuff, if you have a mobile app, just use it, it's the only SDK you need

Browser:

  • we've had a lot of success with RUM (page load time) and AJAX (beta earlier this year)
  • 1.4 million domains are monitored by NR Browser
  • we're 6x bigger than the #2 player in this field
  • 90% of end-to-end load time is in the browser
  • so much important stuff happens after page load
  • monitoring "customer experience" via page load time doesn't tell the whole story
  • today we're announcing: NewRelic BrowserPro
  • BTW, we redesigned the app on Monday
  • we still have the same metrics you had before
  • but now we show AJAX calls, javascript errors
  • and you don't need APM to use this, can work on 100% static sites
  • we also now capture session traces
  • taking the features of e.g. chrome inspector, firebug, etc. and making them available for any user's session, remotely
  • this is an industry first
  • once you try this out, you can't imagine life without it, this is one of those products
  • super easy to get started

Next, who else besides us is this committed to performance?

  • Cloudflare is!
  • I'd like to welcome Matthew Prince, CEO & Founder of Cloudflare
  • 5% of internet traffic passes through Cloudflare
  • today we're offering one click install of New Relic Browser for Cloudflare users, all 2 million of our customers
  • this will be so useful for our customers to help save their users even more time, on top of what cloudflare provides

Now.. a brand new product:

  • we've seen there's a need for automated in-browser testing of sites
  • e.g. register a new user, login, add an item to the cart
  • test this flow every 10 minutes
  • from a bunch of different geo locations
  • so, we're launching New Relic Synthetics
  • any selenium users in the room? this is an industry that's a little long in the tooth and due for an update
  • (live demo)
  • get alerted of regressions
  • test your site overnight when you're asleep
  • captures screenshots
  • fits hand in glove with BrowserPro and APM, every test links to session traces (BrowserPro) and request traces (APM)
  • private beta this month, generally available this quarter

Industry Disruptors Panel

  • Moderator: Don Clark of WSJ

  • Colleen Berube, VP of Business Services, Adobe

  • Sam Parnell, CTO, Bleacher Report

  • Greg Brockman, CTO, Stripe

  • Colleen intro: joined in 2007, boxed software company, 18-24 month lifecycle, very traditional, over time we expanded to web applications, 2011 went all-in with SaaS, Creative Suite -> Creative Cloud, Marketing Cloud (Omniture), this totally changed the way we did business, big shift away from our 30 years of history as a boxed software company, lots of changes to architecture & mindset to provide services

  • Greg intro: joined when there were 4 people, was looking for the right people & the right problem, people are the most important thing, the constant, hire great people, ensure we work together really well, build out our tech in the right way, as CTO I've done both back end, server stuff, and engineering management, CTO is such a slippery role, not much is written about it compared to CEO; the "T" in CTO is a lie, it's mainly a people role, empowering others to get things done

  • Sam intro: Bleacher Report is a sports news site, disrupting traditional sports media, currently the 2nd largest sports site in the US, #1 sports app in the US, mobile is a huge part of what we do, we launched mobile 3 years ago and it transformed us

Colleen, the big story at Adobe was transforming to SaaS, what were the specifics of that?

  • first, moving 18-24 month lifecycles to 60-day lifecycles to quarterly releases to 220 releases per year
  • shifted from waterfall to agile/scrum
  • implemented CD in key areas
  • automation & monitoring
  • lower cycle time = increase in quality, 50% reduction in bugs

Greg, dealing with payments, very thorny business, how are you able to iterate so quickly in that environment?

  • constraints can be a good thing
  • main thing to focus on is continuous improvement
  • end-to-end ownership by engineers
  • not only building and launching code, but testing
  • as you grow, don't slow people down
  • don't be afraid to change your patterns / previous way of doing things
  • try to predict development scaling problems before they start
  • e.g. we moved to SoA from monolithic before running into problems
  • SoA is currently the best way we have for scaling systems
  • predict where things are going to break down and fix them ahead of time

Sam, how does data affect you as primarily a content site?

  • well, on mobile, load time is critical
  • we focus on delaying & preloading to load the important stuff quicker
  • advertising, how we make our living, is a tricky issue
  • we built our own ad layer to give the best experience instead of just plopping ads into content (slow, page jumps around, etc.)

Colleen, how did you get people to move at a higher speed?

  • the business side had to move more quickly too
  • we switched to SoA
  • took existing commerce engine and SoA-enabled it, exposed a common set of APIs to all teams
  • componentized things like cart, checkout, payments, etc.
  • that's just one example

Greg, in the payment space, how do you interface with other agents? (legal, banks, govt, etc.)

  • that's one of the things I like about stripe
  • a lot of people focus on what can go wrong
  • but if what we do works, it's going to be really awesome
  • e.g. instant credit card transactions
  • think big, then do the problem solving & pathfinding to get there
  • we built a credit card vault to handle storing & tokenization of card data, so the majority of our system never sees credit cards
  • we try to do this kind of thing for other areas as well
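The vault idea above (card data lives in one small component, the rest of the system only sees tokens) can be sketched roughly as below. This is a toy illustration, not Stripe's design: `CardVault`, `record_order`, and the in-memory dict are all hypothetical, and a real vault would add encryption, access control, and persistent storage.

```python
import secrets


class CardVault:
    """Toy in-memory vault: the only component that ever holds card numbers."""

    def __init__(self):
        self._cards_by_token = {}

    def tokenize(self, card_number):
        """Store the card number and return an opaque random token for it."""
        token = secrets.token_urlsafe(16)
        self._cards_by_token[token] = card_number
        return token

    def detokenize(self, token):
        """Return the card number; only the payment path should call this."""
        return self._cards_by_token[token]


def record_order(orders, customer_id, card_token, amount_cents):
    """Everything outside the vault stores and passes around tokens only."""
    orders.append(
        {"customer": customer_id, "card": card_token, "amount": amount_cents}
    )
```

Because the token is random rather than derived from the card number, a leak of the orders data alone reveals nothing about the cards.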

Sam, how do you use data?

  • drive editorial decisions
  • help humans make decisions
  • but don't let data trump our editorial voice, e.g. going to 100% Justin Bieber content if that's what the data shows is popular
  • one example of using data, if your team is out of the playoffs, we found that a lot of people don't care anymore, so we start running content on next year's draft

Greg, how about you?

  • we get to see the shape of commerce
  • e.g. nobody buys stuff on the weekend

Colleen, how about you?

  • SaaS = better customer experience, more feedback sooner
  • other stuff

How have you been affected by breaches? How do you handle security?

  • Colleen: yes...... we're always learning and getting better
  • Greg: one key we've found is segmentation, we segment our data based on sensitivity (credit card vault vs. user data), then make sure the sensitive data stores are secure
  • Sam: only keep the info you absolutely need from users, and lock down / segment the sensitive data

The Next Step for Software Analytics (Demos!)

  • back to Lew
  • we're focused on analytics & big data, increasing the number of "green" (satisfied) users
  • we launched Insights last year, we launched a brand new query language (NRQL), thousands of people started using it right away, we received very few support requests, so it's working for people and has incredible momentum
  • it's mindblowing what this product can do
  • 618 billion events last month, 9 ms average query response time
  • since this is in the cloud you get a lot of power, more than you would have if you built it yourself
  • here's an example query: number of unique users who have used feature X, segmented by country
  • today we're announcing some new features:
    • funnels
    • cohort analysis
    • joins against relational data
  • with these you can start solving business & marketing problems
  • (live demo)
  • funnels: specified via the query language (no clunky GUI tools), very easy to add WHERE clauses to funnels to drill down more
  • cohorts: segment users by e.g. quarter they signed up (showed new relic's user retention graph segmented by quarter)
  • "magic filters": dynamic search functionality for WHERE clauses (fuzzy search / autocomplete for columns & values)
  • super powerful for business (e.g. for VenueNext: which concession items are the most popular?)
  • demo #2: we loaded public github data into insights, and can build dashboards in seconds
  • click around, everything is interlinked and easy to query (users, repos, languages, commits, etc.)
  • ok, what's next? we have built a lot of "data apps" on top of our platform, e.g. Insights
  • but we can't build every app, everyone has different business needs, there isn't one size fits all
  • so today we're announcing the New Relic Data App Platform
  • you can build apps for anything: marketing, operations, customer support / help desk, security, finance, sales
  • we're seeing a 90% cut in development time for these apps, all hosted by us, we handled the scale, we provide a great UI, etc., and it's mobile friendly too
  • demo: who are my top customers? filter by region, drill down, pull in APM/Browser info, see user details, see how these users are using your site
  • I can build this kind of app in 30 minutes
  • demo #2: same data, but in a marketing view, funnels, conversion rates, same easy to use filtering, etc., also built in 30 minutes
  • how do you use this? just send your data to Insights, and this is all available to you
  • what's the future of Insights? more business data, more data sources, build data apps on top of that
  • we are integrating with 65 of the top cloud data stores (Salesforce, Facebook, etc.)
  • and, I'd like to announce our first acquisition as of a few days ago: Ducksboard, who specializes in this integration
  • please give a hand to the Ducksboard founders, all the way from Barcelona, WOWW
  • another demo: let's throw tweets into Insights
  • we asked twitter for access to their feed, first they offered us 10%, then they offered us just New Relic related tweets, but no, we want all the tweets
  • we only have access for a limited time
  • turns out twitter's fire hose of all tweets is only 1% of our total data volume
  • let's do some searches.. Justin Bieber gets 2x tweets as President Obama, Bieber is most popular in Brazil
  • we can segment by language, platform
  • let's try to get #FS14 up to 500 tweets/second, free GoPro to the first twitter handle I see after we reach the 500/s mark
  • you can do this too, this is going to transform your business
  • 300k people have used New Relic APM/Browser/etc.; our vision is to get millions of people using Insights for business
  • now I'd like to introduce Chris Cook, COO & President, (recapping marketing spiel and introducing breakout sessions)
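To make the Insights demo more concrete: the example query (unique users of feature X, segmented by country) and the funnel feature boil down to fairly simple aggregations over event data. The sketch below shows the equivalent logic in plain Python. It is not NRQL and says nothing about how Insights works internally; the event fields (`user_id`, `country`, `feature`) are made up for illustration, and the funnel ignores event ordering for simplicity.

```python
from collections import defaultdict


def unique_users_by_country(events, feature):
    """Count distinct users who triggered `feature`, grouped by country."""
    users = defaultdict(set)
    for event in events:
        if event["feature"] == feature:
            users[event["country"]].add(event["user_id"])
    return {country: len(ids) for country, ids in users.items()}


def funnel(events, steps):
    """Count users reaching each step, requiring all earlier steps too."""
    remaining = None
    counts = []
    for step in steps:
        reached = {e["user_id"] for e in events if e["feature"] == step}
        remaining = reached if remaining is None else remaining & reached
        counts.append(len(remaining))
    return counts
```

A hosted tool earns its keep by doing this over billions of events with low latency; the aggregation itself is the easy part.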

REDACTED (New Relic Synthetics)

  • Patrick Lightbody, VP Product @ New Relic

  • today we announced Synthetics

  • "Software is eating the world" - Marc Andreessen

  • my version: "Software is supporting the world"

  • we're still doing the same activities, but it's all supported and improved by software

  • and you guys are all supporting that software

  • strong ops teams monitor three things: 1. performance, 2. availability, 3. functionality

  • we were helping you with these, but we can do more

  • we started off on the backend (APM), then moved to the browser, then mobile

  • what else is there? your app depends on CDNs, 3rd party JS, the cloud, SoA, social media, payment processors, etc.

  • we can cover some of these, but the data is very noisy

  • how about clean room instrumentation of the 5 most important actions in your web app? would that cut through the noise?

  • Availability: we do ping checks, for free, it's a very popular service, but we know you want more: multiple geo-locations, pinging multiple IPs for your domain (round robin / multicast DNS)

  • there's a difference between Errors and Bugs, bugs = real issues, errors = something might be wrong

  • you need these critical functionality tests to confirm that there aren't any show-stopping bugs on your production site right now

  • our system didn't provide for that, error tracking shows you after-the-fact when something might be wrong

  • Synthetics = that safety net, to verify everything end-to-end is working all the time

Details:

  • Selenium, Node.js, in-browser code editor with good autocomplete / call hints, automated screenshots of failures, real browser engine, helpers for stuff like generating user data
  • beautiful performance results courtesy of New Relic Browser, waterfall charts
  • built on top of Insights, so you can build custom reports
  • plugs in to APM & Browser for full traces on the FE and BE

Demo:

  • 20 geo-locations currently
  • editor does look pretty good
  • waterfall view also looks good, has a scrubber & heatmap / minimap, you can pan and zoom to quickly find errors / slow spots
  • errors are retried 3 times before alerting
  • pricing: basic Synthetics is free
  • advanced monitoring: $59/month, increases based on # of checks, has more advanced details, like headers
  • private beta for FS attendees
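The "retried 3 times before alerting" behavior from the demo can be sketched as below. This is only a guess at the general shape, not Synthetics' actual runner: `run_check`, `check`, and `alert` are hypothetical names, and I'm assuming "3 times" means three attempts in total.

```python
def run_check(check, alert, attempts=3):
    """Run `check` up to `attempts` times; alert only if every attempt fails.

    `check` is any zero-argument callable that raises on failure.
    Returns True if some attempt passed, False if an alert was sent.
    """
    last_error = None
    for _ in range(attempts):
        try:
            check()
            return True
        except Exception as error:  # a real runner would catch narrower errors
            last_error = error
    alert(f"check failed {attempts} times: {last_error}")
    return False
```

Retrying before alerting trades a little detection latency for fewer false alarms from one-off network blips.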

Taming the Modern Datacenter

  • Mitchell Hashimoto, Founder, Hashicorp
  • at Hashicorp we build tools to help manage the modern distributed data center

The history of the data center:

  • single instance of hardware

  • multiple pieces of hardware

  • virtualization -> complication & complexity, more tooling needed

  • containerization -> even more complexity, tooling is being built right now

  • a modern data center typically has all of the above, a mix of dedicated / virtual / containers

  • we are also seeing a huge move to SaaS/IaaS/PaaS for previously in-house stuff like DNS, etc., you can even put your entire database in the cloud

  • what paradigm will you choose?

  • why move to virtualization and containers? to make delivery of "apps" more efficient

  • the data center is just a means to an end

App delivery pipeline:

  • development -> deployment -> maintenance
  • in terms of Hashicorp products:
    • development: vagrant
    • deployment: terraform, packer
    • maintenance: serf, consul
  • the deploy + maintenance lifecycle:
      1. acquisition (buy servers)
      2. provision (install OS, etc.)
      3. update (app code)
      4. destroy
  • traditionally each of these steps took weeks or days to complete
  • this traditional model is changing
  • due to EC2, 5 years ago, there was a big shift in time & money required to spin up "instances"
  • computing power is now a utility, on-demand & cheap
  • also around that time: the resurgence of CM tools (chef & puppet)
  • also around that time: SaaS proliferation, outsourcing things that used to be core parts of the data center (mail, DNS, etc.)
  • so we went from weeks/days to minutes

Managing modern DCs:

  • modern DCs are a mix of all these things
  • the goal is to "move fast and don't break things"
  • devs: understand the app, care about fast deploys, don't care about underlying details
  • ops: understand the infrastructure, understand the interdependencies of the infrastructure & the apps, also care about security, uptime, scaling, etc. on a higher level than the devs do
  • Terraform is the best way to manage the chaos of the modern DC, get everything under a unified system, including SaaS providers
  • composes all the tiers (I/P/SaaS)
  • safely modify your infrastructure
  • one workflow, technology agnostic
  • "no more dashboards", no more going into different web interfaces for different providers

Example:

  • here's an example of some Terraform code for defining a DigitalOcean droplet & associating a DNS record to it, in a human friendly, version controlled format, direct relationship between IaaS (DO) and SaaS (DNSimple)
  • Terraform can render dependency graphs, automatically visualize your entire infra and its dependencies at different levels (expanding / collapsing certain resources)
  • "Providers": these are the integration points that expose the underlying resources (servers, DNS records, etc.)
  • providers follow a simple CRUD API, easy to read & write
  • in one command: order servers, provision with OpenStack, deploy hadoop, deploy a job, schedule & run the job
  • "plan" shows you what will happen based on the current state, this is a safety mechanism, you can save & replay plans so you positively know what's going to run
  • these safety features give predictable & reliable results, no more having an expert "divine" what's going to happen
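The "plan" idea is easier to see in code: compare the current state against the desired state and produce a list of actions without executing anything. The sketch below is a toy model in Python, not Terraform's implementation or configuration language; `Action`, `plan`, and the resource names (`droplet.web`, `dns_record.www`) are all hypothetical, loosely echoing the droplet + DNS record example from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    op: str    # "create", "update", or "delete"
    name: str  # resource name, e.g. "droplet.web"


def plan(current, desired):
    """Diff two {name: attributes} mappings into an ordered list of actions.

    Nothing is changed here; the result is only a description of what an
    apply step would do, which is what makes it reviewable and replayable.
    """
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(Action("create", name))
        elif current[name] != desired[name]:
            actions.append(Action("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


if __name__ == "__main__":
    current = {"droplet.web": {"size": "512mb"}}
    desired = {
        "droplet.web": {"size": "1gb"},
        "dns_record.www": {"type": "A", "points_to": "droplet.web"},
    }
    for action in plan(current, desired):
        print(action.op, action.name)
```

Because the plan is plain data, it can be saved, reviewed, and applied later, which is the safety property the talk emphasized. A real tool would also order actions by the dependency graph, which this sketch skips.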

Conclusion:

  • ops cares about infrastructure
  • devs care about apps
  • how can you merge these 2 concerns with terraform?
  • ops writes the shared providers & modules & resources, while devs use them as self-service blackboxes to deploy their apps
  • ops writes & maintains the "substrate"
  • devs use that to plug-and-play, this is how you move fast and not break things
  • this is Infrastructure as Code, but at a level above what most people are doing today
  • unified workflow
  • dev & ops collaborating
  • less chaos for your modern data center

Towards a Data Driven Product, Techniques and Tools at GitHub

  • JD Maturen, Analytics Lead, GitHub

  • metapoints: act on one thing at a time, buy Wizard.app (wizardmac.com), read Kahneman's "Thinking, Fast and Slow" (theory of how brains operate and make decisions)

  • background in infrastructure, social networks, enterprise SaaS

  • OODA loop (remember that act is the last step!)

  • "every deploy changes a metric"

  • I'm not a stats expert

  • tools of yesteryear: univariate timeseries data

  • no causal analysis, all summary data

  • our other tool is our intuition, which is good, but we don't want to rely on it 100%

  • according to a MSFT study, you have a 1 in 10 to 1 in 3 chance at predicting the effects of product changes

  • we need raw data, and we need to do things probabilistically

Example #1:

  • if 3 out of 10 users sign up, what is your signup rate? 30% is one answer, but the more complete answer involves probability
  • see http://bit.ly/conj-prior for full details
  • we get a distribution centered at 30%, but it could be anywhere from 10% to 60%
  • 3/10 -> 13.5% to 56.4% (5th and 95th percentile)
  • 26/100 -> 19.6% to 33.9%
  • 316/1000 -> 29.2% to 34.1%
  • what questions can we answer with this technique?
  • AB tests, etc., before/after effect of any change, experiments
  • comparing samples is also hugely useful
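A rough sketch of how intervals like the ones in this example can be computed, using only the Python standard library. It assumes a uniform Beta(1, 1) prior, which is my assumption rather than something stated in the talk (the linked conj-prior post has the real details), and it estimates the percentiles by sampling rather than in closed form. With this prior the 3/10 case should land close to the 13.5% to 56.4% range quoted above.

```python
import random


def credible_interval(successes, trials, lo=0.05, hi=0.95,
                      draws=100_000, seed=0):
    """Approximate a credible interval for a conversion rate.

    Assumes a uniform Beta(1, 1) prior, so the posterior after observing
    `successes` out of `trials` is Beta(successes + 1, failures + 1).
    """
    rng = random.Random(seed)
    alpha = successes + 1
    beta = trials - successes + 1
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    return samples[int(lo * draws)], samples[int(hi * draws)]


if __name__ == "__main__":
    for successes, trials in [(3, 10), (26, 100)]:
        low, high = credible_interval(successes, trials)
        print(f"{successes}/{trials}: {low:.1%} to {high:.1%}")
```

The point of the example carries over directly: the interval narrows as the sample grows, even when the observed rate stays in the same neighborhood.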

Example #2:

  • Month 1: 250/1000 signup retention = 25%

  • Month 2: 365/1400 signup retention = 26.07%

  • did our retention rate increase significantly? by 1%?

  • no! it did not increase significantly, we can't say for sure that the rate increased by 1%

  • http://bit.ly/bayes-bandits

  • the above two examples are binary (yes or no), there are lots of techniques for other types of data

  • another example: by putting Bitcoin on our payment page, we will increase international conversions
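For the Month 1 vs. Month 2 retention comparison above, one way to ask "did the rate really go up?" is to estimate the probability that the second rate is higher than the first. The sketch below does that by sampling from two independent Beta posteriors. The uniform prior and the function name are my assumptions, not necessarily what GitHub does; see the linked bayes-bandits post for the approach the speaker referenced.

```python
import random


def prob_second_rate_higher(s1, n1, s2, n2, draws=100_000, seed=0):
    """Estimate P(rate2 > rate1) under independent uniform-prior posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate1 = rng.betavariate(s1 + 1, n1 - s1 + 1)
        rate2 = rng.betavariate(s2 + 1, n2 - s2 + 1)
        if rate2 > rate1:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    p = prob_second_rate_higher(250, 1000, 365, 1400)
    print(f"P(month 2 retention > month 1 retention) ~ {p:.2f}")
```

If the result is well short of something like 95%, the data is consistent with the "no significant increase" conclusion in the notes.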

What github uses

  • GitHub has two sources of data: the application DB & schemaless (but structured) event logs
  • you don't need to do anything novel for data collection
  • what makes a good event? one example is recording changes to mutable data (i.e. a change log or audit trail)
  • try to pull in as much useful metadata into events as possible (but no PII / privacy sensitive data)
  • things we collect: page views, timing, user actions, any concrete steps that happen within the app
  • processing is also not novel: S3, Hive, intermediate DBs, ideally we would have one universal store & interface for that data, but this works
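As an illustration of a "schemaless but structured" event log, here is a minimal sketch that records a change to mutable data as one JSON object per line. The field names and helper functions are hypothetical and not GitHub's actual format; the only ideas taken from the talk are the change-log style event, the extra metadata, and keeping PII out.

```python
import json
import time


def audit_event(actor_id, entity, field, old, new, now=None):
    """Build one change-log event describing a mutation to mutable data.

    Identifiers are opaque IDs rather than names or emails, in keeping with
    the no-PII note above.
    """
    return {
        "type": "change",
        "ts": time.time() if now is None else now,
        "actor_id": actor_id,
        "entity": entity,
        "field": field,
        "old": old,
        "new": new,
    }


def write_event(stream, event):
    """Append the event as one JSON object per line."""
    stream.write(json.dumps(event, sort_keys=True) + "\n")
```

One JSON object per line keeps the log trivially parseable by batch tools while letting each event carry whatever fields it needs.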

What do results look like?

  • we have nice designers, so they made us a nice UX for our admin interface used to visualize results
  • we show the probabilities, confidence, significance numbers, etc. for all the experiments we are running on our site

Organizational tips:

  • store & answer questions in one place

  • since we're GitHub, we use a repo for this

  • reward curiosity, incuriosity kills companies

  • have leadership set the example

  • resources: see links in this presentation, follow me on twitter @jdmaturen

Q&A:

  • Q: what about viral effects?

  • A: overlay networks

  • Q: how important is CS background to development?

  • A: I don't have a CS degree

  • Q: storing data in logs: how? why not use a document store?

  • A: it's not unstructured text like application logs, it's all in a standard format that we can pull data from, doc stores are not good at querying

  • Q: why choose Bayesian over frequentist? what evidence do you have?

  • A: it is subjective, look at decision theory, coin flipping is easy to explain in a presentation

  • Q: how do you reward curiosity?

  • A: the answers are typically reward enough, this is very interesting stuff

  • Q: have you tried New Relic Insights?

  • A: not yet, but I want to

  • Q: how do you handle multiple versions of the software running at once?

  • A: controlled experiments & careful experiment design

  • Q: what does your "repo" you described look like?

  • A: github repo, file issues when you have questions, answer with code & results, highly interactive

Understanding Developers in a Post Agile Environment

  • Ward Cunningham, Wiki Inventor & Engineer @ New Relic

  • agile programming = making decisions and living with the consequences

  • there are many ways to achieve excellence, this is my ongoing exploration and theory of learning

  • example: writing a handwritten letter: one way to achieve excellence is to start over on a blank page each time you make a mistake, after practicing this enough times, the way you think changes

  • another example: I use graphviz a lot, and I wanted to give a tutorial on graphviz at New Relic, and the simple act of writing the tutorial had a huge impact on how I use the tool!

  • tip #1: write markup (graphviz code) in a file, and write a little shell loop / script to watch the file and re-render the graph in realtime

  • tip #2: each time you make or see a graph in the real world, think about how you would make the equivalent graph in graphviz markup

  • tip #3: make & cache common commands

  • http://xpdx.org - this has been what I've been working on, federated wikis and helping programmers become better at what they do

  • "leveraged activities" vs. "normal activities"

  • pursue different directions and be surprised!

  • more surprises = more leverage!

  • for certain tasks, let's say sales for example, if your current activity is to sell $X worth of a product, there's not much you can do, you can either deliver $X or over-deliver more than $X

  • but for a lot of engineering & programming work, there's all sorts of paths you can take to reach an end-result, the more you explore on the way to the result, the better you become

  • searching logs or single-stepping through a debugger are examples of "normal activity" in programming

  • "leveraged activities": unexpected learning, writing documentation

  • StackOverflow: normal activity, you gain very little insight by reading StackOverflow answers

  • what do you do that other devs aren't doing? that's your leverage

  • I ask this question of devs

  • one developer I talked to mentioned that when he reads code, he tries to look for the "boundaries" of code & data, and this helps him understand new code, that's interesting, I think there's something there..

  • another developer told me all about dtrace

  • another developer told me about "effort under load", as load builds in a system, your system must become more efficient or else you will never catch up, when a system starts throwing errors at high load it is one way for the system to shed some of its current work, interesting idea, I hadn't thought of things this way

  • that last conversation led me to think about "Balanced File Queues", because failing is easier than processing, failures = shedding load

  • I want all developers to do this, I want all developers to achieve excellence

  • develop a chronology, tell a story

  • but it's hard to share this information in a top-down way, e.g. asking everyone around you "tell me something amazing", you have to surface it in a different way

  • you need to wrestle the deep insight

  • example: "expedient" vs. "foundational" code, the continuum between these two, there's a lot of stuff worth thinking about here

  • create lasting artifacts, find a way to document and share

  • I've found that "patterns" are a little too dry, so I'm trying something different with xpdx.org

  • insights -> driving behavior

  • "abilities": try this code, type this!

  • "motivation": what is the purpose of this?

  • "triggers": not only how to practice, but how to remember to practice

  • pairing is a great way to share knowledge, because of those 3 things

  • pairing is one of the most efficient ways to learn

  • I'm trying to do "pairing" over the internet through the tutorials on xpdx.org

  • leading others to mastery & excellence

  • this may be as big as pairing, maybe even bigger!
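
Going back to graphviz tip #1 above (watch the file, re-render on change): Ward described a shell loop, but the same idea can be sketched in Python. The function names and the polling approach here are assumptions; the only external piece is the real `dot -Tpng <file> -o <output>` command from graphviz:

```python
import os
import subprocess
import time


def render_with_dot(source, output):
    """Render a graphviz file to PNG with the `dot` command-line tool.

    check=False so a syntax error in the graph doesn't stop the watch loop.
    """
    subprocess.run(["dot", "-Tpng", source, "-o", output], check=False)


def watch(source, output, render=render_with_dot, poll_seconds=0.5, max_polls=None):
    """Re-render whenever the source file's modification time changes.

    max_polls=None loops forever; a number makes the loop finite (handy for tests).
    Returns how many times the graph was rendered.
    """
    last_mtime = None
    polls = 0
    renders = 0
    while max_polls is None or polls < max_polls:
        mtime = os.path.getmtime(source)
        if mtime != last_mtime:
            render(source, output)
            last_mtime = mtime
            renders += 1
        polls += 1
        time.sleep(poll_seconds)
    return renders


if __name__ == "__main__":
    # Hypothetical file names; leave an image viewer open on graph.png while editing.
    watch("graph.dot", "graph.png")
```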

The Interfaces of Our World: The Known, the Unexpected, and the Risks of Failure

  • Brent Miller, Lead Software Engineer, New Relic

  • background: ruby & frontend focus

  • this talk is about interfaces, contracts, designers, and your company

  • interface is what happens when 2 things interact

  • example: coffee cups (weird ones)

  • example: cats, @gorbypuff, the internet is made of cats, the cat "interface" is large

  • example: CA DMV office

  • example: iPad / tablet

  • an interface is designed, but is used & interpreted differently according to the user

  • shopping cart design: it was designed for holding items while you're shopping, but this person used it as a barbecue by putting a fire underneath it

  • paperclip: one designed use, but hundreds of unofficial uses if you're a divergent thinker

  • divergent thinking = creativity

  • 90% of kindergarteners are divergent thinkers, 2% of adults are

  • "totality": no filter, seeing things in their entirety

  • as we grow, we start to add filters, it's necessary to survive, it's a good thing

  • example: asking an expert an obscure problem and they can jump right to the answer, that's a good use of filters

  • bad use of filters: Ferguson shooting, quickly devolved into us vs. them, overzealous filtering, no critical thought

  • changing gears, SoA and APIs and contracts are foundations for scaling

  • shared experience & history affect the usage of interfaces

  • example: if an elevator is at floor 5 of 9, and I'm on floor 1, should I press Up or Down? we would press Up, of course, but someone who has never seen an elevator may push Down because they want to tell the elevator to come down to level 1

  • example: there are no beards in the recently released game "Destiny".. blabla.. (something something about cultural significance of beards.. not sure how this relates to interfaces or why it's worth mentioning)

  • what stack do you use? it really doesn't matter

  • it's not about the stack, it's about the people

  • if you put a Spanish chef in a Japanese sushi kitchen, he's going to figure out how to use the tools & ingredients around him to make his own style of food

  • code serves you, the machine, and whoever ends up reading it down the line (including your future self)

  • teams have interfaces too, and those interfaces can fail

  • managers are very important to a team's interface

  • leadership: McAfee, Ballmer, etc.; bosses vs. leaders, leaders are the face of your company, but they face inward too

  • leaders are the interface between the employees on the front line and the vision of the larger team & company

  • understand & honor all of the contracts you deal with

  • let others know when contracts change

  • always think of the users

Modern APM in Complex Environments

  • Greg Unrein, Principal Product Manager, New Relic

What is a complex environment?

  • deployment environments (IaaS, PaaS, bare metal)
  • types of systems (new, old, legacy, monolithic, SoA)
  • decoupled teams that need to work together
  • constant change (data, code, requirements, market, company, etc.)

What does monitoring enable?

  • work together, accountability, teamwork

  • diagnosing problems faster, faster time to resolve

  • making data understandable and actionable, presented in an effective way

  • End User Satisfaction is a key predictor of business success

  • "Apdex" is an industry standard metric for measuring satisfaction

Apdex:

  • Satisfied vs. Tolerating vs. Frustrated
  • example: Satisfied = 0-1.5 seconds, Tolerating = 1.5-6 seconds, Frustrated = 6+ seconds
  • maps the response time distribution into three buckets
  • different apps and even different transactions inside of an app have different thresholds for Satisfied / Tolerating / Frustrated
  • Apdex = (satisfied + tolerating/2) / total
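
The formula above is easy to sketch in Python. In the Apdex standard the Frustrated boundary is 4x the Satisfied threshold T, which lines up with the 1.5 second / 6 second example; the function name and the sample response times are made up for illustration:

```python
def apdex(response_times, threshold=1.5):
    """Apdex score for a list of response times in seconds.

    Satisfied:  t <= T
    Tolerating: T < t <= 4T
    Frustrated: t > 4T (counts toward the total only)
    """
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


if __name__ == "__main__":
    # Two satisfied, one tolerating, one frustrated: (2 + 1/2) / 4
    print(apdex([0.5, 1.0, 2.0, 7.0]))
```

Passing a different `threshold` per app or per transaction covers the point about thresholds varying.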

Storytime:

  • recent feature in NR: tagging NR apps by team / environment / product / etc.
  • this allows you to filter & roll up across multiple apps when you have a large number of them
  • another nice feature: how are your different apps connected? we have different views that show you the interrelations between your different services & components
  • example: one team's deploy broke a different team's downstream service, both teams were able to quickly figure out what went wrong, and the deploy was rolled back
  • the speed of detection & resolution was due to a bunch of NR features flowing together and working in unison

Behind the Lens: The GoPro Story

  • CJ Prober, SVP of Software & Services, GoPro
  • started off by showing a cool video
  • I help run our cloud, web, mobile, and desktop software teams, as well as CRM, data science, etc.
  • going to talk about three things:
      1. the GoPro model
      2. our User Generated Content (UGC) network
      3. the future
  • GoPro was founded in 2002 in San Mateo, we now have over 800 employees, and have the #1 selling camcorder in the U.S.
  • successful IPO last summer, but we were not an overnight success
  • the vision: enable the expression and celebration of human passion
  • passionate users + versatile camera = the best UGC network
  • people tag or title their videos on YouTube with "GoPro", we don't prompt them to do that
  • this enables a virtuous cycle & viral growth, users advertise for us
  • if you compare our # of youtube views vs. # of sales, you see them growing at the same rate

UGC:

  • but UGC isn't completely user driven, there are different levels
  • Raw UGC -> Curated UGC -> GoPro original productions
  • example of raw UGC: we recently released our "Fetch" mount for dogs, 5 days later this video was posted on YouTube and got 14 million views: https://www.youtube.com/watch?v=UowkIRSDHfs
  • when we notice this, we reach out to the user, we offer them equipment, we offer a stipend, we provide editing, and we promote the video to boost its viral effect
  • another example: surfing pig, https://www.youtube.com/watch?v=HgQPyU3J0P0
  • this one is curated content, it's more professional looking, it's using more equipment, it has better editing
  • original productions: we believe this is the most engaging video content online
  • example #3: The Lion Whisperer, https://www.youtube.com/watch?v=MNCzSfv4hX8
  • we get 565k views on average per video
  • another aspect of UGC is athlete-sponsored content, we sponsor certain athletes and people enjoy watching their content, whether it's sports-related or not (e.g. family videos, personal videos, etc.)
  • we don't promote all content though, as you can imagine people film all kinds of stuff with GoPros (vulgarity, injuries, hunting animals, etc.)

Looking ahead:

  • everyone wants to share their videos, but not everyone does it

  • editing is a problem, even getting data out of the camera can be a problem for people

  • 2 hours of video can take up 32GB

  • that's a lot of data, a lot of editing, a lot of time spent uploading

  • so we're focusing on eliminating painpoints in managing & editing content

  • software is the fastest growing part of GoPro

  • we're hiring!

  • one last video: the launch video for our latest product, Hero 4

  • https://www.youtube.com/watch?v=wTcNtgA6gHs

Day 2

Making a Difference with Software

  • Lew Cirne: welcome back everyone
  • first shout out to the NR employees who created our badges, and btw, what better way to see how the badges are working than using NR Insights? (demoed stats like # of stacks, # of activations, voltage, temperature, leaderboard of who had the most stacks, etc.)
  • how can software change the world?
  • we asked NR employees, here's a video
  • give back, contribute to open source, volunteer, using our skills as software engineers, give back to the community, go beyond corporate America, think really hard about your impact, how much power you have, build things, make the world a better place, software has the power to change the world for better
  • so, we played a part in fixing healthcare.gov, what about the rest of the world?
  • what about the developing world? technology can make a difference
  • first, I'd like to talk about Cure.org
  • they provide corrective surgeries for deformities, burns, neurological conditions, and cosmetic problems
  • welcome to the stage Joel Worrall, CTO of Cure.org

Cure.org

  • we heal kids in 30 countries around the world
  • we run clinics in these countries to help these kids
  • my job is to help build the technology to manage that
  • we can connect donors or just anyone interested to these kids, so they can donate or send messages of support, we provide updates so people can see the work we're doing, that we're actually helping people
  • story about travelling to Kenya, showing the Kenyan medical record system which is all paper based, lugging around storage containers of papers to hospitals and villages
  • we develop systems that can work offline, because you don't always have a reliable internet connection
  • we also launched an open source project called hospitalrun.io that we built, but hope anyone can use to run a hospital
  • it's cloud based, built on modern technologies: node, couch, ember, offline/online
  • also focused on usability, easy to use, intuitive
  • better software = more time & resources spent helping kids
  • Lew: how do you make sure the software is working?
  • we only have 4 developers, so it would be impossible to manage all this without new relic
  • Insights has been especially useful, we use it for things like tracking engagement
  • we not only push data into Insights and view it in NR, we pull it back out of Insights and feed it back into our site (for trending items, etc.)

Watsi

  • next, I'd like to talk about Watsi
  • so while Cure.org operates its own clinics and hospitals, Watsi approached the problem from a different angle, they promote and channel donations to different organizations which then provide the care
  • let's welcome Chase Adam, founder of Watsi
  • Chase: we got started 3 years ago, I was in the Peace Corps, and I saw a woman on a bus asking everyone for donations for a medical procedure for her son, and I noticed everyone was chipping in, so I wondered why they trusted her? it's because she had her son's medical record in a red folder and passed it around
  • I thought, why doesn't a service like this exist?
  • would you donate $1000 right now to healthcare, in general? probably no
  • would you donate $1000 right now to the person next to you if he was going to die without it? probably yes
  • we let you directly fund people who need help, tell their story through photos & updates
  • all of our data & records are publicly available
  • we use new relic to understand how our software is working
  • is our site up? is it fast? is all of our functionality working? can volunteers & doctors upload data & photos & updates into our system?
  • we collect $X in donations, and our goal is to just scale that out to 10x and 100x, and it is possible now using technology

New Relic for Non-Profits

  • we love working with organizations like Cure.org and Watsi, but we need to do more
  • it's time for us to get serious about non-profits
  • (aside: Lew just mentioned New Relic has 600 people)
  • please welcome Yvonne Wassenaar, SVP Operations @ NR, to tell us more
  • Yvonne: I was at Accenture & VMware, and was brought on to help New Relic scale
  • but also keep a focus on what's important
  • we already provide a lot of value to non-profits
  • we help them deliver great software, and we give them insights
  • when NR was small, this happened naturally, but as a company grows sometimes you lose sight of this
  • so starting in January of 2015, we are launching New Relic for Non-profits
  • we will provide 5 APM hosts for free for non-profits (is that good? I dunno, don't they have a free tier that already compares to that?)

Women & technology

  • another thing I want to change, can all the women in the room please stand up?

  • there's not enough of you

  • I have a daughter, so I'm starting to understand a bit about getting girls & women interested in technology

  • about a year ago, there was a lot of emotion about someone popular in the programming community saying something like "it's impossible to get women interested in coding", and that's probably not what he meant, but how should we respond to this?

  • I decided to teach my daughter and her three friends how to code, and they were super excited

  • I'd like to welcome Alannah Forster to the stage to tell us her story

    • I wrote a christmas card with code
    • and I was modifying that
    • my sister was interested too
    • and everyone was fighting over the code
    • Lew: you were using javascript, how did you learn?
    • I did Hour of Code as part of my English class in school
    • I learned on Khan Academy
    • (showed a live demo of the christmas card)
    • Lew: wow you have comments in here, you're better than me at this!
    • Lew: so then we started to modify this, we learned functions, loops
    • I learned about functions, so that was good
    • (showed a demo of a game she wrote, Doodle Jump clone)
    • Lew: what is @codegirlclub?
    • the Coding Clubhouse is a business I'm starting
    • Lew: let's all give Alannah a round of applause, this is the future generation of people who are going to write great software
  • so we've mentioned Code.org and Hour of Code a few times

  • we're honored to have Hadi Partovi join us to talk on the vision and where Code.org is headed

Computer Science: America's Untapped Opportunity

  • Hadi Partovi, Founder & CEO @ Code.org

  • I grew up in Iran, during the Iraq-Iran war

  • it was not a great environment for learning

  • but my dad was able to give my brother and me a Commodore 64 and a book on BASIC

  • I started there and eventually had a lot of success

  • the Job/Student Gap:

    • 2% CS students compared to 98% for other fields
    • 60% of new technical jobs are computing related vs. 40% for other math/science fields (did I get this right?)
  • CA has 78k open computing jobs, growing at 4x the national average

  • there were only 4.3k CS graduates in CA last year, the entire country was 40k students

  • exposure to CS in high school is the best way we currently have to get students into the field

  • only 300 out of 10k schools teach CS

  • there are fewer CS majors today than 10 years ago, the number is starting to catch up again, but it's not enough

  • also, women are a shrinking % of CS majors

  • AP enrollment by popularity: history, english, science, math, foreign languages, economics, art & music, then CS

  • and only a small sliver of those AP enrolled CS students are women or african american

  • CS is the best paying field, but has the least number of enrolled students

  • when I started programming, it was about computers

  • now, computers are everywhere, tablets, cars, phones, etc.

  • 67% of computing jobs are outside of the tech industry (banking, medicine, etc.)

  • the Hour of Code was our effort to make an "Earth Day" for programming

  • we got over 100 partners (Google, Facebook, Microsoft, etc.)

  • just one hour can open you up to a new world, break down stereotypes

  • 44.5 million people have tried Hour of Code

  • it's also split pretty evenly between boys and girls, even a little higher for girls

  • reached 15 million students in just 5 days, the fastest service to reach 15 million users

  • where did students come from? was it the media that featured us?

  • no, it was 40,000 teachers

  • learning can feel like a game, we've seen it

  • online tutorials make teachers' lives easier

  • there is no IT hassle either, it's all over the web, nothing to install

  • students love fun & creativity, and we try to provide that

  • demo: flappy bird (code.org/flappy)

    • block programming interface, not text, drag & drop
    • there are 10 levels, progressively adding new concepts / requirements, but also letting you do some creative stuff (underwater flappy shark with different physics)
    • after I finish I can text the link to my phone and play right away
  • with our interface you learn the basics of conditions, loops, etc.

  • you learn the underlying concepts that carry over to all languages

  • don't get caught up in syntax

  • "Hour of Code" = marketing and getting people interested

  • Code.org is much more than that, we provide full courses and curricula to schools

  • we're in 30 school districts for Grade 9-12

  • 300 new classrooms, 13k students, full year courses, 34% girls, 60% black & hispanic

  • AP Comp Sci reaches 40k students, 20% girls, 18% black & hispanic

  • so we're already surpassing them in some ways, in our first year

  • for K-8 we're in 40k classrooms, 1.5 million students, 40% girls

  • we also host and support single-day workshops, free for K-5 teachers, lots of help & support, provide instructors (volunteers from the industry)

FAQs:

  • Q: if we struggle with math & reading, why spend precious time on coding?

  • A: it's just 1 hour, and kids are excited about doing this

  • Q: is this too hard for me or my students?

  • A: just try it, a 5th grader or even kindergartener can do our tutorials by themselves

  • Q: why does everyone need to learn how to "code"?

  • A: well.. we teach the concepts, we don't teach everyone to be a "coder", and this helps you in all fields, learning how to think in a structured way, and we also teach about how the internet works, how cybersecurity works, this is useful information for everyone

  • Q: isn't coding only for nerds?

  • A: we're trying to change that, cool is what we make it (celebrities, etc.)

  • Hour of Code 2014: Dec 8-14

  • we want to reach 100 million students

  • we also want to raise $5 million dollars, check our indiegogo campaign

  • we are cost effective because we're not doing all of the teaching, the teachers are doing the teaching, we just provide them really great resources

  • so we're incredibly efficient

  • visit Code.org, help us reach 100 million students!

  • to close: a video showing what we've done in the past year

  • https://www.youtube.com/watch?v=rH7AjDMz_dc

Performance on the Front Line

  • Nathan Taggart, PM @ New Relic (Browser)
  • speed is only one component of performance
  • experience is more important
  • my definition of performance: it works as expected, reliably, for everyone

The baseline:

  • no errors
  • it has to load
  • all resources load successfully

"As expected":

  • harder to measure, more subjective
  • speed
  • responsiveness
  • modern
  • using best practices
  • provides feedback to the user

"Reliably"

  • the site is up
  • all API endpoints are up
  • third party API endpoints are up (if that's not possible, time to switch)
  • users using your app don't see a difference between you and your vendors / third party dependencies

"For everyone":

  • all geographies

  • all devices

  • all browser versions

  • even for stupid users

  • your app isn't being used by people with brand new MBPs in silicon valley with high speed internet

  • it's being used by a guy with a 4 year old laptop on airplane wifi who likes clicking submit buttons 40 times

  • so build defensively

  • New Relic Browser is GA now, rolled out to all accounts as of yesterday morning

  • new deployment method available: copy and paste our snippet into any page, even single-page / static apps

Demo

  • the homepage is really good at telling you one thing: is there currently an issue?

  • AJAX: now URL centric, with smart grouping (e.g. /accounts//applications//recent_events)

  • drill into AJAX requests, specific throughput & performance breakdown

  • this includes third party endpoint requests, not just requests to your servers

  • use this to hold your third party vendors accountable to their SLA

  • our current error rate at NR: 2-3%, this is actually pretty good, we see some people turn this on and they have an error rate of 20%-40%

  • we sort script errors by "impact", how many pages & people experience this error

  • one thing you might notice: "Script error" appears multiple times, this happens because different browsers surface the errors in different ways

  • we built heuristics to group errors in a smart way, across browsers

  • we're also renaming pageviews to be URL centric, not with controller / method names

  • we found that it just fits what people expect, especially front-end developers

  • speed vs. experience: for speed you have aggregate numbers, for experience you can view individual session traces

  • "I have a problem" vs. "this is why the problem is happening": see why certain load times or events take longer than others

  • new metric: "Waiting on AJAX", this is the number you want to watch for user experience, it's when the on-load AJAX calls finish, when the current page should be fully functional

  • one interesting thing we saw in our own app: lots of chains of setInterval/setTimeout calls, busy-waiting for things to become available, this impacts the user experience, it can be inefficient

  • power features: customizable grouping / aggregation

  • standalone applications: for apps that don't use APM, just copy and paste our snippet into your template

  • other stuff: more analytics collected

  • browsers (how many IE8 users do I have? can I stop supporting IE8?)

  • geographic areas (how is my app performing in the US vs. Asia?)

  • cloudflare integration, more integrations coming
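
The URL-centric "smart grouping" from the demo can be approximated by collapsing ID-like path segments so that requests for different accounts roll up together. This is only a guess at the idea, not New Relic's actual heuristics; the `*` placeholder, the digits-only rule, and the function names are all assumptions:

```python
import re
from collections import Counter

# Assumption: a path segment made only of digits is an ID.
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def group_url(path):
    """Replace numeric path segments with a wildcard."""
    return _NUMERIC_SEGMENT.sub("/*", path)


def count_by_group(paths):
    """Count requests per grouped URL."""
    return Counter(group_url(path) for path in paths)


if __name__ == "__main__":
    requests = [
        "/accounts/12/applications/345/recent_events",
        "/accounts/99/applications/7/recent_events",
        "/accounts/12/users",
    ]
    for group, count in count_by_group(requests).most_common():
        print(count, group)
```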

Performance Hackathons: Trulia's Obsession With Speed and Scale

  • Chris Sessions, Director of Operations @ Trulia

  • Louis Bennett, Director of Engineering @ Trulia

  • we're going to talk about our culture, how we improve our site

  • 2007, 17 engineers, 1 building

  • 2014, 200 engineers, 5 floors, 3 buildings, 2 cities

  • how do we maintain a startup vibe as we continue to grow?

  • anyone familiar with R. Westrum? he did a study on org. culture

  • culture = patterns for responding to problems

  • he found three categories of culture: 1. pathological, 2. bureaucratic, 3. generative

  • pathological:

    • gaining and keeping power
    • low cooperation
    • messengers are shot
    • failure leads to scapegoating
  • bureaucratic:

    • rule-heavy
    • cooperation is isolated to teams
    • messengers are neglected
    • failure leads to justice (reprimanding those who don't follow rules)
  • generative:

    • performance oriented
    • high cooperation
    • messengers are trained
    • failure leads to inquiry

What's worked for us:

  • weekly release meetings, dev & ops

  • regular tech learning sessions, anyone with a good idea can present

  • lunch roulette, group outings, team building events, happy hours

  • innovation weeks (one per quarter) where you can work on anything

  • safe environment for taking risks (e.g. in post-mortem process)

  • data everywhere, no hoarding

  • regular scalability & performance hackathons

  • devops is a paradox, the best ops know dev, and the best devs know ops

  • dev = blue lens, ops = red lens, devops = seeing in 3D

  • the contribution is greatest when the two sides work together

  • new relic is a facilitator of dev & ops working together

Enter the hackathon:

  • we have performance monitoring on dashboards everywhere, but...
  • performance is a top-line KPI (along with revenue, # of users, etc.)
  • so let's dedicate time to performance the same way we do for other things
  • spend a few hours, see what happens, and iterate from there

Take one:

  • 9 developers showed up, everyone split up into pairs
  • each pair grabbed one slow transaction and dove in
  • this was great! more eyes on the code is good, and a few pull requests were made with improvements
  • but... we need ops in here, devs can't do it alone
  • this made it even more awesome, it felt like the old days of everyone huddled around a laptop
  • faster to find problems & make fixes
  • take two: devs (FE and BE) and ops
  • two rules: attendance is optional, but if you do show up, participation is mandatory

What we found:

  • this is a little embarrassing, but we're going to show them, because you might have similar problems
  • example 1: legacy code
  • one bit of legacy code retried a flaky lucene connection, 16 second response time, that was kinda acceptable years ago, but not anymore, just throw an error
  • example 2: server oddities
  • xfs on a web server? not the filesystem.. but the X Font Server..
  • so we removed that, turn off any extraneous services
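
For example 1 above, the fix was to stop retrying and throw an error. A middle ground is to cap both the number of attempts and the total time spent, then fail fast. This is a generic sketch, not Trulia's code; `call_with_budget`, `DeadlineExceeded`, and the default limits are all hypothetical:

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the operation could not succeed within its budget."""


def call_with_budget(operation, max_attempts=2, budget_seconds=0.5, clock=time.monotonic):
    """Call `operation`, retrying on ConnectionError within strict limits.

    Gives up after `max_attempts` tries, or once `budget_seconds` has elapsed
    before a new attempt would start, instead of retrying for many seconds.
    """
    start = clock()
    last_error = None
    for _ in range(max_attempts):
        if clock() - start >= budget_seconds:
            break
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
    raise DeadlineExceeded("giving up instead of making the user wait") from last_error


if __name__ == "__main__":
    def flaky_search():
        raise ConnectionError("search backend unavailable")

    try:
        call_with_budget(flaky_search)
    except DeadlineExceeded as exc:
        print("failed fast:", exc)
```

Note the budget is only checked between attempts, so the operation itself still needs its own connection timeout.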

How to do this:

  • first, we are privileged, we think we're close to a generative culture, we have very high trust, so if we send out a hackathon invite, we're going to get a lot of participants, also, new relic is not free

  • step 1: you need monitoring, you need APM (e.g. new relic)

  • step 2: set aside some time, we do 2 hours per week, and it's important that it's recurring

  • step 3: find an area to work on, e.g. key transactions, and do this step ahead of time, because it's easy to get lost in the weeds, there's so much you can look at

  • step 4: make it better! (research, hack, create PRs, follow up)

  • step 5: iterate, not every change has to be a big success, keep trying

  • what worked for us may not work with you

  • but we hope this is useful for you, dev + ops working together

Q&A:

  • Q: how do you guys use Insights? do you use it?
  • A: we don't use it that much right now, but are looking at it
  • Q: how often do you hold hackathons, and how do devs juggle their normal priorities vs. performance?
  • A: tuesday afternoon from 3pm to 5pm, we don't force anyone to go, devs can prioritize by themselves
  • Q: do you prioritize perf. outside of hackathons?
  • A: yes, umm new relic allows us to see performance, other dude: deadlines sometimes force you to launch something
  • Q: what other tools do you use besides new relic?
  • A: nagios for alerting, logstash & kibana, previously ran splunk, i can go on and on, but we keep trying new things, even multiple tools that solve the same problem at once
  • Q: how does the hackathon work with product specific dev teams?
  • A: it's pretty easy actually, it's great for knowledge transfer, it's great to get new eyes on an area of the code and start asking questions about why things work that way and how they can be better
  • Q: how do you convince the business that code quality is as important as features?
  • A: we weren't founded by an engineer, so it's a little difficult for us, we're in the housing industry, which has its own ideas of "architecture" (like of a building or house) and "debt" (mortgages), so there are some interesting parallels there
  • Q: how do you load test?
  • A: we do have some synthetic tools for load tests, but our service is crawled constantly, and we silo our applications and services, so we can serve most bot / scraper traffic
  • Q: how do you turn hackathon projects into live production code?
  • A: solve the smallest problem you can, solving small problems = low risk, make a proof of concept, this allows you to see the risk & complexity of changes
  • Q: what is your release cycle time, how do you integrate your performance changes?
  • A: different products have different cycles, we're moving towards SoA so each team has autonomy to deploy & rollback whenever, our flagship product is a weekly release, while mobile API has multiple releases per day
  • Q: do you have any non-devs in the hackathon?
  • A: oh for ops? well.. any ops people in the room? it can be a challenge sometimes.. there's always uptime & application support & stability, but we're going to get better over time, i hope people will see that this is actually pretty fun and a great way to make improvements and to collaborate with devs, maybe our perf hackathons will be expanded to things like stability hackathons

Data Driven Monitoring

  • Daniel Schauenberg, Infrastructure Toolsmith @ Etsy

  • over 30 million members

  • over 18 million items listed

  • LAMP stack, linux, mysql, memcached, apache, PHP

  • some postgres, java, ruby, go; but etsy.com is mostly one big PHP app

  • 120 web & api nodes

  • we deploy a lot, from 10 times per day in 2010 to 30-40 times per day in 2014

  • we split config deploys from code deploys to make deploys faster

  • how comfortable are you deploying right now?

  • on your first day you deploy the site, boot up a dev VM and deploy as soon as your laptop is set up

  • deployinator web app, one click to deploy to staging, then one click to deploy to prod

    • ruby app, runs rsync, has a log streamer, deploy log, etc.
  • dashboards: the most important actions and metrics for users on the site

  • we use deploy markers
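
Sketch (not from the talk): one way to record a deploy marker, assuming markers are stored as a Graphite metric written over Carbon's plaintext protocol (`<path> <value> <timestamp>`). The talk did not say how Etsy implements its markers; the metric name, the host `graphite.example.com`, and both helper names are hypothetical.

```python
import socket
import time


def deploy_marker_line(metric, timestamp=None):
    """Build one Carbon plaintext line: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # A value of 1 marks "a deploy happened at this second".
    return f"{metric} 1 {ts}\n"


def send_deploy_marker(metric, host="graphite.example.com", port=2003):
    """Send the marker to a Carbon listener (2003 is Carbon's default plaintext port)."""
    line = deploy_marker_line(metric)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
    return line


if __name__ == "__main__":
    # Hypothetical metric name; print instead of sending so this runs anywhere.
    print(deploy_marker_line("deploys.web.production"), end="")
```

Rendering such a series with Graphite's `drawAsInfinite` function draws each deploy as a vertical line on top of any other graph, which makes it easy to line up a metric change with the deploy that caused it.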

Ganglia:

  • system level metrics
  • one instance per DC/environment
  • 220k RRD files
  • fully configured through chef role attributes

Statsd:

  • used for application metrics
  • one instance
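
Sketch (not from the talk): what "application metrics via statsd" looks like on the wire. StatsD accepts UDP datagrams of the form `<name>:<value>|<type>`, with an optional `|@<sample_rate>` suffix. The metric names and the two helper functions below are made up for illustration; 8125 is the usual StatsD port.

```python
import socket


def statsd_line(name, value, metric_type, sample_rate=None):
    """Format one StatsD datagram: '<name>:<value>|<type>[|@<rate>]'."""
    if metric_type not in ("c", "ms", "g"):  # counter, timer, gauge
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP, so a slow or dead StatsD never blocks the app."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    print(statsd_line("checkout.completed", 1, "c"))
    print(statsd_line("page.render", 320, "ms", sample_rate=0.1))
```

UDP is the reason this is cheap enough to sprinkle throughout application code: the sender does not wait for an acknowledgement.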

Graphite:

  • 96GB RAM, 20 cores, 7.3TB SSD RAID 10

  • 525k metrics/minute

  • mirrored setup using carbon-relay

  • 7 relays for sharding

  • 1-2 minutes for failover via DNS, that's good enough for us

  • if you graph your graphite stats in graphite, then if there's a problem you can't debug

  • so we send graphite stats to ganglia

Nagios:

  • for alerting
  • we love nagios, it works really well for us
  • 2 instances per data center, fully configured by chef
  • service checks and contacts in git
  • notifications via email -> SMS gateway
  • ~75% of checks go to ops on-call

Nagdash:

  • aggregates results across multiple instances
  • 2000 nodes, 30k service checks
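
Sketch (not from the talk): the core idea of aggregating check results across several Nagios instances into one view. This is not Nagdash's code; the input shape (a dict of instance name to a list of check dicts) and the instance/host names are assumptions. The numeric states follow the standard Nagios convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from collections import Counter

STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
# Show the worst problems first.
SEVERITY_ORDER = {"CRITICAL": 0, "UNKNOWN": 1, "WARNING": 2}


def aggregate(instances):
    """Merge per-instance check results into state counts and one sorted problem list."""
    counts = Counter()
    problems = []
    for instance, checks in instances.items():
        for check in checks:
            state = STATE_NAMES[check["state"]]
            counts[state] += 1
            if state != "OK":
                problems.append((instance, check["host"], check["service"], state))
    problems.sort(key=lambda p: (SEVERITY_ORDER[p[3]], p[1], p[2]))
    return counts, problems


if __name__ == "__main__":
    # Hypothetical data from two Nagios instances.
    data = {
        "dc1-nagios": [
            {"host": "web01", "service": "http", "state": 0},
            {"host": "db01", "service": "disk", "state": 1},
        ],
        "dc2-nagios": [{"host": "web09", "service": "http", "state": 2}],
    }
    counts, problems = aggregate(data)
    print(dict(counts))
    for row in problems:
        print(row)
```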

IRC integration:

  • everyone is on IRC
  • "?nag" commands in IRC can query check/host status, set downtime, etc.
  • everyone complains about the nagios UI, but if you use the API you can write whatever interface works best for you
  • using chat helps communicate everything to everyone

More:

  • syslog-ng
  • logstash
  • logster
  • supergrep (tail -f ... | grep ... across all servers)
  • eventinator (records events across the entire infrastructure, chef runs, config changes, etc.)

Information overload:

  • leads to alert fatigue
  • "nagios spring cleaning" - going through your checks and figuring out what's not useful anymore
  • but.. we have data, we can make it better
  • so we wrote nagios-herald
  • it injects context into nagios's email alerts
    • colors
    • graphs
    • links to more information

Ops weekly:

  • we want more visibility into alerts
  • opsweekly will keep track of all alerts / pages during each on-call period
  • whoever was on-call can categorize the alerts (actionable, non-actionable, etc.)
  • integrates with fitbit for sleep tracking
  • use this to turn non-critical paging alerts to email alerts (i.e. it can wait until the next day)
  • open source: github.com/etsy/opsweekly
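
Sketch (not from the talk): how categorized on-call alerts could be turned into a list of checks to demote from paging to email. This is not opsweekly's implementation; the `(check, category)` input format, the function name, and both thresholds are assumptions chosen for illustration.

```python
from collections import defaultdict


def demotion_candidates(alerts, min_pages=3, max_actionable_ratio=0.2):
    """Return checks that paged often but were rarely marked actionable.

    alerts: iterable of (check_name, category), category being
    'actionable' or 'non-actionable' as tagged by the on-call person.
    """
    totals = defaultdict(int)
    actionable = defaultdict(int)
    for check, category in alerts:
        totals[check] += 1
        if category == "actionable":
            actionable[check] += 1

    candidates = []
    for check, total in totals.items():
        if total >= min_pages and actionable[check] / total <= max_actionable_ratio:
            candidates.append(check)
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical on-call week.
    week = (
        [("disk_space", "non-actionable")] * 5
        + [("disk_space", "actionable")]
        + [("site_down", "actionable")] * 3
        + [("ntp_drift", "non-actionable")] * 2
    )
    print(demotion_candidates(week))
```

The `min_pages` floor keeps a check that fired only once or twice from being demoted on too little evidence.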

Summary:

  • set of trusted tools for monitoring
  • always experiment
  • always learn
  • always improve
  • use data for all the decisions you make

Q&A:

  • Q: what's more important, code quality or new features?

  • A: there's no absolute decision to make, new features = make money, badly written code = costs money, there's a spectrum and you have to find the spot that makes the most sense for your company's current state

  • Q: has anyone broken the site on their first day?

  • A: yes

  • Q: do you have dev vs. production parity problems?

  • A: since it's a monolithic repo and we merge into master, not really, also, since we use VMs for development, you're always running on a VM that mirrors the production servers

  • Q: how do you know what went wrong? how do you diagnose?

  • A: look at graphs, usually there's a group of people looking at the graphs, including the person who is deploying, and it's pretty easy to spot weird patterns, if you see something weird start asking around, or if nobody knows what's happening, page the on-call person (and there are different on-call people for payments vs. something vs. something else)

  • Q: how often do you rollback?

  • A: we don't rollback, but we do deploy revert commits; we do schema changes once per week and all migrations are required to be backwards compatible

  • Q: migrations / schema changes?

  • A: we use primary-primary & replicated shards, every thursday the DBAs will look at the migration tickets and roll out the changes

  • Q: why do you open source vs. keeping it in-house?

  • A: we are built on open source tools, we benefit from open source a lot, and we want to give back, so we open source as much as we can, A/B testing framework, deployment stuff, monitoring stuff, all developers think "can i open source this? what changes do i need to make?"

  • Q: what events go into eventinator?

  • A: DNS changes, cookbook changes, nagios changes, hadoop, network, firewall; all of these are stored in elasticsearch

  • Q: what do you use for dashboards?

  • A: it's something we wrote, PHP, open source

The Next Thing You Will Be Doing with Cloud

  • moderator: Arik Hesseldahl, Senior Editor, Re/Code
  • John Engates, CTO, Rackspace
  • Daniel Sturman, Engineering Director, Google Cloud Platform

Q: what are your roles?

  • A: Engates: get in front of customers, help customers get to the cloud, figure out our roadmap
  • A: Sturman: heads the team that manages all our computation, GCE, App Engine, and internal platforms for search, gmail, etc.; if someone wants to kick off 5000 containers, internal or external, our team handles that

Q: the problems you both deal with are at large scale, what are some of the problems you deal with?

  • A: Sturman: a lot of servers, we instrument everything, machine/kernel/container level, how do you turn all that noise into value? how do we use this data more effectively? in our SRE team, we have very strict principles on what not to use data for, how to do incident response, both broad symptom detection and deep root cause analysis

Q: how do you do that? internal tools?

  • A: Sturman: it's all our own stuff, there was nothing good at our scale, but one of our goals is to open up our tools to others, share this technology with others

Q: how about you Engates?

  • A: Engates: we collect that same kind of data, but more interesting is the data our customers collect and share with us, we help you build strategies for effective use of data, we support customers all the way up and down the stack, we don't write code for them, but we can help them architect; for example our MongoDB cluster, lots of companies are using our MongoDB cluster, I think it's the largest Mongo cluster in the world; not everyone can be google, it's hard to do that in-house

Q: how do people build stuff on top of things that are in your hands?

  • A: Sturman: well we offer PaaS (GAE) and IaaS (GCE), so you can focus where to put your effort, e.g. for GAE, people don't want to handle scale or worry about ops things, customers want to dial in on what level they focus on, the savings from cloud are different for each company
  • A: Engates: the cloud has always been a cost savings thing, but it's really a time savings thing, your developers can focus on the business and innovating, not on infrastructure, we help you scale your business, we have a DevOps Automation Service that lets you treat infrastructure as code, so companies can focus on being more competitive

Q: let's get to the actual question of this session, what is the next thing in cloud computing? 1 year, 2 years, 3 years out?

  • A: Engates: containers are a big focus right now, our developers are working on docker, deploying OpenStack via docker, making docker work with OpenStack, working with the docker maintainers
  • A: Sturman: internally we no longer use VMs, externally we do use them for security reasons, but containers are a beautiful solution to the resource problem, we want to enable a 4 person startup to be able to scale up, we are going to offer Managed Runtimes, currently in alpha, give us an arbitrary container and we'll run & scale it, coming in 2015

Q: what do you spend most of your operational day thinking about? internal vs. customers?

  • A: Sturman: first thing I worry about is customer experience, internally you can be rough around the edges, externally no, we need to have great SLAs & uptime, our customers can't go down, big focus on reliability, we also focus on support if necessary, that's what I worry about the most, our teams are really good, our systems are stable, Google doesn't go down very much.. being a cloud provider is new to us, but we know how to do this stuff
  • A: Engates: similar for us, we have a good track record as a hosting company, every day we're on calls with customers, customers are an extension of our organization, we make use of & become familiar with the tools & technologies that our customers are using, we're built on open source, we want a familiar experience with our customers, New Relic is a great example, we use it ourselves and recommend it to our customers
  • A: Sturman: well for us it's about sharing the tools we built with our customers, pivoting into making them externally accessible

Q: what is the big thing customers are asking for, now?

  • A: Engates: timely, but one thing we just released, that customers were asking for, is integration & support for Google Apps (email), not only providing it but offering support
  • A: Sturman: privacy & security is a big issue, it's something we spend a lot of time on, how do you manage privacy on PaaS and IaaS? how do we protect against attacks? we do a great job internally, how can we extend that to our customers?

Q: let's talk more about security, bad guys are able to spin up instances on your services, how do you protect your assets? prevent bad things from happening?

  • A: Engates: it's a shared responsibility at all layers: data center, network, code, third parties; it's a community effort in a lot of ways, so we make sure to harden all those layers, we do 3rd party security audits, we work with customers, we release products that help customers with security, we offer private clouds
  • A: Sturman: we have a very large security team, internal & external, if anyone can start using our platform, that's a concern; we focus on platform security, keeping OS up to date at all times, seamless migration, also focus on defending against threats, unauthorized access, XSS, etc., we're building tools to help customers with that, we don't want security to be a person-intensive area, we want to build tools to gain leverage

Q: priorities for the rest of 2014?

  • A: Engates: make sure our ecommerce customers have a successful Black Friday and holiday season, we're going to almost do an.. infrastructure freeze.. to focus on stability, more rigorous procedures, similar processes for security, customer support, etc.
  • A: Sturman: onboarding more customers, helping them with their scaling challenges, Cloud Platform Live event in November, look forward to that, can't announce anything yet

Crowd Q&A:

  • Q: emphasis on containers, how do you advise customers on running their containers? thin containers vs. fat containers with ssh, syslog, etc.
    • A: Engates: well I would defer that question to our team, of course we want people to build horizontally scalable components, same thing for containers, but that's what our DevOps team does every day, helping customers re-architect
    • A: Sturman: we internally built Kubernetes, a platform for managing containers at scale, and we're opening that platform to the community, there are some first-class design patterns that Kubernetes provides, i.e. microservices/SoA, those are sort of built-in, so that's where Kubernetes is going and we think that will help

What Developers Should Do With Data

  • Poornima Vijayashanker, Founder, Femgineer

  • when you don't have any data, you don't have any idea what's going on, but once you start collecting data on everything you're similarly confused because there's too much to look at

  • not enough data -> noisy data -> too much data -> secure data

  • in the future: we're gonna be successful and have Big Data

  • when you're starting: there's no data in the system yet..

  • we've got to make our application compelling so users use it, to generate that data

  • customers are going to ask you: why should I trust you with my data? (email address, financial info, personal info, etc.)

  • you answer that question by building trust with the user

  • customer testimonials, social proof, partners, good design

  • use analogies, e.g. for mint.com: "Bank-level security"

  • convey trust at every level, design matters, UX matters

  • make it frictionless once the user takes their first step

  • make the onboarding frictionless

  • first impressions matter

  • use tools like mixpanel to see where customers drop off

  • don't be afraid to redesign to make things simpler

  • once users start returning, you need to delight them

  • continue adding value

  • for mint.com: telling users when they were charged a fee, because everyone hates fees, we want to alert users when this happens, make it feel like we're on their side

  • one way to show users what value you provide is a simple "How it works" page

  • show users the flow, the process

  • airbnb.com does a good job of this

  • this helps convince people to sign up

Noisy data:

  • data streams, third-party data, user actions
  • how do we deal with the noise? manage all the data?
  • first, parse it, hopefully you do this early
  • next, aggregate it & mash it up
  • take user data, anonymize it, and present it back to users
  • e.g. for mint: spending on bars vs. restaurants, rent in SF vs. Austin
  • this is good for marketing, makes for good story-telling, infographics

Making sense of the data:

  • analytics: Google Analytics, Mixpanel, KissMetrics
  • find out what's important, what's not?
  • logs & databases
  • hosting: use data to determine when & how to scale
  • use New Relic to find bottlenecks
  • add caching, use a different DB technology that fits your data better
  • warehousing: TeraData, Hadoop, etc.

What's next?

  • so now you have a good handle on your data, what's next?

  • how do you be proactive?

  • first, privacy

  • what are your policies? how do you implement them? how do you integrate with third parties?

  • next, security

  • not all users practice good security (e.g. losing their phone, using weak passwords), so we need to protect them from themselves

  • three levels: user security, employee security, and outsider security

  • responsible disclosure: asking hackers to hack your sites and notify you if they find anything, giving you a window of time to fix it

  • one security breach can ruin a startup, so a lot of startups are adopting this practice

  • want to learn more? read my book: How to Transform Your Ideas Into Software Products

  • http://femgineer.com/transform-ideas/

Q&A:

  • Q: what was your conversion rate?
    • A: hard to remember what our exact numbers were
  • Q: how do you handle trust after a security breach?
    • A: don't cover it up, that's the worst thing to do, because when it does leak you'll lose even more trust, give a full explanation and post-mortem, people want to know why, they want to know what exactly was breached (e.g. are your passwords stored in plaintext), so take responsibility and tell them how you're going to prevent this in the future
  • Q: what tools are you using to collect your data?
    • A: customer data: mixpanel & kissmetrics, application data: new relic
  • Q: at the early stages at mint, how did you do growth and retention?
    • A: lots of press, 600 interviews, for retention: continue to delight users, keep giving them more, and get them to come back, for inactive users: re-engage them and let them know what's changed, show users how you can help them

New Kids on The Block: Three Startups at the Forefront of Disruption

  • Moderator: Dan Scholnick, General Partner Trinity Ventures & Board Member of NR
  • Kevin Klein, VP of Engineering, BloomThat
  • Artur Bergman, Founder and CEO, Fastly
  • Chris Gooley, Founder, Preact

Q&A:

Q: Kevin, your twitter handle is "the cheese is", why?

  • A: because it's amazing! you can put so many words at the end of that sentence

Q: what do you guys do?

  • A: BloomThat: flower delivery in 90 minutes anywhere in the Bay Area
  • A: Fastly: next gen CDN, cache anything, not just static stuff like CSS, JS, images, but also things that you normally wouldn't think of caching, like API calls
  • A: Preact: user analytics in real-time, so you can preact before you react

Q: what stage are you at?

  • A: BloomThat: less than a year old, still very early, a lot of work done by hand
  • A: Fastly: Series C, 3 years old, 130 people, 50 engineers
  • A: Preact: 2.5 years now, Series A last year, 18 people, just moved to SF

Hmm.. small, medium, and large startups.. so, Q: you are all technical leaders in your companies, how do you decide what to use? e.g. for hosting?

  • A: Kevin: nothing local, all cloud, in Rackspace, tried and true LAMP, PHP, all of our , don't think about our stack too much
  • A: Preact: started off fully co-located, since I was comfortable with sysadmin, but we re-evaluated AWS as IO and RAM got cheaper, so we're in 2 DCs now, but moving to AWS for certain tasks
  • A: Fastly: we're mostly self hosted, the CDN is entirely our hardware, we do some control plane stuff in the cloud, each CDN server does 40Gbit/s in traffic, each server runs dozens of VMs, which you can't do in the cloud, so it's a lot of custom low level stuff so we can use our machines with higher efficiency

Q: but someone from AWS would argue that you can run your kind of service on AWS, do you disagree with that?

  • A: Fastly: yes, I disagree, we use our own switches for example, we tweak the linux kernel & drivers, we customize our NICs

Q: but wouldn't they say that our cost decreases faster than you can make low-level efficiency improvements?

  • A: Fastly: even if they get better, there's no way to do certain things, like sub-10ms failover between servers? you can't do it in AWS, you have to be in the data center at the switch level, but maybe they'll figure it out eventually, that'd be great

Q: Kevin, what problems have you had with the cloud?

  • A: BloomThat: random outages, but they're always getting better

Q: Gooley, what about your hybrid setup?

  • A: Preact: we have a pretty predictable workload, so we know where to use AWS, there are some more costs involved for on-premise, e.g. paying for employees full-time in the DC, but we found that our own configuration was an order of magnitude cheaper, even if we change hardware every year

Q: going up the stack, what tools do you use?

  • A: Preact: best tool for the job, and those tools change over time, e.g. we started off on Mongo and became frightened once we reached scale, rapid development is great, but we moved to Cassandra to lock down our data store, another thing: we're huge users of redis to do a lot of our analytics (counters, etc.), and the atomic data structures & operations it provides, and speed and reliability it provides, are great for us
  • A: Fastly: well for databases, we don't have any huge requirements, our databases are tiny, just configuration data, and for our frontends we use a heavily modified version of Varnish

Q: philosophically, how would you compare your approach to Preact?

  • A: Fastly: best tool for the job is great, but once you reach scale it's tough, as you grow people & products you can't have everyone be an expert in so many different technologies, so we limit our technology choices, for example web API = ruby, automation = python, CDN layer = C, etc.

Q: BloomThat, what do you think? who's right?

  • A: well there is no "right", only right for your company, since we're early we try to use what works and what we're comfortable with

Q: Preact, what do you use for programming language?

  • A: web = Ruby, data = Python and more recently Scala; if your engineers are good they can learn anything, we don't add new things to the stack just because though.. we do have a lot of git repos we're trying to manage, merge projects together

Q: is Fastly different because of your type of business and type of customer?

  • A: well if we go down it is catastrophic for our customers, yes

Q: Bloomthat, how do you handle failures?

  • A: Bloomthat: it would be a disaster if we failed on Valentine's day, so we have plans for failover to different DCs, databases, etc.

Q: how do you test failover? seems tricky..

  • A: Bloomthat: you gotta do it in production, take your failover procedure and actually test it out in production

Q: what big problems are you working on right now?

  • A: BloomThat: right now we're trying to make our admin system as efficient as possible for our employees, if we don't do that then we decrease our "flower power per hour", so we iterate fast and try to make it as quick and easy to use as possible
  • A: Fastly: we have ops problems at a scale that few companies see, not many people reach 40 Gbit/s on a single machine, we see kernel bugs, we see low level bugs, so in that way it's also a hiring problem, not many people know how to solve these
  • A: Preact: we work with a lot of data, it's hard to parallelize & process in a distributed way, we're starting to use Spark a lot

Q: any other challenges? and solutions?

  • A: BloomThat: we're not trying to do anything too outside of the box, just execute really well on our problems
  • A: Fastly: we started using Go a year ago, and it's turned out to be the best language for distributed systems, another thing is our choice of switches, we were an early adopter in software defined networking, we use Arista switches and built our own SDN on top of that, we didn't go with commercially available SDNs because they couldn't handle the traffic we have

Q: hiring is a huge problem in the Bay Area, you three are all Bay companies, how do you deal with that?

  • A: BloomThat: we look for people who not only know the technologies we use, but people who can bring something new, we're also trying to bring on more Jr people, more generalists that can grow into specialists, generalists are easier to find, training is a problem since you're strapped for time in a startup, so most of it is just live-fire training, get them in front of real problems
  • A: Fastly: we try to hire people who we know through the open source community, bringing on open source leaders makes it easier to attract good talent, it's also easier to evaluate people who have a history in open source, follow-up Q from moderator: "is funding useful for hiring?", A: after a big funding announcement it's actually bad for the company, lots more noise coming in from recruiters and the wrong type of candidates, so no, it's not good
  • A: Preact: we've doubled in size in the past year, and every hire has been a referral, so that's an alternative to hiring people in the open source community, we also hired two people for whom this is their first programming job, and it's been amazing watching them grow, but you need to find people who really want to grow and have that passion, follow-up Q from moderator: "what's the diff. between SoCal and the Bay?", A: harder to find people in SoCal (didn't really catch what he said)

Q: how do you deal with poaching?

  • A: Fastly: not a problem for us, we've lost extremely few engineers, we're pretty lucky in that regard, we don't worry about it, but now that you mention it... i'm probably going to start worrying
  • A: Preact: culture & novelty are really important for attracting & retaining people, do you really want to spend late nights with these people? is this problem worth solving? that's what I looked for when I joined Preact

Q: from the audience, "PHP, why?!"

  • A: BloomThat: we're using modern PHP, not ancient PHP, we chose PHP because the community is just amazing, and we love CodeIgniter

Dataclysm: Who We Are (When We Think No One's Looking)

  • Christian Rudder, Co-Founder, OkCupid

  • also the writer of the OkTrends blog

  • turned the blog into a book, Dataclysm

  • now the President of OkCupid

  • OkCupid was bought by Match.com in 2011

  • I'm going to talk to you about how we turned a community dating site into data science

  • this was 2003, pre-Big Data, pre-Facebook

  • dating is a weird business

  • if your site sucks, people go away, but if your site is great, people also go away, because they've found someone and no longer need your service

  • it's also weird to try to affect things in the real world, lots of real world problems, lots of human nature problems

  • our original idea was to apply math to dating

  • but how?

  • we decided to ask personality questions

  • ask a ton of questions, some good, some bad, some important, some unimportant

  • try to ask divisive questions, "are you a murderer?" is a bad question because everyone answers no

  • apply algorithms to these answers

  • but.. no matter what data you put on the page, users always click on the most attractive photos

  • the most attractive people get too many messages, leave the site

  • the second hottest people get doubled down on, and feel creeped out

  • average users don't get any messages, and get discouraged

  • we found that we want people to connect with 4-8s, not all 10s

  • so we built our own version of HotOrNot, ripped them off, but with 5 stars instead of 10 stars

  • how guys vote on women: pretty standard looking bell curve peaking at 2.8 out of 5

  • we've heard that advertising and airbrushing and models have warped men's perception of attractiveness of women, but our data doesn't show that

  • how women vote on guys: HUGE spike at 1-2, men are basically rated in the 1-2.5 range, chopped in half

  • are all our men ugly? no, women just rate men very harshly, their average is redefined

  • why? in straight relationships, men are the pursuers, women are the gatekeepers, so women have the ability to be more selective

  • so this threw off our initial model of matching people up, we need to percentalize men's ratings vs. women's ratings

  • we also look at age

  • women rate men's age in a diagonal pattern, women think men that are the same age are the most attractive

  • men rate women very differently, they rate 20 as the most attractive age, no matter what age they themselves are

  • note that this is just QuickMatch data (attractiveness), for actual sending of messages the trend for men is more diagonal (message people that are the same age)

  • we also thought about when to show attractive people to users on the site

  • if you show all the hot people up front, the person is very engaged at first, but drops off very quickly

  • if you show not so attractive people up front, they get disinterested

  • should you show 20 year olds to 40 year olds?

  • we try not to make too many prescriptive choices like this, we wanted to be the anti-eHarmony, more open

  • we optimized on message/reply rate to start

  • but a lot of messages are "not interested" or "haha", they're not significant

  • so we start measuring length of conversations, e.g. 4-way chains of messages (you, me, you, me)

  • so we were doing this kind of thing to start, thinking the sky is the limit, we'll do this for a couple of years and figure it all out

  • the math is not hard

  • the hard part is human nature

  • the math needs to intersect human nature, you need both sides

  • a lot of people ask me how to do data science

  • should I learn X or Y, use this database, that tool

  • the most important thing is to understand what's actually happening, understand users & their motivation

  • we used SQL and excel to start, and that worked for us

  • we had a long runway because we were self-funded, and we've done a pretty good job at this point

  • so that's my lesson, get inside people's heads and hearts
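
Sketch (not from the talk): one reading of the "percentalize" step mentioned above, where raw star ratings are replaced by a percentile rank within each group, so that a 2.5 from a harsh-rating group and a 4.4 from a generous one can be compared. This is an assumption about what was meant, not OkCupid's actual method, and the sample ratings are invented.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(scores):
    """Map each raw score to its percentile (0-100) within its own group.

    Ties share the midpoint rank: (count below + half the count equal) / n.
    """
    ordered = sorted(scores)
    n = len(ordered)
    ranks = {}
    for score in set(scores):
        below = bisect_left(ordered, score)
        equal = bisect_right(ordered, score) - below
        ranks[score] = 100.0 * (below + 0.5 * equal) / n
    return ranks


if __name__ == "__main__":
    # Invented average ratings: one group is rated on a compressed, low scale.
    ratings_of_men = [1.0, 1.5, 2.0, 2.5]
    ratings_of_women = [2.0, 2.8, 3.6, 4.4]
    print(percentile_ranks(ratings_of_men)[2.5])
    print(percentile_ranks(ratings_of_women)[4.4])
```

After the transform, the top-rated person in each group lands at the same percentile even though the raw star values differ by almost two stars.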

Future of Modern Software

  • back to Lew Cirne
  • this conference is called FutureStack, but what is the "Future Stack"?
  • I knew Solomon Hykes since he was CTO of dotCloud, which was one of the first PaaS's that let you mix and match components (language, framework, database, etc.)
  • he executed one of the greatest pivots I've seen in software, moving out of the PaaS business and open sourcing the technology behind dotCloud
  • so with that, Solomon

Building the future stack: Docker

  • Solomon Hykes, Docker

  • docker is an open source project

  • it's grown faster than we ever thought possible

  • 1000 pull requests in its first year, up to 8000 now

  • Docker is not something you build your app with, it's one layer below that

  • why did docker become so popular, so quickly?

  • we were in the right place at the right time, when something important was happening

Where is my app running?

  • the answer to this question was, for a long time: "that machine over there"
  • today: mobile & cloud, distributed applications
  • distributed is what people expect now, apps need to be native to the internet and to the device
  • your app needs to run on any device, it needs to be elastic, it needs to work offline & online
  • those kinds of applications are hard to build
  • there's a tooling problem
  • the "platform" is always expanding
  • we need independence between the service and the platform
  • but... for the past few years, we've made it work, somehow
  • if you are a tech company with a lot of resources, like Google, you've already solved these problems, a long time ago
  • large tech shops have already solved this problem, but everyone else hasn't
  • we're cobbling together open source tools and doing the best we can
  • we don't yet have a cohesive answer to "where is my app running?"
  • so that's what we wanted to build

Docker

  • docker is a toolkit for building distributed applications
  • it's a set of tools, but as a whole it's greater than the sum of the parts

Packaging & distribution

  • the goal of packaging & distribution is to be everywhere and nowhere
  • there's an explosion of hardware, OS's, running different versions
  • so we defined a standardized package format, based on LXC

Sandboxed runtimes

  • this goes hand-in-hand with packaging & distribution
  • we need a secure way to run applications without affecting other applications on the system
  • think of the App Store, you can install new apps, and new apps can't break your existing apps

Networking

  • how do you get these components to talk to each other?
  • we need consistent network technology between all these components

Clustering & composition

  • related problems
  • once you have these components, how do you assemble them together into one whole? and scale out?

Identity

  • an urgent problem in distributed systems
  • related to security, reliability, control, auditing
  • what's running where, who built it, is it supported, for each component
  • cryptographic proof of identity and ownership

Authorization

  • goes with identity
  • once you know who or what something is, you can decide what level of access to give it
  • authorization is a layer that connects components together based on identity
  • how do we make this easy for developers?
  • how do we make this compatible with organizations & enterprises that already have a way of doing this stuff?
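
A minimal sketch of the identity and authorization notes above, not anything shown in the talk: a component proves who it is, and a policy layer then decides what it may do. Everything here is hypothetical (`SECRET_KEY`, `POLICY`, and the component names), and an HMAC with a shared secret stands in for the "cryptographic proof of identity"; a real system would more likely use public-key signatures.

```python
import hashlib
import hmac

# Hypothetical shared secret and policy table, for illustration only.
SECRET_KEY = b"demo-secret"
POLICY = {
    "web": {"read"},
    "deployer": {"read", "write"},
}


def sign(component: str) -> str:
    """Return a hex MAC that acts as proof of the component's identity."""
    return hmac.new(SECRET_KEY, component.encode(), hashlib.sha256).hexdigest()


def verify(component: str, signature: str) -> bool:
    """Check the claimed identity using a constant-time comparison."""
    return hmac.compare_digest(sign(component), signature)


def is_allowed(component: str, signature: str, action: str) -> bool:
    """Authorization builds on identity: verify first, then consult the policy."""
    if not verify(component, signature):
        return False
    return action in POLICY.get(component, set())
```

The point of the split is that `verify` answers "who or what is this?" while `is_allowed` answers "what level of access does it get?", so an organization's existing policy could replace `POLICY` without touching the identity check.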

The Docker Way

    1. open always wins
    • we can't do this on our own
    • even large companies like Google can't create a perfect solution to this problem
    • this is a problem on the scale of the entire internet
    • this is a problem not suited to a single commercial entity
    • it requires a diverse set of engineers to work together and come to agreement, which is hard
    • when we built dotCloud, it was successful, but it was clear that we couldn't solve the mega-problem
    • Docker's success was a happy accident, that we got so many experts, duking it out, designing these solutions
    • it's really a snowball effect at this point, something big is happening, and we're part of it
    2. solve one problem at a time
    • there's a tendency to solve this type of problem in a monolithic way, someone writes one big ball of mud to solve the entire problem, and everyone's going to use it and it's going to be great
    • this doesn't work in the real world
    • everyone has different requirements
    • so we need loosely coupled systems
    3. scale by composition
    • this is our guiding principle on scale
    • combine existing solutions to produce novel solutions to hard problems
    4. enforce standard interfaces
    • we can't predict the future, this stuff is changing fast
    • we don't know what hardware and software will come along
    • how do you accommodate this?
    • we want to avoid having people rip apart or rewrite their entire stack just because something better came along
    • so we come up with standard interfaces that have broad agreement
  • Docker is a set of open, simple tools that each solve one problem, that you can compose together, that communicate through standard interfaces

  • we're applying Unix design principles to solve the problem of distributed applications

  • 40 years after Unix, the same techniques, on a different problem

  • please try docker, come say hi in IRC, or find me later
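
The "simple tools that each solve one problem, composed through standard interfaces" idea can be sketched in a few lines. This is an illustration of the Unix-pipeline principle only, not Docker's API: the `Tool` interface and the `grep`, `upper`, `head`, and `compose` helpers are names made up for this example.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# The standard interface (an assumption of this sketch): every tool takes
# an iterable of lines and returns an iterable of lines.
Tool = Callable[[Iterable[str]], Iterable[str]]


def grep(word: str) -> Tool:
    """Tool that keeps only the lines containing `word`."""
    return lambda lines: (line for line in lines if word in line)


def upper() -> Tool:
    """Tool that upper-cases every line."""
    return lambda lines: (line.upper() for line in lines)


def head(n: int) -> Tool:
    """Tool that passes through at most the first `n` lines."""
    def tool(lines: Iterable[str]) -> Iterator[str]:
        for i, line in enumerate(lines):
            if i >= n:
                break
            yield line
    return tool


def compose(*tools: Tool) -> Tool:
    """Chain tools left to right, like a shell pipeline."""
    return lambda lines: reduce(lambda acc, tool: tool(acc), tools, lines)


pipeline = compose(grep("docker"), upper(), head(2))
result = list(pipeline(["docker run", "ls", "docker ps", "docker build"]))
print(result)  # ['DOCKER RUN', 'DOCKER PS']
```

Because every tool agrees on the same interface, any one of them can be swapped for something better without rewriting the rest of the pipeline, which is the argument made for standard interfaces above.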

Q&A:

  • Q: what was it like to pivot from dotCloud to Docker? to totally change the business model? how did you present that to the board?
  • A: when we started off with dotCloud we made it clear we weren't going to go after the short term; we asked our investors & board for patience, to trust our hunch. Notice how the 4 points I mentioned earlier are all false for dotCloud, because it was monolithic and closed source, so we felt that this wasn't right. We built the first versions of Docker alongside dotCloud, started to get early feedback, and it really just took off from there, so there was no other choice but to pivot

Closing

  • grand prize winner of the badge contest is..... (trip to any conference anywhere in the world)

  • also, GoPro winners

  • also, Alannah, who can not only code, but network as well, she's getting a GoPro as well

  • this conference is about people, taking pride in your software, making the people that use your software have a joyful and productive experience

  • thank you New Relic marketing team

  • closing thoughts: moments matter, thank you for investing 2 days in coming here, I hope it's been worthwhile, I can't wait until next year, we look forward to seeing you at FutureStack15, thank you
