Taming the Modern Datacenter, Mitchell Hashimoto, Hashicorp
look at the future of CM systems using Terraform, moving up a level from managing per-machine resources, enabling Infrastructure as Code for the modern distributed data center (mix of IaaS like AWS/DO, PaaS like Heroku, and SaaS for things like DNS and email); I'm a bit skeptical all the complexity of Terraform is significantly better than a minimal layer of glue code to wire up different APIs together.. still interesting though
Towards a Data Driven Product, Techniques and Tools at GitHub, JD Maturen, GitHub
covered the basics of data science & analytics, gave some insights into how GitHub does it
Understanding Developers in a Post Agile Environment, Ward Cunningham (inventor of wiki), New Relic
some interesting ideas on how to become a master engineer, "leveraged activities" vs. normal activities, why writing documentation & pairing are leveraged activities, also showed off xpdx.org, where he's documenting this stuff
Best talks day 2
Computer Science: America's Untapped Opportunity, Hadi Partovi, Code.org
talk on getting kids interested in programming and why this is important, they want to reach 100 million students with this year's Hour of Code (Dec 8-14)
Performance Hackathons: Trulia's Obsession With Speed and Scale, Chris Sessions & Louis Bennett, Trulia
interesting idea for getting Dev & Ops to work together on improving performance, not only good for improving your service, but also good for your culture; includes good practical steps on how they run their hackathons, good Q&A
Data Driven Monitoring, Daniel Schauenberg, Etsy
very similar to the Monitorama 2014 talk, solid technical info on how Etsy runs their monitoring (technologies & techniques they use), good Q&A
The Next Thing You Will Be Doing with Cloud, panel discussion between John Engates (CTO Rackspace) and Daniel Sturman (Google Cloud)
most interesting part was how the companies differentiated themselves from each other, Rackspace emphasized how much work they do with clients (working with them to scale & helping with architecture), while Google emphasized how they've built the most advanced cloud technology in-house and are now opening it up to the world
New Kids on The Block: Three Startups at the Forefront of Disruption, panel between BloomThat/Fastly/Preact
I liked this one because the three companies were all at different stages, different scales, different arch. (AWS vs. bare metal vs. hybrid), and gave a lot of good info on how & why they picked their technology stack and how it's working for them
Dataclysm: Who We Are (When We Think No One's Looking), Christian Rudder, OkCupid
no useful technical info, but very entertaining talk on what happens behind the scenes when you run a dating site
Recurring themes / big takeaways
New Relic - improvements: redesign launched this week, Browser got some upgrades (waterfall view like Chrome/Firebug), Insights looks pretty impressive for doing analytics, Mobile now includes crash reporting (could replace Crashlytics)
New Relic - new product: Synthetics, might be useful for automated end-to-end testing of critical functionality on prod
find & practice "leveraged activities" to become a better engineer, e.g. write documentation and pair with others, find what other successful engineers are doing differently vs. their peers
running weekly focused "hackathons" (~2 hours) on a specific area (e.g. performance) can be a good strategy for getting work done on neglected areas and increasing collaboration between teams
there's a lot of talk of moving to SaaS and cloud (New Relic, AWS, etc.), but there were a few people who were doing fine running things in-house (e.g. Etsy's in-house monitoring stack, Fastly running on bare metal, etc.)
multiple talks emphasized understanding your data and using sound math / statistics / probability / data science principles, no matter what tool or stack you use (Excel vs. SQL vs. Hive vs. Hadoop vs. Insights vs. other SaaS stuff)
check out / volunteer for this year's Hour of Code
many mentions of SoA, and two different speakers (Brockman @ Stripe, Miller @ New Relic) mentioned that SoA is currently the best tool we have for scaling (not only on the technology side, but on the people side too)
on our 3rd game in the new stadium, 1 in 3 people are using the app, pretty amazing
61% of season ticket holders
in-seat delivery is probably the biggest feature, and some people said it wasn't possible
we measure & improve every game, iron out bugs
for example, 1st game: 20 minutes delivery avg, 2nd game: 10 minutes, 3rd game: 6 minutes 22 seconds
What's next?
expanding to other venues, concerts, etc.
Game Changers Interview - LendingClub
John MacIlwaine, CTO, LendingClub
no physical footprint like other banks
technology powered
affordable credit directly from investors
$5 billion in loans since 2007
disrupting the banking industry
You've been in the financial industry for a while, what did you see in LendingClub?
now is the time to disrupt the banking industry
who here enjoys going into the bank and talking to bankers?
we provide more affordable credit and better experience for everything through technology
How does it work? my savings account earns 0.25% in interest, credit cards charge 16% interest, how do you work?
fewer people, more efficient, more savings = better rates for borrowers
credit product vs. investor product
we pass the savings to both sides
Banking has 100s of years of history, how do you compete?
in the end it's just data
we look at thousands of pieces of data for each deal
fraud is also a big deal, and also ties into data
e.g. IP address & geolocation vs. information listed on loan
How are you using New Relic?
first started using NR 1.5 years ago, eye opener to see entire stack, all the bottlenecks
scalability, performance, customer experience
it's involved in every step of our development
What's Next for New Relic?
so, we have multiple products for software analytics
APM, Mobile, Servers, Browser, Plugins, Insights
I'm so excited about this
we are full stack software analytics
similar to how apple controls the hardware & software, we control & manage the full stack for you
we collect the data
securely store it in our cloud
200 billion metrics per day
2.5 trillion events every month
but this is only useful to you if it's in a thoughtful, easy-to-use, beautiful interface
so what are we announcing today?
Mobile:
mobile is the new storefront, I don't walk into stores as often anymore, I use mobile apps
so to be successful, you need to be mobile and monitoring what your users are doing there, what their experience is
mobile is the face of your business
for example: The Yellow Pages, transitioned from physical books to an app
we're monitoring 3500 apps, collectively installed a billion times
one complaint we heard was that users were using two tools, new relic and a crash reporting service
crash reporting is now part of new relic mobile as of today, with just one SDK
we wanted to do it right, it's completely drop-in, there's a workflow for reporting, reproduction, & resolution
slice and dice crashes by device, etc.
see impact
super critical stuff, if you have a mobile app, just use it, it's the only SDK you need
Browser:
we've had a lot of success with RUM (page load time) and AJAX (beta earlier this year)
1.4 million domains are monitored by NR Browser
we're 6x bigger than the #2 player in this field
90% of end-to-end load time is in the browser
so much important stuff happens after page load
monitoring "customer experience" via page load time doesn't tell the whole story
today we're announcing: NewRelic BrowserPro
BTW, we redesigned the app on Monday
we still have the same metrics you had before
but now we show AJAX calls, javascript errors
and you don't need APM to use this, can work on 100% static sites
we also now capture session traces
taking the features of e.g. chrome inspector, firebug, etc. and making them available for any user's session, remotely
this is an industry first
once you try this out, you can't imagine life without it, this is one of those products
super easy to get started
Next, who else besides us is this committed to performance?
Cloudflare is!
I'd like to welcome Matthew Prince, CEO & Founder of Cloudflare
5% of internet traffic passes through Cloudflare
today we're offering one click install of New Relic Browser for Cloudflare users, all 2 million of our customers
this will be so useful for our customers to help save their users even more time, on top of what cloudflare provides
Now.. a brand new product:
we've seen there's a need for automated in-browser testing of sites
e.g. register a new user, login, add an item to the cart
test this flow every 10 minutes
from a bunch of different geo locations
so, we're launching New Relic Synthetics
any selenium users in the room? this is an industry that's a little long in the tooth and due for an update
(live demo)
get alerted of regressions
test your site overnight when you're asleep
captures screenshots
fits hand in glove with BrowserPro and APM, every test links to session traces (BrowserPro) and request traces (APM)
private beta this month, generally available this quarter
Industry Disruptors Panel
Moderator: Don Clark of WSJ
Colleen Berube, VP of Business Services, Adobe
Sam Parnell, CTO, Bleacher Report
Greg Brockman, CTO, Stripe
Colleen intro: joined in 2007, boxed software company, 18-24 month lifecycle, very traditional, over time we expanded to web applications, 2011 went all-in with SaaS, Creative Suite -> Creative Cloud, Marketing Cloud (Omniture), this totally changed the way we did business, big shift away from our 30 years of history as a boxed software company, lots of changes to architecture & mindset to provide services
Greg intro: joined when there were 4 people, was looking for the right people & the right problem, people are the most important thing, the constant, hire great people, ensure we work together really well, build out our tech in the right way, as CTO I've done both back end, server stuff, and engineering management, CTO is such a slippery role, not much is written about it compared to CEO; the "T" in CTO is a lie, it's mainly a people role, empowering others to get things done
Sam intro: Bleacher Report is a sports news site, disrupting traditional sports media, currently the 2nd largest sports site in the US, #1 sports app in the US, mobile is a huge part of what we do, we launched mobile 3 years ago and it transformed us
Colleen, the big story at Adobe was transforming to SaaS, what were the specifics of that?
first, moving 18-24 month lifecycles to 60-day lifecycles to quarterly releases to 220 releases per year
shifted from waterfall to agile/scrum
implemented CD in key areas
automation & monitoring
lower cycle time = increase in quality, 50% reduction in bugs
Greg, dealing with payments, very thorny business, how are you able to iterate so quickly in that environment?
constraints can be a good thing
main thing to focus on is continuous improvement
end-to-end ownership by engineers
not only building and launching code, but testing
as you grow, don't slow people down
don't be afraid to change your patterns / previous way of doing things
try to predict development scaling problems before they start
e.g. we moved to SoA from monolithic before running into problems
SoA is currently the best way we have for scaling systems
predict where things are going to break down and fix them ahead of time
Sam, how does data affect you as primarily a content site?
well, on mobile, load time is critical
we focus on delaying & preloading to load the important stuff quicker
advertising, how we make our living, is a tricky issue
we built our own ad layer to give the best experience instead of just plopping ads into content (slow, page jumps around, etc.)
Colleen, how did you get people to move at a higher speed?
the business side had to move more quickly too
we switched to SoA
took existing commerce engine and SoA-enabled it, exposed a common set of APIs to all teams
componentized things like cart, checkout, payments, etc.
that's just one example
Greg, in the payment space, how do you interface with other agents? (legal, banks, govt, etc.)
that's one of the things I like about stripe
a lot of people focus on what can go wrong
but if what we do works, it's going to be really awesome
e.g. instant credit card transactions
think big, then do the problem solving & pathfinding to get there
we built a credit card vault to handle storing & tokenization of card data, so the majority of our system never sees credit cards
we try to do this kind of thing for other areas as well
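A minimal sketch of the vault idea above, assuming a simple in-memory store. `CardVault`, `record_order`, and the `tok_` prefix are hypothetical names for illustration and are not Stripe's implementation. The point is that only the vault holds raw card numbers, while the rest of the system passes around opaque tokens:

```python
import secrets


class CardVault:
    """Hypothetical vault: the only component that ever holds raw card numbers."""

    def __init__(self):
        self._cards = {}  # token -> raw card number

    def tokenize(self, card_number: str) -> str:
        # Random token that reveals nothing about the card it stands for.
        token = "tok_" + secrets.token_hex(16)
        self._cards[token] = card_number
        return token

    def detokenize(self, token: str) -> str:
        # Only code running inside the vault boundary should call this.
        return self._cards[token]


def record_order(orders: list, token: str, amount_cents: int) -> None:
    """Stand-in for the rest of the system: it stores the token, never the card."""
    orders.append({"card_token": token, "amount_cents": amount_cents})


vault = CardVault()
orders = []
token = vault.tokenize("4242424242424242")
record_order(orders, token, 1999)
```

A real vault would also need encryption at rest, access control, and auditing, none of which is shown here.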
Sam, how do you use data?
drive editorial decisions
help humans make decisions
but don't let data trump our editorial voice, e.g. going to 100% Justin Bieber content if that's what the data shows is popular
one example of using data, if your team is out of the playoffs, we found that a lot of people don't care anymore, so we start running content on next year's draft
Greg, how about you?
we get to see the shape of commerce
e.g. nobody buys stuff on the weekend
Colleen, how about you?
SaaS = better customer experience, more feedback sooner
other stuff
How have you been affected by breaches? How do you handle security?
Colleen: yes...... we're always learning and getting better
Greg: one key we've found is segmentation, we segment our data based on sensitivity (credit card vault vs. user data), then make sure the sensitive data stores are secure
Sam: only keep the info you absolutely need from users, and lock down / segment the sensitive data
The Next Step for Software Analytics (Demos!)
back to Lew
we're focused on analytics & big data, increasing the number of "green" (satisfied) users
we launched Insights last year, we launched a brand new query language (NRQL), thousands of people started using it right away, we received very few support requests, so it's working for people and has incredible momentum
it's mindblowing what this product can do
618 billion events last month, 9 ms average query response time
since this is in the cloud you get a lot of power, more than you would have if you built it yourself
here's an example query: number of unique users who have used feature X, segmented by country
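The actual NRQL text wasn't captured in these notes. As a rough sketch of what that query computes, here is the same aggregation in plain Python over a list of events; the field names (`user_id`, `feature`, `country`) and the sample data are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical raw events, one dict per recorded action.
events = [
    {"user_id": "u1", "feature": "X", "country": "US"},
    {"user_id": "u1", "feature": "X", "country": "US"},  # same user again
    {"user_id": "u2", "feature": "X", "country": "US"},
    {"user_id": "u3", "feature": "X", "country": "BR"},
    {"user_id": "u4", "feature": "Y", "country": "BR"},  # different feature
]


def unique_users_by_country(events, feature):
    """Count distinct users of `feature`, segmented by country."""
    users = defaultdict(set)
    for event in events:
        if event["feature"] == feature:
            users[event["country"]].add(event["user_id"])
    return {country: len(ids) for country, ids in users.items()}


result = unique_users_by_country(events, "X")
print(result)
```

Repeat events from the same user count once because each country keeps a set of user IDs rather than a running total.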
today we're announcing some new features:
funnels
cohort analysis
joins against relational data
with these you can start solving business & marketing problems
(live demo)
funnels: specified via the query language (no clunky GUI tools), very easy to add WHERE clauses to funnels to drill down more
cohorts: segment users by e.g. quarter they signed up (showed new relic's user retention graph segmented by quarter)
"magic filters": dynamic search functionality for WHERE clauses (fuzzy search / autocomplete for columns & values)
super powerful for business (e.g. for VenueNext: which concession items are the most popular?)
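For readers unfamiliar with the two analyses shown in the demo, here is a small sketch of what a funnel and a signup-quarter cohort compute. This is plain Python over made-up data, not NRQL and not how Insights implements them; all field names and values are assumptions:

```python
from collections import defaultdict
from datetime import date


def funnel(events, steps):
    """Count users who completed each step, in the given order."""
    actions_by_user = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        actions_by_user[event["user_id"]].append(event["action"])
    counts = [0] * len(steps)
    for actions in actions_by_user.values():
        next_step = 0
        for action in actions:
            if next_step < len(steps) and action == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return dict(zip(steps, counts))


def retention_by_signup_quarter(users):
    """Fraction of users still active, grouped by the quarter they signed up."""
    total = defaultdict(int)
    active = defaultdict(int)
    for user in users:
        signed_up = user["signed_up"]
        quarter = f"{signed_up.year}-Q{(signed_up.month - 1) // 3 + 1}"
        total[quarter] += 1
        active[quarter] += 1 if user["active"] else 0
    return {quarter: active[quarter] / total[quarter] for quarter in total}


events = [
    {"user_id": "u1", "action": "visit", "ts": 1},
    {"user_id": "u1", "action": "signup", "ts": 2},
    {"user_id": "u1", "action": "purchase", "ts": 3},
    {"user_id": "u2", "action": "visit", "ts": 1},
    {"user_id": "u2", "action": "signup", "ts": 2},
    {"user_id": "u3", "action": "visit", "ts": 1},
    {"user_id": "u4", "action": "signup", "ts": 1},  # skipped the first step
]
users = [
    {"signed_up": date(2014, 1, 15), "active": True},
    {"signed_up": date(2014, 2, 1), "active": False},
    {"signed_up": date(2014, 5, 10), "active": True},
]

funnel_counts = funnel(events, ["visit", "signup", "purchase"])
retention = retention_by_signup_quarter(users)
```

Adding a WHERE-style drill-down, as in the demo, would amount to filtering `events` before calling `funnel`.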
demo #2: we loaded public github data into insights, and can build dashboards in seconds
click around, everything is interlinked and easy to query (users, repos, languages, commits, etc.)
ok, what's next? we have built a lot of "data apps" on top of our platform, e.g. Insights
but we can't build every app, everyone has different business needs, there isn't one size fits all
so today we're announcing the New Relic Data App Platform
you can build apps for anything: marketing, operations, customer support / help desk, security, finance, sales
we're seeing a 90% cut in development time for these apps, all hosted by us, we handled the scale, we provide a great UI, etc., and it's mobile friendly too
demo: who are my top customers? filter by region, drill down, pull in APM/Browser info, see user details, see how these users are using your site
I can build this kind of app in 30 minutes
demo #2: same data, but in a marketing view, funnels, conversion rates, same easy to use filtering, etc., also built in 30 minutes
how do you use this? just send your data to Insights, and this is all available to you
what's the future of Insights? more business data, more data sources, build data apps on top of that
we are integrating with 65 of the top cloud data stores (Salesforce, Facebook, etc.)
and, I'd like to announce our first acquisition as of a few days ago: Ducksboard, who specializes in this integration
please give a hand to the Ducksboard founders, all the way from Barcelona, WOWW
another demo: let's throw tweets into Insights
we asked twitter for access to their feed, first they offered us 10%, then they offered us just New Relic related tweets, but no, we want all the tweets
we only have access for a limited time
turns out twitter's fire hose of all tweets is only 1% of our total data volume
let's do some searches.. Justin Bieber gets 2x tweets as President Obama, Bieber is most popular in Brazil
we can segment by language, platform
let's try to get #FS14 up to 500 tweets/second, free GoPro to the first twitter handle I see after we reach the 500/s mark
you can do this too, this is going to transform your business
300k people have used New Relic APM/Browser/etc.; our vision is to get millions of people using Insights for business
now I'd like to introduce Chris Cook, COO & President, (recapping marketing spiel and introducing breakout sessions)
REDACTED (New Relic Synthetics)
Patrick Lightbody, VP Product @ New Relic
today we announced Synthetics
"Software is eating the world" - Marc Andreessen
my version: "Software is supporting the world"
we're still doing the same activities, but it's all supported and improved by software
and you guys are all supporting that software
strong ops teams monitor three things: 1. performance, 2. availability, 3. functionality
we were helping you with these, but we can do more
we started off on the backend (APM), then moved to the browser, then mobile
what else is there? your app depends on CDNs, 3rd party JS, the cloud, SoA, social media, payment processors, etc.
we can cover some of these, but the data is very noisy
how about clean room instrumentation of the 5 most important actions in your web app? would that cut through the noise?
Availability: we do ping checks, for free, it's a very popular service, but we know you want more: multiple geo-locations, pinging multiple IPs for your domain (round robin / multicast DNS)
there's a difference between Errors and Bugs, bugs = real issues, errors = something might be wrong
you need these critical functionality tests to confirm that there aren't any show-stopping bugs on your production site right now
our system didn't provide for that, error tracking shows you after-the-fact when something might be wrong
Synthetics = that safety net, to verify everything end-to-end is working all the time
Details:
Selenium, Node.js, in-browser code editor with good autocomplete / call hints, automated screenshots of failures, real browser engine, helpers for stuff like generating user data
beautiful performance results courtesy of New Relic Browser, waterfall charts
built on top of Insights, so you can build custom reports
plugs in to APM & Browser for full traces on the FE and BE
Demo:
20 geo-locations currently
editor does look pretty good
waterfall view also looks good, has a scrubber & heatmap / minimap, you can pan and zoom to quickly find errors / slow spots
errors are retried 3 times before alerting
pricing: basic Synthetics is free
advanced monitoring: $59/month, increases based on # of checks, has more advanced details, like headers
private beta for FS attendees
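The "retried 3 times before alerting" behavior from the demo can be sketched generically as below. This is not the Synthetics API; `run_check` and `make_flaky` are hypothetical names, and the sketch assumes "3 times" means three attempts in total:

```python
def run_check(check, max_attempts=3):
    """Run `check` up to max_attempts times; report failure only if all attempts fail.

    Returns (ok, failures), where failures lists a message per failed attempt.
    """
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            check()
            return True, failures
        except Exception as exc:  # any error counts as a failed attempt
            failures.append(f"attempt {attempt}: {exc}")
    return False, failures


def make_flaky(fail_times):
    """Build a fake check that raises on its first `fail_times` calls."""
    calls = {"n": 0}

    def check():
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise RuntimeError("timeout")

    return check


ok_flaky, flaky_failures = run_check(make_flaky(2))   # recovers on 3rd attempt
ok_down, down_failures = run_check(make_flaky(10))    # never recovers
```

Only the second case would trigger an alert, which is the point of retrying: transient blips are absorbed, persistent failures are not.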
Taming the Modern Datacenter
Mitchell Hashimoto, Founder, Hashicorp
at Hashicorp we build tools to help manage the modern distributed data center
The history of the data center:
single instance of hardware
multiple pieces of hardware
virtualization -> complication & complexity, more tooling needed
containerization -> even more complexity, tooling is being built right now
a modern data center typically has all of the above, a mix of dedicated / virtual / containers
we are also seeing a huge move to SaaS/IaaS/PaaS for previously in-house stuff like DNS, etc., you can even put your entire database in the cloud
what paradigm will you choose?
why move to virtualization and containers? to make delivery of "apps" more efficient
the data center is just a means to an end
App delivery pipeline:
development -> deployment -> maintenance
in terms of Hashicorp products:
development: vagrant
deployment: terraform, packer
maintenance: serf, consul
the deploy + maintenance lifecycle:
acquisition (buy servers)
provision (install OS, etc.)
update (app code)
destroy
traditionally each of these steps took weeks or days to complete
this traditional model is changing
due to EC2, 5 years ago, there was a big shift in time & money required to spin up "instances"
computing power is now a utility, on-demand & cheap
also around that time: the resurgence of CM tools (chef & puppet)
also around that time: SaaS proliferation, outsourcing things that used to be core parts of the data center (mail, DNS, etc.)
so we went from weeks/days to minutes
Managing modern DCs:
modern DCs are a mix of all these things
the goal is to "move fast and don't break things"
devs: understand the app, care about fast deploys, don't care about underlying details
ops: understand the infrastructure, understand the interdependencies of the infrastructure & the apps, also care about security, uptime, scaling, etc. on a higher level than the devs do
Terraform is the best way to manage the chaos of the modern DC, get everything under a unified system, including SaaS providers
composes all the tiers (I/P/SaaS)
safely modify your infrastructure
one workflow, technology agnostic
"no more dashboards", no more going into different web interfaces for different providers
Example:
here's an example of some Terraform code for defining a DigitalOcean droplet & associating a DNS record to it, in a human friendly, version controlled format, direct relationship between IaaS (DO) and SaaS (DNSimple)
Terraform can render dependency graphs, automatically visualize your entire infra and its dependencies at different levels (expanding / collapsing certain resources)
"Providers": these are the integration points that expose the underlying resources (servers, DNS records, etc.)
providers follow a simple CRUD API, easy to read & write
in one command: order servers, provision with OpenStack, deploy hadoop, deploy a job, schedule & run the job
"plan" shows you what will happen based on the current state, this is a safety mechanism, you can save & replay plans so you positively know what's going to run
these safety features give predictable & reliable results, no more having an expert "divine" what's going to happen
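The Terraform code from the slide wasn't captured here. To illustrate the idea behind "plan" and the dependency graph, here is a toy sketch in Python: it diffs a desired state against a current state and orders the resulting actions so that dependencies come first. This is not Terraform's algorithm or API; the resource names and the `plan` function are hypothetical, and it needs Python 3.9+ for `graphlib`:

```python
from graphlib import TopologicalSorter

# Hypothetical desired state: a droplet plus a DNS record that depends on it.
desired = {
    "droplet.web": {"depends_on": [], "attrs": {"size": "512mb"}},
    "dns.www": {"depends_on": ["droplet.web"], "attrs": {"type": "A"}},
}
# Hypothetical current state: one leftover resource that is no longer wanted.
current = {
    "droplet.old": {"size": "512mb"},
}


def plan(desired, current):
    """Return the list of (action, resource) pairs needed to reach `desired`."""
    graph = {name: spec["depends_on"] for name, spec in desired.items()}
    actions = []
    # static_order() yields each resource after everything it depends on.
    for name in TopologicalSorter(graph).static_order():
        attrs = desired[name]["attrs"]
        if name not in current:
            actions.append(("create", name))
        elif current[name] != attrs:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("destroy", name))
    return actions


planned = plan(desired, current)
for action, name in planned:
    print(action, name)
```

Nothing is changed while planning; the output is only a list to review, which is the safety property the talk emphasized.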
Conclusion:
ops cares about infrastructure
devs care about apps
how can you merge these 2 concerns with terraform?
ops writes the shared providers & modules & resources, while devs use them as self-service blackboxes to deploy their apps
ops writes & maintains the "substrate"
devs use that to plug-and-play, this is how you move fast and not break things
this is Infrastructure as Code, but at a level above what most people are doing today
unified workflow
dev & ops collaborating
less chaos for your modern data center
Towards a Data Driven Product, Techniques and Tools at GitHub
JD Maturen, Analytics Lead, GitHub
metapoints: act on one thing at a time, buy Wizard.app (wizardmac.com), read Kahneman's "Thinking, Fast and Slow" (theory of how brains operate and make decisions)
background in infrastructure, social networks, enterprise SaaS
OODA loop (remember that act is the last step!)
"every deploy changes a metric"
I'm not a stats expert
tools of yesteryear: univariate timeseries data
no causal analysis, all summary data
our other tool is our intuition, which is good, but we don't want to rely on it 100%
according to a MSFT study, you have a 1 in 10 to 1 in 3 chance of predicting the effects of product changes
we need raw data, and we need to do things probabilistically
Example #1:
if 3 out of 10 users sign up, what is your signup rate? 30% is one answer, but the more complete answer involves probability
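The notes don't record how the speaker worked this out. One common way to give the "more complete answer" is a Beta posterior over the signup rate; the sketch below assumes a uniform Beta(1, 1) prior (an assumption, not from the talk) and estimates a 95% credible interval by sampling with the standard library:

```python
import random


def signup_rate_posterior(signups, visitors, samples=20000, seed=0):
    """Summarize the posterior over the signup rate under a uniform prior.

    With a Beta(1, 1) prior, the posterior is Beta(1 + signups, 1 + non-signups).
    Returns (mean, low, high), where low..high is a 95% credible interval.
    """
    rng = random.Random(seed)
    a = 1 + signups
    b = 1 + (visitors - signups)
    draws = sorted(rng.betavariate(a, b) for _ in range(samples))
    mean = sum(draws) / samples
    low = draws[int(0.025 * samples)]
    high = draws[int(0.975 * samples)]
    return mean, low, high


mean, low, high = signup_rate_posterior(3, 10)
print(f"posterior mean {mean:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

With only 10 users the interval is very wide, so "30%" on its own overstates how much is known; the interval narrows as more data arrives.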
A: github repo, file issues when you have questions, answer with code & results, highly interactive
Understanding Developers in a Post Agile Environment
Ward Cunningham, Wiki Inventor & Engineer @ New Relic
agile programming = making decisions and living with the consequences
there are many ways to achieve excellence, this is my ongoing exploration and theory of learning
example: writing a handwritten letter: one way to achieve excellence is to start over on a blank page each time you make a mistake, after practicing this enough times, the way you think changes
another example: I use graphviz a lot, and I wanted to give a tutorial on graphviz at New Relic, and the simple act of writing the tutorial had a huge impact on how I use the tool!
tip #1: write markup (graphviz code) in a file, and write a little shell loop / script to watch the file and re-render the graph in realtime
tip #2: each time you make or see a graph in the real world, think about how you would make the equivalent graph in graphviz markup
tip #3: make & cache common commands
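Tip #1 was described as a shell loop; the speaker's script wasn't shown. A rough Python equivalent, assuming the Graphviz `dot` binary is on your PATH, polls the file's modification time and re-renders when it changes. `watch`, `render`, and the file names are illustrative:

```python
import os
import subprocess
import time


def render(dot_path, png_path):
    """Re-render the graph. Assumes the Graphviz `dot` binary is installed."""
    subprocess.run(["dot", "-Tpng", dot_path, "-o", png_path], check=True)


def watch(dot_path, on_change, interval=0.5, max_polls=None):
    """Call on_change(dot_path) whenever the file's modification time changes.

    Runs forever unless max_polls is given (handy for testing).
    """
    last_mtime = None
    polls = 0
    while max_polls is None or polls < max_polls:
        mtime = os.stat(dot_path).st_mtime_ns
        if mtime != last_mtime:
            last_mtime = mtime
            on_change(dot_path)
        polls += 1
        time.sleep(interval)


# Example (runs until interrupted):
# watch("graph.dot", lambda path: render(path, "graph.png"))
```

Polling keeps the sketch dependency-free; a file-notification library would react faster but isn't needed for an edit-and-look loop.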
http://xpdx.org - this has been what I've been working on, federated wikis and helping programmers become better at what they do
"leveraged activities" vs. "normal activities"
pursue different directions and be surprised!
more surprises = more leverage!
for certain tasks, let's say sales for example, if your current activity is to sell $X worth of a product, there's not much you can do, you can either deliver $X or over-deliver more than $X
but for a lot of engineering & programming work, there's all sorts of paths you can take to reach an end-result, the more you explore on the way to the result, the better you become
searching logs and single-stepping through a debugger are examples of "normal activity" in programming
StackOverflow: normal activity, you gain very little insight by reading StackOverflow answers
what do you do that other devs aren't doing? that's your leverage
I ask this question of devs
one developer I talked to mentioned that when he reads code, he tries to look for the "boundaries" of code & data, and this helps him understand new code, that's interesting, I think there's something there..
another developer told me all about dtrace
another developer told me about "effort under load", as load builds in a system, your system must become more efficient or else you will never catch up, when a system starts throwing errors at high load it is one way for the system to shed some of its current work, interesting idea, I hadn't thought of things this way
that last conversation led me to think about "Balanced File Queues", because failing is easier than processing, failures = shedding load
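"Balanced File Queues" weren't explained further in the talk, so the following is only a generic sketch of the load-shedding idea (failing fast is cheaper than processing), not that design. `SheddingQueue` is a hypothetical name:

```python
from collections import deque


class SheddingQueue:
    """Bounded queue that rejects new work when full instead of growing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.shed = 0  # number of rejected items

    def offer(self, item):
        """Try to enqueue; return False (cheaply) if the queue is full."""
        if len(self.items) >= self.capacity:
            self.shed += 1
            return False
        self.items.append(item)
        return True

    def take(self):
        """Dequeue the oldest item, or None if the queue is empty."""
        return self.items.popleft() if self.items else None


queue = SheddingQueue(capacity=3)
accepted = [queue.offer(f"request-{i}") for i in range(5)]
print(accepted, "shed:", queue.shed)
```

Rejecting the overflow keeps the backlog bounded, so the system can catch up once load drops rather than falling further behind.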
I want all developers to do this, I want all developers to achieve excellence
develop a chronology, tell a story
but it's hard to share this information in a top-down way, e.g. asking everyone around you "tell me something amazing", you have to surface it in a different way
you need to wrestle the deep insight
example: "expedient" vs. "foundational" code, the continuum between these two, there's a lot of stuff worth thinking about here
create lasting artifacts, find a way to document and share
I've found that "patterns" are a little too dry, so I'm trying something different with xpdx.org
insights -> driving behavior
"abilities": try this code, type this!
"motivation": what is the purpose of this?
"triggers": not only how to practice, but how to remember to practice
pairing is a great way to share knowledge, because of those 3 things
pairing is one of the most efficient ways to learn
I'm trying to do "pairing" over the internet through the tutorials on xpdx.org
leading others to mastery & excellence
this may be as big as pairing, maybe even bigger!
The Interfaces of Our World: The Known, the Unexpected, and the Risks of Failure
Brent Miller, Lead Software Engineer, New Relic
background: ruby & frontend focus
this talk is about interfaces, contracts, designers, and your company
interface is what happens when 2 things interact
example: coffee cups (weird ones)
example: cats, @gorbypuff, the internet is made of cats, the cat "interface" is large
example: CA DMV office
example: iPad / tablet
an interface is designed, but is used & interpreted differently according to the user
shopping cart design: it was designed for holding items while you're shopping, but this person used it as a barbecue by putting a fire underneath it
paperclip: one designed use, but hundreds of unofficial uses if you're a divergent thinker
divergent thinking = creativity
90% of kindergarteners are divergent thinkers, 2% of adults are
"totality": no filter, seeing things in their entirety
as we grow, we start to add filters, it's necessary to survive, it's a good thing
example: asking an expert an obscure problem and they can jump right to the answer, that's a good use of filters
bad use of filters: Ferguson shooting, quickly devolved into us vs. them, overzealous filtering, no critical thought
changing gears, SoA and APIs and contracts are foundations for scaling
shared experience & history affect the usage of interfaces
example: if an elevator is at floor 5 of 9, and I'm on floor 1, should I press Up or Down? we would press Up, of course, but someone who has never seen an elevator may push Down because they want to tell the elevator to come down to level 1
if you put a Spanish chef in a Japanese sushi kitchen, he's going to figure out how to use the tools & ingredients around him to make his own style of food
code serves you, the machine, and whoever ends up reading it down the line (including your future self)
teams have interfaces too, and those interfaces can fail
managers are very important to a team's interface
leadership: McAfee, Ballmer, etc.; bosses vs. leaders, leaders are the face of your company, but they face inward too
leaders are the interface between the employees on the front line and the vision of the larger team & company
understand & honor all of the contracts you deal with
let others know when contracts change
always think of the users
Modern APM in Complex Environments
Greg Unrein, Principal Product Manager, New Relic
What is a complex environment?
deployment environments (IaaS, PaaS, bare metal)
types of systems (new, old, legacy, monolithic, SoA)
maps the response time distribution into three buckets
different apps and even different transactions inside of an app have different thresholds for Satisfied / Tolerating / Frustrated
Apdex = (satisfied + tolerating/2) / total
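A small worked example of the formula above. The bucket boundaries weren't stated in the talk; this sketch assumes the usual Apdex convention of a threshold T, with satisfied at or below T, tolerating up to 4T, and frustrated beyond that. The sample response times are made up:

```python
def apdex(response_times, t):
    """Apdex score for response times (seconds) given the threshold t.

    Assumed buckets: satisfied <= t, tolerating <= 4t, frustrated > 4t.
    """
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


# 2 satisfied (0.2, 0.4), 2 tolerating (0.9, 1.5), 1 frustrated (3.0)
score = apdex([0.2, 0.4, 0.9, 1.5, 3.0], t=0.5)
print(score)  # (2 + 2/2) / 5 = 0.6
```

Because thresholds differ per app and per transaction, `t` is a parameter rather than a constant.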
Storytime:
recent feature in NR: tagging NR apps by team / environment / product / etc.
this allows you to filter & roll up across multiple apps when you have a large number of them
another nice feature: how are your different apps connected? we have different views that show you the interrelations between your different services & components
example: one team's deploy broke a different team's downstream service, both teams were able to quickly figure out what went wrong, and the deploy was rolled back
the speed of detection & resolution was due to a bunch of NR features flowing together and working in unison
Behind the Lens: The GoPro Story
CJ Prober, SVP of Software & Services, GoPro
started off by showing a cool video
I help run our cloud, web, mobile, and desktop software teams, as well as CRM, data science, etc.
going to talk about three things:
the GoPro model
our User Generated Content (UGC) network
the future
GoPro was founded in 2002 in San Mateo, we now have over 800 employees, and have the #1 selling camcorder in the U.S.
successful IPO last summer, but we were not an overnight success
the vision: enable the expression and celebration of human passion
passionate users + versatile camera = the best UGC network
people tag or title their videos on YouTube with "GoPro", we don't prompt them to do that
this enables a virtuous cycle & viral growth, users advertise for us
if you compare our # of youtube views vs. # of sales, you see them growing at the same rate
UGC:
but UGC isn't completely user driven, there are different levels
Raw UGC -> Curated UGC -> GoPro original productions
example of raw UGC: we recently released our "Fetch" mount for dogs, 5 days later this video was posted on YouTube and got 14 million views: https://www.youtube.com/watch?v=UowkIRSDHfs
when we notice this, we reach out to the user, we offer them equipment, we offer a stipend, we provide editing, and we promote the video to boost its viral effect
another aspect of UGC is athlete sponsored content, we sponsor certain athletes and people enjoy watching their content, whether it's sports-related or not (e.g. family videos, personal videos, etc.)
we don't promote all content though, as you can imagine people film all kinds of stuff with GoPros (vulgarity, injuries, hunting animals, etc.)
Looking ahead:
everyone wants to share their videos, but not everyone does it
editing is a problem, even getting data out of the camera can be a problem for people
2 hours of video can take up 32GB
that's a lot of data, a lot of editing, a lot of time spent uploading
so we're focusing on eliminating painpoints in managing & editing content
software is the fastest growing part of GoPro
we're hiring!
one last video: the launch video for our latest product, Hero 4
first shout out to the NR employees who created our badges, and btw, what better way to see how the badges are working than using NR Insights? (demoed stats like # of stacks, # of activations, voltage, temperature, leaderboard of who had the most stacks, etc.)
how can software change the world?
we asked NR employees, here's a video
give back, contribute to open source, volunteer, using our skills as software engineers, give back to the community, go beyond corporate America, think really hard about your impact, how much power you have, build things, make the world a better place, software has the power to change the world for better
so, we played a part in fixing healthcare.gov, what about the rest of the world?
what about the developing world? technology can make a difference
first, I'd like to talk about Cure.org
they provide corrective surgeries for deformities, burns, neurological conditions, and cosmetic problems
welcome to the stage Joel Worrall, CTO of Cure.org
Cure.org
we heal kids in 30 countries around the world
we run clinics in these countries to help these kids
my job is to help build the technology to manage that
we can connect donors or just anyone interested to these kids, so they can donate or send messages of support, we provide updates so people can see the work we're doing, that we're actually helping people
story about travelling to Kenya, showing the Kenyan medical record system which is all paper based, lugging around storage containers of papers to hospitals and villages
we develop systems that can work offline, because you don't always have a reliable internet connection
we also launched an open source project called hospitalrun.io that we built, but hope anyone can use to run a hospital
it's cloud based, built on modern technologies: node, couch, ember, offline/online
also focused on usability, easy to use, intuitive
better software = more time & resources spent helping kids
Lew: how do you make sure the software is working?
we only have 4 developers, so it would be impossible to manage all this without new relic
Insights has been especially useful, we use it for things like tracking engagement
we not only push data into Insights and view it in NR, we pull it back out of Insights and feed it back into our site (for trending items, etc.)
Watsi
next, I'd like to talk about Watsi
so while Cure.org operates its own clinics and hospitals, Watsi approached the problem from a different angle, they promote and channel donations to different organizations which then provide the care
let's welcome Chase Adam, founder of Watsi
Chase: we got started 3 years ago, I was in the Peace Corps, and I saw a woman on a bus asking everyone for donations for a medical procedure for her son, and I noticed everyone was chipping in, so I wondered why they trusted her? it's because she had her son's medical record in a red folder and passed it around
I thought, why doesn't a service like this exist?
would you donate $1000 right now to healthcare, in general? probably no
would you donate $1000 right now to the person next to you if he was going to die without it? probably yes
we let you directly fund people who need help, tell their story through photos & updates
all of our data & records are publicly available
we use new relic to understand how our software is working
is our site up? is it fast? is all of our functionality working? can volunteers & doctors upload data & photos & updates into our system?
we collect $X in donations, and our goal is to just scale that out to 10x and 100x, and it is possible now using technology
New Relic for Non-Profits
we love working with organizations like Cure.org and Watsi, but we need to do more
it's time for us to get serious about non-profits
(aside: Lew just mentioned New Relic has 600 people)
please welcome Yvonne Wassenaar, SVP Operations @ NR, to tell us more
Yvonne: I was at Accenture & VMware, and was brought on to help New Relic scale
but also keep a focus on what's important
we already provide a lot of value to non-profits
we help them provide great software, and to provide them insights
when NR was small, this happened naturally, but as a company grows sometimes you lose sight of this
so starting in January of 2015, we are launching New Relic for Non-profits
we will provide 5 APM hosts for free for non-profits (is that good? I dunno, don't they have a free tier that already compares to that?)
Women & technology
another thing I want to change, can all the women in the room please stand up?
there's not enough of you
I have a daughter, so I'm starting to understand a bit about getting girls & women interested in technology
about a year ago, there was a lot of emotion about someone popular in the programming community saying something like "it's impossible to get women interested in coding", and that's probably not what he meant, but how should we respond to this?
I decided to teach my daughter and her three friends how to code, and they were super excited
I'd like to welcome Alannah Forster to the stage to tell us her story
I wrote a christmas card with code
and I was modifying that
my sister was interested too
and everyone was fighting over the code
Lew: you were using javascript, how did you learn?
I did Hour of Code as part of my English class in school
I learned on Khan Academy
(showed a live demo of the christmas card)
Lew: wow you have comments in here, you're better than me at this!
Lew: so then we started to modify this, we learned functions, loops
I learned about functions, so that was good
(showed a demo of a game she wrote, Doodle Jump clone)
Lew: what is @codegirlclub?
the Coding Clubhouse is a business I'm starting
Lew: let's all give Alannah a round of applause, this is the future generation of people who are going to write great software
so we've mentioned Code.org and Hour of Code a few times
we're honored to have Hadi Partovi join us to talk on the vision and where Code.org is headed
Computer Science: America's Untapped Opportunity
Hadi Partovi, Founder & CEO @ Code.org
I grew up in Iran, during the Iraq-Iran war
it was not a great environment for learning
but my dad was able to give my brother and me a Commodore 64 and a book on BASIC
I started there and eventually had a lot of success
the Job/Student Gap:
2% CS students compared to 98% for other fields
60% of new technical jobs are computing related vs. 40% for other math/science fields (did I get this right?)
CA has 78k open computing jobs, growing at 4x the national average
there were only 4.3k CS graduates in CA last year, the entire country was 40k students
exposure to CS in high school is the best way we currently have to get students into the field
only 300 out of 10k schools teach CS
there are fewer CS majors today than 10 years ago, the number is starting to catch up again, but it's not enough
also, women are a shrinking % of CS majors
AP enrollment by popularity: history, english, science, math, foreign languages, economics, art & music, then CS
and only a small sliver of those AP enrolled CS students are women or african american
CS is the best paying field, but has least number of enrolled students
when I started programming, it was about computers
now, computers are everywhere, tablets, cars, phones, etc.
67% of computing jobs are outside of the tech industry (banking, medicine, etc.)
the Hour of Code was our effort to make an "Earth Day" for programming
we got over 100 partners (Google, Facebook, Microsoft, etc.)
just one hour can open you up to a new world, break down stereotypes
44.5 million people have tried Hour of Code
it's also split pretty evenly between boys and girls, even a little higher for girls
reached 15 million students in just 5 days, the fastest service to reach 15 million users
where did students come from? was it the media that featured us?
no, it was 40,000 teachers
learning can feel like a game, we've seen it
online tutorials make teachers' lives easier
there is no IT hassle either, it's all over the web, nothing to install
students love fun & creativity, and we try to provide that
demo: flappy bird (code.org/flappy)
block programming interface, not text, drag & drop
there are 10 levels, progressively adding new concepts / requirements, but also letting you do some creative stuff (underwater flappy shark with different physics)
after I finish I can text the link to my phone and play right away
with our interface you learn the basics of conditions, loops, etc.
you learn the underlying concepts that carry over to all languages
don't get caught up in syntax
"Hour of Code" = marketing and getting people interested
Code.org is much more than that, we provide full courses and curricula to schools
we're in 30 school districts for Grade 9-12
300 new classrooms, 13k students, full year courses, 34% girls, 60% black & hispanic
AP Comp Sci reaches 40k students, 20% girls, 18% black & hispanic
so we're already surpassing them in some ways, in our first year
for K-8 we're in 40k classrooms, 1.5 million students, 40% girls
we also host and support single-day workshops, free for K-5 teachers, lots of help & support, provide instructors (volunteers from the industry)
FAQs:
Q: if we struggle with math & reading, why spend precious time on coding?
A: it's just 1 hour, and kids are excited about doing this
Q: is this too hard for me or my students?
A: just try it, a 5th grader or even kindergartener can do our tutorials by themselves
Q: why does everyone need to learn how to "code"?
A: well.. we teach the concepts, we don't teach everyone to be a "coder", and this helps you in all fields, learning how to think in a structured way, and we also teach about how the internet works, how cybersecurity works, this is useful information for everyone
Q: isn't coding only for nerds?
A: we're trying to change that, cool is what we make it (celebrities, etc.)
Hour of Code 2014: Dec 8-14
we want to reach 100 million students
we also want to raise $5 million, check our indiegogo campaign
we are cost effective because we're not doing all of the teaching, the teachers are doing the teaching, we just provide them really great resources
so we're incredibly efficient
visit Code.org, help us reach 100 million students!
to close: a video showing what we've done in the past year
my definition of performance: it works as expected, reliably, for everyone
The baseline:
no errors
it has to load
all resources load successfully
"As expected":
harder to measure, more subjective
speed
responsiveness
modern
using best practices
provides feedback to the user
"Reliably"
the site is up
all API endpoints are up
third party API endpoints are up (if that's not possible, time to switch)
users using your app don't see a difference between you and your vendors / third party dependencies
"For everyone":
all geographies
all devices
all browser versions
even for stupid users
your app isn't being used by people with brand new MBPs in silicon valley with high speed internet
it's being used by a guy with a 4 year old laptop on airplane wifi who likes clicking submit buttons 40 times
so build defensively
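One way to read "build defensively" for the submit-button case is to make repeated submissions harmless on the server side. This is only a sketch under assumptions: the talk did not describe an implementation, and `SubmitGuard`, the token name, and the 60 second window are all hypothetical.

```python
import time


class SubmitGuard:
    """Remember recently seen submission tokens so repeat clicks are ignored.

    Hypothetical helper: the token would be generated once per rendered form
    and sent along with every submit of that form.
    """

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # token -> expiry time

    def first_time(self, token):
        now = self._clock()
        # Forget tokens whose window has passed so the dict does not grow forever.
        self._seen = {t: exp for t, exp in self._seen.items() if exp > now}
        if token in self._seen:
            return False
        self._seen[token] = now + self._ttl
        return True


guard = SubmitGuard()
# The same form submitted 40 times is only processed once.
processed = sum(1 for _ in range(40) if guard.first_time("form-123"))
print(processed)
```

In a multi-server setup the seen-token store would have to be shared (e.g. a cache or database), which this in-memory sketch leaves out.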
New Relic Browser is GA now, rolled out to all accounts as of yesterday morning
new deployment method available: copy and paste our snippet into any page, even single-page / static apps
Demo
the homepage is really good at telling you one thing: is there currently an issue?
AJAX: now URL centric, with smart grouping (e.g. /accounts//applications//recent_events)
drill into AJAX requests, specific throughput & performance breakdown
this includes third party endpoint requests, not just requests to your servers
use this to hold your third party vendors accountable to their SLA
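The notes don't say how the "smart grouping" of AJAX URLs works. As a rough illustration of the idea only, the sketch below collapses purely numeric path segments into a wildcard so requests for different IDs roll up into one group; the `*` placeholder and the numeric-only rule are assumptions, not New Relic's algorithm.

```python
import re
from collections import Counter

_NUMERIC = re.compile(r"\d+")


def group_url(path):
    """Replace purely numeric path segments with '*' (assumed grouping rule)."""
    return "/".join(
        "*" if _NUMERIC.fullmatch(segment) else segment
        for segment in path.split("/")
    )


def count_by_group(paths):
    """Count requests per grouped URL."""
    return Counter(group_url(p) for p in paths)


# Hypothetical request log.
requests = [
    "/accounts/12/applications/345/recent_events",
    "/accounts/12/applications/9/recent_events",
    "/accounts/77/applications/1/recent_events",
    "/about",
]
for url, hits in count_by_group(requests).most_common():
    print(hits, url)
```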
our current error rate at NR: 2-3%, this is actually pretty good, we see some people turn this on and they have an error rate of 20%-40%
we sort script errors by "impact", how many pages & people experience this error
one thing you might notice: "Script error" appears multiple times, this happens because different browsers surface the errors in different ways
we built heuristics to group errors in a smart way, across browsers
we're also renaming pageviews to be URL centric, not with controller / method names
we found that it just fits what people expect, especially front-end developers
speed vs. experience: for speed you have aggregate numbers, for experience you can view individual session traces
"I have a problem" vs. "this is why the problem is happening": see why certain load times or events take longer than others
new metric: "Waiting on AJAX", this is the number you want to watch for user experience, it's when the on-load AJAX calls finish, when the current page should be fully functional
one interesting thing we saw in our own app: lots of chains of setInterval/setTimeout calls, busy-waiting for things to become available, this impacts the user experience, it can be inefficient
power features: customizable grouping / aggregation
standalone applications: for apps that don't use APM, just copy and paste our snippet into your template
other stuff: more analytics collected
browsers (how many IE8 users do I have? can I stop supporting IE8?)
geographic areas (how is my app performing in the US vs. Asia?)
cloudflare integration, more integrations coming
Performance Hackathons: Trulia's Obsession With Speed and Scale
Chris Sessions, Director of Operations @ Trulia
Louis Bennett, Director of Engineering @ Trulia
we're going to talk about our culture, how we improve our site
how do we maintain a startup vibe as we continue to grow?
anyone familiar with R. Westrum? he did a study on org. culture
culture = patterns for responding to problems
he found three categories of culture: 1. pathological, 2. bureaucratic, 3. generative
pathological:
gaining and keeping power
low cooperation
messengers are shot
failure leads to scapegoating
bureaucratic:
rule-heavy
cooperation is isolated to teams
messengers are neglected
failure leads to justice (reprimanding those who don't follow rules)
generative:
performance oriented
high cooperation
messengers are trained
failure leads to inquiry
What's worked for us:
weekly release meetings, dev & ops
regular tech learning sessions, anyone with a good idea can present
lunch roulette, group outings, team building events, happy hours
innovation weeks (one per quarter) where you can work on anything
safe environment for taking risks (e.g. in post-mortem process)
data everywhere, no hoarding
regular scalability & performance hackathons
devops is a paradox, the best ops know dev, and the best devs know ops
dev = blue lens, ops = red lens, devops = seeing in 3D
the contribution is greatest when the two sides work together
new relic is a facilitator of dev & ops working together
Enter the hackathon:
we have performance monitoring on dashboards everywhere, but...
if performance is a top-line KPI (along with revenue, # of users, etc.), then let's dedicate time to performance the same way we do for other things
spend a few hours, see what happens, and iterate from there
Take one:
9 developers showed up, everyone split up into pairs
each pair grabbed one slow transaction and dove in
this was great! more eyes on the code is good, and a few pull requests were made with improvements
but... we need ops in here, devs can't do it alone
this made it even more awesome, it felt like the old days of everyone huddled around a laptop
faster to find problems & make fixes
take two: devs (FE and BE) and ops
two rules: attendance is optional, but if you do show up, participation is mandatory
What we found:
this is a little embarrassing, but we're going to show them, because you might have similar problems
example 1: legacy code
one bit of legacy code retried a flaky lucene connection, 16 second response time, that was kinda acceptable years ago, but not anymore, just throw an error
example 2: server oddities
xfs on a web server? not the filesystem.. but the X Font Server..
so we removed that, turn off any extraneous services
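For the legacy-code example (retrying a flaky connection until the response takes 16 seconds), the fix described was to just throw an error. A minimal sketch of that fail-fast approach, assuming a plain TCP backend; the function name and the 1 second timeout are hypothetical, and Trulia's actual code was not shown.

```python
import socket


def connect_or_fail(host, port, timeout_seconds=1.0):
    """Open a TCP connection with a short timeout and no retries.

    If the backend is down or slow, raise right away so the caller can show
    an error instead of making the user wait through repeated attempts.
    """
    try:
        return socket.create_connection((host, port), timeout=timeout_seconds)
    except OSError as exc:  # covers refused connections and timeouts
        raise ConnectionError(f"backend {host}:{port} unavailable") from exc
```

The caller decides what to do with the error (show a message, serve a degraded page); the point is that the slow path is bounded by the timeout rather than by a retry loop.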
How to do this:
first, we are privileged, we think we're close to a generative culture, we have very high trust, so if we send out a hackathon invite, we're going to get a lot of participants, also, new relic is not free
step 1: you need monitoring, you need APM (e.g. new relic)
step 2: set aside some time, we do 2 hours per week, and it's important that it's recurring
step 3: find an area to work on, e.g. key transactions, and do this step ahead of time, because it's easy to get lost in the weeds, there's so much you can look at
step 4: make it better! (research, hack, create PRs, follow up)
step 5: iterate, not every change has to be a big success, keep trying
what worked for us may not work with you
but we hope this is useful for you, dev + ops working together
Q&A:
Q: how do you guys use Insights? do you use it?
A: we don't use it that much right now, but are looking at it
Q: how often do you hold hackathons, and how do devs juggle their normal priorities vs. performance?
A: tuesday afternoon from 3pm to 5pm, we don't force anyone to go, devs can prioritize by themselves
Q: do you prioritize perf. outside of hackathons?
A: yes, umm new relic allows us to see performance, other dude: deadlines sometimes force you to launch something
Q: what other tools do you use besides new relic?
A: nagios for alerting, logstash & kibana, previously ran splunk, i can go on and on, but we keep trying new things, even multiple tools that solve the same problem at once
Q: how does the hackathon work with product specific dev teams?
A: it's pretty easy actually, it's great for knowledge transfer, it's great to get new eyes on an area of the code and start asking questions about why things work that way and how they can be better
Q: how do you convince the business that code quality is as important as features?
A: we weren't founded by an engineer, so it's a little difficult for us, we're in the housing industry, which has its own ideas of "architecture" (like of a building or house) and "debt" (mortgages), so there are some interesting parallels there
Q: how do you load test?
A: we do have some synthetic tools for load tests, but our service is crawled constantly, and we silo our applications and services, so we can serve most bot / scraper traffic
Q: how do you turn hackathon projects into live production code?
A: solve the smallest problem you can, solving small problems = low risk, make a proof of concept, this allows you to see the risk & complexity of changes
Q: what is your release cycle time, how do you integrate your performance changes?
A: different products have different cycles, we're moving towards SoA so each team has autonomy to deploy & rollback whenever, our flagship product is a weekly release, while mobile API has multiple releases per day
Q: do you have any non-devs in the hackathon?
A: oh for ops? well.. any ops people in the room? it can be a challenge sometimes.. there's always uptime & application support & stability, but we're going to get better over time, i hope people will see that this is actually pretty fun and a great way to make improvements and to collaborate with devs, maybe our perf hackathons will be expanded to things like stability hackathons
Data Driven Monitoring
Daniel Schauenberg, Infrastructure Toolsmith @ Etsy
over 30 million members
over 18 million items listed
LAMP stack, linux, mysql, memcached, apache, PHP
some postgres, java, ruby, go; but etsy.com is mostly one big PHP app
120 web & api nodes
we deploy a lot, from 10 times per day in 2010 to 30-40 times per day in 2014
we split config deploys from code deploys to make deploys faster
how comfortable are you deploying right now?
on your first day you deploy the site, boot up a dev VM and deploy as soon as your laptop is set up
deployinator web app, one click to deploy to staging, then one click to deploy to prod
ruby app, runs rsync, has a log streamer, deploy log, etc.
dashboards: the most important actions and metrics for users on the site
we use deploy markers
Ganglia:
system level metrics
one instance per DC/environment
220k RRD files
fully configured through chef role attributes
Statsd:
used for application metrics
one instance
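For context on what "application metrics" over StatsD look like on the wire, here is a small sketch using the plain-text StatsD line format (`name:value|type`) over UDP. The metric names are made up, and the host and port are assumptions (8125 is the usual StatsD default); Etsy's own client code was not shown in the talk.

```python
import socket


def format_counter(name, value=1):
    """StatsD counter line, e.g. b'checkout.completed:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name, milliseconds):
    """StatsD timer line, e.g. b'search.query_time:320|ms'."""
    return f"{name}:{milliseconds}|ms".encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; the app never waits for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names.
print(format_counter("checkout.completed"))
print(format_timer("search.query_time", 320))
```

Calling `send_metric(format_counter("checkout.completed"))` from application code would emit one counter increment; since it is UDP, a missing or overloaded StatsD instance does not slow the app down.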
Graphite:
96GB RAM, 20 cores, 7.3TB SSD RAID 10
525k metrics/minute
mirrored setup using carbon-relay
7 relays for sharding
1-2 minutes for failover via DNS, that's good enough for us
if you graph your graphite stats in graphite, then if there's a problem you can't debug it
so we send graphite stats to ganglia
Nagios:
for alerting
we love nagios, it works really well for us
2 instances per data center, fully configured by chef
service checks and contacts in git
notifications via email -> SMS gateway
~75% of checks go to ops on-call
Nagdash:
aggregates results across multiple instances
2000 nodes, 30k service checks
IRC integration:
everyone is on IRC
"?nag" commands in IRC can query check/host status, set downtime, etc.
everyone complains about the nagios UI, but if you use the API you can write whatever interface works best for you
using chat helps communicate everything to everyone
More:
syslog-ng
logstash
logster
supergrep (tail -f ... | grep ... across all servers)
eventinator (records events across the entire infrastructure, chef runs, config changes, etc.)
Information overload:
leads to alert fatigue
"nagios spring cleaning" - going through your checks and figuring out what's not useful anymore
but.. we have data, we can make it better
so we wrote nagios-herald
it injects context into nagios's email alerts
colors
graphs
links to more information
Ops weekly:
we want more visibility into alerts
opsweekly will keep track of all alerts / pages during each on-call period
whoever was on-call can categorize the alerts (actionable, non-actionable, etc.)
integrates with fitbit for sleep tracking
use this to turn non-critical paging alerts to email alerts (i.e. it can wait until the next day)
open source: github.com/etsy/opsweekly
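The categorization step above can drive the decision about which paging alerts to turn into email alerts. A rough sketch of that analysis, not opsweekly's actual data model: the record shape, category names, and the `min_count` cutoff are all assumptions.

```python
from collections import Counter


def demotion_candidates(alerts, min_count=3):
    """Checks that paged at least min_count times and were never actionable.

    alerts: iterable of (check_name, category) pairs as tagged by whoever
    was on-call. Returns the check names sorted alphabetically.
    """
    total = Counter()
    actionable = Counter()
    for check, category in alerts:
        total[check] += 1
        if category == "actionable":
            actionable[check] += 1
    return sorted(
        check
        for check, count in total.items()
        if count >= min_count and actionable[check] == 0
    )


# Hypothetical week of pages.
week = [
    ("disk_space_logs", "non-actionable"),
    ("disk_space_logs", "non-actionable"),
    ("disk_space_logs", "non-actionable"),
    ("db_replication_lag", "actionable"),
    ("db_replication_lag", "non-actionable"),
    ("db_replication_lag", "non-actionable"),
    ("cert_expiry", "non-actionable"),
]
print(demotion_candidates(week))
```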
Summary:
set of trusted tools for monitoring
always experiment
always learn
always improve
use data for all the decisions you make
Q&A:
Q: what's more important, code quality or new features?
A: there's no absolute decision to make, new features = make money, badly written code = costs money, there's a spectrum and you have to find the spot that makes the most sense for your company's current state
Q: has anyone broken the site on their first day?
A: yes
Q: do you have dev vs. production parity problems?
A: since it's a monolithic repo and we merge into master, not really, also, since we use VMs for development, you're always running on a VM that mirrors the production servers
Q: how do you know what went wrong? how do you diagnose?
A: look at graphs, usually there's a group of people looking at the graphs, including the person who is deploying, and it's pretty easy to spot weird patterns, if you see something weird start asking around, or if nobody knows what's happening, page the on-call person (and there are different on-call people for payments vs. something vs. something else)
Q: how often do you rollback?
A: we don't rollback, but we do deploy revert commits; we do schema changes once per week and all migrations are required to be backwards compatible
Q: migrations / schema changes?
A: we use primary-primary & replicated shards, every thursday the DBAs will look at the migration tickets and roll out the changes
Q: why do you open source vs. keeping it in-house?
A: we are built on open source tools, we benefit from open source a lot, and we want to give back, so we open source as much as we can, A/B testing framework, deployment stuff, monitoring stuff, all developers think "can i open source this? what changes do i need to make?"
Q: what events go into eventinator?
A: DNS changes, cookbook changes, nagios changes, hadoop, network, firewall; all of these are stored in elasticsearch
Daniel Sturman, Engineering Director, Google Cloud Platform
Q: what are your roles?
A: Engates: get in front of customers, help customers get to the cloud, figure out our roadmap
A: Sturman: heads the team that manages all our computation, GCE, App Engine, and internal platforms for search, gmail, etc.; if someone wants to kick off 5000 containers, internal or external, our team handles that
Q: the problems you both deal with are at large scale, what are some of the problems you deal with?
A: Sturman: a lot of servers, we instrument everything, machine/kernel/container level, how do you turn all that noise into value? how do we use this data more effectively? in our SRE team, we have very strict principles on what not to use data for, how to do incident response, both broad symptom detection and deep root cause analysis
Q: how do you do that? internal tools?
A: Sturman: it's all our own stuff, there was nothing good at our scale, but one of our goals is to open up our tools to others, share this technology with others
Q: how about you Engates?
A: we collect that same kind of data, but more interesting is the data our customers collect and share with us, we help you build strategies for effective use of data, we support customers all the way up and down the stack, we don't write code for them, but we can help them architect; for example our mongoDB cluster, lots of companies are using our mongo DB cluster, I think it's the largest mongo cluster in the world; not everyone can be google, it's hard to do that in-house
Q: how do people build stuff on top of things that are in your hands?
A: Sturman: well we offer PaaS (GAE) and IaaS (GCE), so you can focus where to put your effort, e.g. for GAE, people don't want to handle scale or worry about ops things, customers want to dial in on what level they focus on, the savings from cloud are different for each company
A: Engates: the cloud has always been a cost savings thing, but it's really a time savings thing, your developers can focus on the business and innovating, not on infrastructure, we help you scale your business, we have a DevOps Automation Service that lets you treat infrastructure as code, so companies can focus on being more competitive
Q: let's get to the actual question of this session, what is the next thing in cloud computing? 1 year, 2 years, 3 years out?
A: Engates: containers are a big focus right now, our developers are working on docker, deploying OpenStack via docker, making docker work with OpenStack, working with the docker maintainers
A: Sturman: internally we no longer use VMs, externally we do use them for security reasons, but containers are a beautiful solution to the resource problem, we want to enable a 4 person startup to be able to scale up, we are going to offer Managed Runtimes, currently in alpha, give us an arbitrary container and we'll run & scale it, coming in 2015
Q: what do you spend most of your operational day thinking about? internal vs. customers?
A: Sturman: first thing I worry about is customer experience, internally you can be rough around the edges, externally no, we need to have great SLAs & uptime, our customers can't go down, big focus on reliability, we also focus on support if necessary, that's what I worry about the most, our teams are really good, our systems are stable, Google doesn't go down very much.. being a cloud provider is new to us, but we know how to do this stuff
A: Engates: similar for us, we have a good track record as a hosting company, every day we're on calls with customers, customers are an extension of our organization, we make use of & become familiar with the tools & technologies that our customers are using, we're built on open source, we want a familiar experience with our customers, New Relic is a great example, we use it ourselves and recommend it to our customers
A: Sturman: well for us it's about sharing the tools we built with our customers, pivoting into making them externally accessible
Q: what is the big thing customers are asking for, now?
A: Engates: timely, but one thing we just released, that customers were asking for, is integration & support for Google Apps (email), not only providing it but offering support
A: Sturman: privacy & security is a big issue, it's something we spend a lot of time on, how do you manage privacy on PaaS and IaaS? how do we protect against attacks? we do a great job internally, how can we extend that to our customers?
Q: let's talk more about security, bad guys are able to spin up instances on your services, how do you protect your assets? prevent bad things from happening?
A: Engates: it's a shared responsibility at all layers: data center, network, code, third parties; it's a community effort in a lot of ways, so we make sure to harden all those layers, we do 3rd party security audits, we work with customers, we release products that help customers with security, we offer private clouds
A: Sturman: we have a very large security team, internal & external, if anyone can start using our platform, that's a concern; we focus on platform security, keeping OS up to date at all times, seamless migration, also focus on defending against threats, unauthorized access, XSS, etc., we're building tools to help customers with that, we don't want security to be a person-intensive area, we want to build tools to gain leverage
Q: priorities for the rest of 2014?
A: Engates: make sure our ecommerce customers have a successful Black Friday and holiday season, we're going to almost do an.. infrastructure freeze.. to focus on stability, more rigorous procedures, similar processes for security, custom support, etc.
A: Sturman: onboarding more customers, helping them with their scaling challenges, Cloud Platform Live event in November, look forward to that, can't announce anything yet
Crowd Q&A:
Q: emphasis on containers, how do you advise customers on running their containers? thin containers vs. fat containers with ssh, syslog, etc.
A: Engates: well I would defer that question to our team, of course we want people to build horizontally scalable components, same thing for containers, but that's what our DevOps team does every day, helping customers re-architect
A: Sturman: we internally built Kubernetes, a platform for managing containers at scale, and we're opening that platform to the community, there are some first-class design patterns that Kubernetes provides, i.e. microservices/SoA, those are sort of built-in, so that's where Kubernetes is going and we think that will help
What Developers Should Do With Data
Poornima Vijayashanker, Founder, Femgineer
when you don't have any data, you don't have any idea what's going on, but once you start collecting data on everything you're similarly confused because there's too much to look at
not enough data -> noisy data -> too much data -> secure data
in the future: we're gonna be successful and have Big Data
when you're starting: there's no data in the system yet..
we've got to make our application compelling so users use it, to generate that data
customers are going to ask you: why should I trust you with my data? (email address, financial info, personal info, etc.)
you answer that question by building trust with the user
customer testimonials, social proof, partners, good design
use analogies, e.g. for mint.com: "Bank-level security"
convey trust at every level, design matters, UX matters
make it frictionless once the user takes their first step
make the onboarding frictionless
first impressions matter
use tools like mixpanel to see where customers drop off
don't be afraid to redesign to make things simpler
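a rough sketch of the drop-off analysis that tools like mixpanel do for you, assuming you have raw (user, step) events; the step names and the funnel_dropoff helper are made up for illustration and are not Mixpanel's API or anything shown in the talk:

```python
# Hypothetical onboarding steps, in order; the names are illustrative only.
FUNNEL_STEPS = ["visit", "signup", "link_account", "first_report"]


def funnel_dropoff(events):
    """events: iterable of (user_id, step) pairs.

    Returns a list of (step, users_reached, fraction_lost_vs_previous_step).
    Simplification: a user counts for a step if they ever logged it,
    regardless of whether they also logged the earlier steps.
    """
    reached = {step: set() for step in FUNNEL_STEPS}
    for user_id, step in events:
        if step in reached:
            reached[step].add(user_id)

    report = []
    previous = None
    for step in FUNNEL_STEPS:
        count = len(reached[step])
        if previous is None or previous == 0:
            lost = 0.0  # first step, or nobody left to lose
        else:
            lost = 1 - count / previous
        report.append((step, count, lost))
        previous = count
    return report


# Made-up sample data: 4 visitors, 3 sign up, 2 link an account, 1 gets a report.
sample_events = (
    [(u, "visit") for u in (1, 2, 3, 4)]
    + [(u, "signup") for u in (1, 2, 3)]
    + [(u, "link_account") for u in (1, 2)]
    + [(1, "first_report")]
)

for step, count, lost in funnel_dropoff(sample_events):
    print(f"{step:>13}: {count} users, {lost:.0%} lost from previous step")
```

the step with the largest loss is the first candidate for a redesign.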
once users start returning, you need to delight them
continue adding value
for mint.com: telling users when they were charged a fee, because everyone hates fees, we want to alert users when this happens, make it feel like we're on their side
one way to show users what value you provide is a simple "How it works" page
show users the flow, the process
airbnb.com does a good job of this
this helps convince people to sign up
Noisy data:
data streams, third-party data, user actions
how do we deal with the noise? manage all the data?
first, parse it, hopefully you do this early
next, aggregate it & mash it up
take user data, anonymize it, and present it back to users
e.g. for mint: spending on bars vs. restaurants, rent in SF vs. Austin
this is good for marketing, makes for good story-telling, infographics
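a minimal sketch of the "aggregate, anonymize, present back" step, assuming transactions are already parsed into dicts; the field names, the aggregate_spending helper, and the min_users cutoff are assumptions for illustration, not how mint.com actually does it:

```python
from collections import defaultdict
from statistics import median


def aggregate_spending(transactions, min_users=3):
    """transactions: iterable of dicts with user_id, city, category, amount.

    Returns {(city, category): median of per-user totals}. Groups with fewer
    than min_users distinct users are dropped so that no single user's
    spending can be read off the result (min_users is an assumed threshold).
    """
    # Total spending per user within each (city, category).
    per_user = defaultdict(float)
    for t in transactions:
        per_user[(t["city"], t["category"], t["user_id"])] += t["amount"]

    # Discard the user ids, keeping only the per-user totals for each group.
    groups = defaultdict(list)
    for (city, category, _user_id), total in per_user.items():
        groups[(city, category)].append(total)

    return {
        key: median(totals)
        for key, totals in groups.items()
        if len(totals) >= min_users
    }


# Made-up sample data.
sample_transactions = [
    {"user_id": "a", "city": "SF", "category": "bars", "amount": 5.0},
    {"user_id": "a", "city": "SF", "category": "bars", "amount": 5.0},
    {"user_id": "b", "city": "SF", "category": "bars", "amount": 20.0},
    {"user_id": "c", "city": "SF", "category": "bars", "amount": 30.0},
    {"user_id": "d", "city": "Austin", "category": "bars", "amount": 12.0},
]

print(aggregate_spending(sample_transactions))
```

the Austin group has only one user in the sample, so it is left out of the result rather than published.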
Making sense of the data:
analytics: Google Analytics, Mixpanel, KissMetrics
find out what's important, what's not?
logs & databases
hosting: use data to determine when & how to scale
use New Relic to find bottlenecks
add caching, use a different DB technology that fits your data better
warehousing: TeraData, Hadoop, etc.
What's next?
so now you have a good handle on your data, what's next?
how do you be proactive?
first, privacy
what are your policies? how do you implement them? how do you integrate with third parties?
next, security
not all users practice good security (e.g. losing their phone, using weak passwords), so we need to protect them from themselves
three levels: user security, employee security, and outsider security
responsible disclosure: asking hackers to hack your sites and notify you if they find anything, giving you a window of time to fix it
one security breach can ruin a startup, so a lot of startups are adopting this practice
want to learn more? read my book: How to Transform Your Ideas Into Software Products
Q: how do you handle trust after a security breach?
A: don't cover it up, that's the worst thing to do, because when it does leak you'll lose even more trust, give a full explanation and post-mortem, people want to know why, they want to know what exactly was breached (e.g. are your passwords stored in plaintext), so take responsibility and tell them how you're going to prevent this in the future
Q: what tools are you using to collect your data?
A: customer data: mixpanel & kissmetrics, application data: new relic
Q: at the early stages at mint, how did you do growth and retention?
A: lots of press, 600 interviews; for retention: continue to delight users, keep giving them more so they come back; for inactive users: re-engage them and let them know what's changed, show users how you can help them
New Kids on The Block: Three Startups at the Forefront of Disruption
Moderator: Dan Scholnick, General Partner Trinity Ventures & Board Member of NR
Kevin Klein, VP of Engineering, BloomThat
Artur Bergman, Founder and CEO, Fastly
Chris Gooley, Founder, Preact
Q&A:
Q: Kevin, your twitter handle is "the cheese is", why?
A: because it's amazing! you can put so many words at the end of that sentence
Q: what do you guys do?
A: BloomThat: flower delivery in 90 minutes anywhere in the Bay Area
A: Fastly: next gen CDN, cache anything, not just static stuff like CSS, JS, images, but also things that you normally wouldn't think of caching, like API calls
A: Preact: user analytics in real-time, so you can preact before you react
Q: what stage are you at?
A: BloomThat: less than a year old, still very early, a lot of work done by hand
A: Fastly: Series C, 3 years old, 130 people, 50 engineers
A: Preact: 2.5 years now, Series A last year, 18 people, just moved to SF
Hmm.. small, medium, and large startups.. so, Q: you are all technical leaders in your companies, how do you decide what to use? e.g. for hosting?
A: Kevin: nothing local, all cloud, in Rackspace, tried and true LAMP, PHP, all of our , don't think about our stack too much
A: Preact: started off fully co-located, since I was comfortable with sysadmin, but we re-evaluated AWS as IO and RAM got cheaper, so we're in 2 DCs now, but moving to AWS for certain tasks
A: Fastly: we're mostly self hosted, the CDN is entirely our hardware, we do some control plane stuff in the cloud, each CDN server does 40Gbit/s in traffic, each server runs dozens of VMs, which you can't do in the cloud, so it's a lot of custom low level stuff so we can use our machines with higher efficiency
Q: but someone from AWS would argue that you can run your kind of service on AWS, do you disagree with that?
A: Fastly: yes, I disagree, we use our own switches for example, we tweak the linux kernel & drivers, we customize our NICs
Q: but wouldn't they say that our cost decreases faster than you can make low-level efficiency improvements?
A: Fastly: even if they get better, there's no way to do certain things, like sub-10ms failover between servers? you can't do it in AWS, you have to be in the data center at the switch level, but maybe they'll figure it out eventually, that'd be great
Q: Kevin, what problems have you had with the cloud?
A: BloomThat: random outages, but they're always getting better
Q: Gooley, what about your hybrid setup?
A: Preact: we have a pretty predictable workload, so we know where to use AWS, there are some more costs involved for on-premise, e.g. paying for employees full-time in the DC, but we found that for our configuration on-premise was an order of magnitude cheaper, even if we change hardware every year
Q: going up the stack, what tools do you use?
A: Preact: best tool for the job, and those tools change over time, e.g. we started off on Mongo and became frightened once we reached scale, rapid development is great, but we moved to Cassandra to lock down our data store, another thing: we're huge users of redis to do a lot of our analytics (counters, etc.), and the atomic data structures & operations it provides, and speed and reliability it provides, are great for us
A: Fastly: well for databases, we don't have any huge requirements, our databases are tiny, just configuration data, and for our frontends we use a heavily modified version of Varnish
Q: philosophically, how would you compare your approach to Preact?
A: Fastly: best tool for the job is great, but once you reach scale it's tough, as you grow people & products you can't have everyone be an expert in so many different technologies, so we limit our technology choices, for example web API = ruby, automation = python, CDN layer = C, etc.
Q: BloomThat, what do you think? who's right?
A: well there is no "right", only right for your company, since we're early we try to use what works and what we're comfortable with
Q: Preact, what do you use for programming language?
A: web = Ruby, data = Python and more recently Scala; if your engineers are good they can learn anything, we don't add new things to the stack just because though.. we do have a lot of git repos we're trying to manage, merge projects together
Q: is Fastly different because of your type of business and type of customer?
A: well if we go down it is catastrophic for our customers, yes
Q: BloomThat, how do you handle failures?
A: Bloomthat: it would be a disaster if we failed on Valentine's day, so we have plans for failover to different DCs, databases, etc.
Q: how do you test failover? seems tricky..
A: Bloomthat: you gotta do it in production, take your failover procedure and actually test it out in production
Q: what big problems are you working on right now?
A: BloomThat: right now we're trying to make our admin system as efficient as possible for our employees, if we don't do that then we decrease our "flower power per hour", so we iterate fast and try to make it as quick and easy to use as possible
A: Fastly: we have ops problems at a scale that few companies see, not many people reach 40 Gbit/s on a single machine, we see kernel bugs, we see low level bugs, so in that way it's also a hiring problem, not many people know how to solve these
A: Preact: we work with a lot of data, it's hard to parallelize & process in a distributed way, we're starting to use Spark a lot
Q: any other challenges? and solutions?
A: BloomThat: we're not trying to do anything too outside of the box, just execute really well on our problems
A: Fastly: we started using Go a year ago, and it's turned out to be the best language for distributed systems, another thing is our choice of switches, we were an early adopter in software defined networking, we use Arista switches and built our own SDN on top of that, we didn't go with commercially available SDNs because they couldn't handle the traffic we have
Q: hiring is a huge problem in the Bay Area, you three are all Bay companies, how do you deal with that?
A: BloomThat: we look for people who not only know the technologies we use, but people who can bring something new, we're also trying to bring on more Jr people, more generalists that can grow into specialists, generalists are easier to find, training is a problem since you're strapped for time in a startup, so most of it is just live-fire training, get them in front of real problems
A: Fastly: we try to hire people who we know through the open source community, bringing on open source leaders makes it easier to attract good talent, it's also easier to evaluate people who have a history in open source, follow-up Q from moderator: "is funding useful for hiring?", A: after a big funding announcement it's actually bad for the company, lots more noise coming in from recruiters and the wrong type of candidates, so no, it's not good
A: Preact: we've doubled in size in the past year, and every hire has been a referral, so that's an alternative to hiring people in the open source community, we also hired two people for whom this is their first programming job, and it's been amazing watching them grow, but you need to find people who really want to grow and have that passion, follow-up Q from moderator: "what's the diff. between SoCal and the Bay?", A: harder to find people in SoCal (didn't really catch what he said)
Q: how do you deal with poaching?
A: Fastly: not a problem for us, we've lost extremely few engineers, we're pretty lucky in that regard, we don't worry about it, but now that you mention it... i'm probably going to start worrying
A: Preact: culture & novelty are really important for attracting & retaining people, do you really want to spend late nights with these people? is this problem worth solving? that's what I looked for when I joined Preact
Q: from the audience, "PHP, why?!"
A: BloomThat: we're using modern PHP, not ancient PHP, we chose PHP because the community is just amazing, and we love CodeIgniter
Dataclysm: Who We Are (When We Think No One's Looking)
Christian Rudder, Co-Founder, OkCupid
also the writer of the OkTrends blog
turned the blog into a book, Dataclysm
now the President of OkCupid
OkCupid was bought by Match.com in 2011
I'm going to talk to you about how we turned a community dating site into data science
this was 2003, pre-Big Data, pre-Facebook
dating is a weird business
if your site sucks, people go away, but if your site is great, people also go away, because they've found someone and no longer need your service
it's also weird to try to affect things in the real world, lots of real world problems, lots of human nature problems
our original idea was to apply math to dating
but how?
we decided to ask personality questions
ask a ton of questions, some good, some bad, some important, some unimportant
try to ask divisive questions, "are you a murderer?" is a bad question because everyone answers no
apply algorithms to these answers
but.. no matter what data you put on the page, users always click on the most attractive photos
the most attractive people get too many messages, leave the site
the second hottest people get doubled down on, and feel creeped out
average users don't get any messages, and get discouraged
we found that we want people to connect with 4-8s, not all 10s
so we built our own version of HotOrNot, ripped them off, but with 5 stars instead of 10 stars
how guys vote on women: pretty standard looking bell curve peaking at 2.8 out of 5
we've heard that advertising and airbrushing and models have warped men's perception of attractiveness of women, but our data doesn't show that
how women vote on guys: HUGE spike at 1-2, men are basically rated in the 1-2.5 range, chopped in half
are all our men ugly? no, women just rate men very harshly, their average is redefined
why? in straight relationships, men are the pursuers, women are the gatekeepers, so women have the ability to be more selective
so this threw off our initial model of matching people up, we need to convert men's ratings and women's ratings to percentiles so they're comparable
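one way to put two differently skewed rating scales on a common footing is a percentile rank within each group; this is only a sketch of that general technique (the to_percentiles helper and the sample scores are made up, and the talk did not show OkCupid's actual implementation):

```python
from bisect import bisect_left, bisect_right


def to_percentiles(ratings):
    """Map each raw score to its percentile rank (0-100) within `ratings`.

    Ties get the mid-rank: everything strictly below, plus half of the
    scores equal to it.
    """
    ordered = sorted(ratings)
    n = len(ordered)

    def percentile(score):
        below = bisect_left(ordered, score)
        equal = bisect_right(ordered, score) - below
        return 100.0 * (below + 0.5 * equal) / n

    return [percentile(score) for score in ratings]


# Made-up average star ratings: men are rated on a compressed 1-2.5 scale,
# women on a wider one.
mens_scores = [1.0, 1.5, 2.0, 2.5]
womens_scores = [2.0, 3.0, 4.0, 5.0]

print(to_percentiles(mens_scores))
print(to_percentiles(womens_scores))
```

in this sample a man averaging 2.5 stars and a woman averaging 5.0 stars land on the same percentile, because each is at the top of their own group's distribution.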
we also look at age
women rate men's age in a diagonal pattern, women think men that are the same age are the most attractive
men rate women very differently, they rate 20 as the most attractive age, no matter what age they themselves are
note that this is just QuickMatch data (attractiveness), for actual sending of messages the trend for men is more diagonal (message people that are the same age)
we also thought about when to show attractive people to users on the site
if you show all the hot people up front, the person is very engaged at first, but drops off very quickly
if you show not so attractive people up front, they get disinterested
should you show 20 year olds to 40 year olds?
we try not to make too many prescriptive choices like this, we wanted to be the anti-eHarmony, more open
we optimized on message/reply rate to start
but a lot of messages are "not interested" or "haha", they're not significant
so we start measuring length of conversations, e.g. 4-way chains of messages (you, me, you, me)
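a small sketch of how a "4-way chain" could be detected, assuming each thread is available as the chronological list of who sent each message; the function names and the min_turns default are illustrative assumptions, not OkCupid's code:

```python
def longest_exchange(senders):
    """senders: sender id of each message in a thread, in chronological order.

    Returns the length of the longest run of messages in which the sender
    changes every time (you, me, you, me -> 4).
    """
    best = 0
    run = 0
    previous = None
    for sender in senders:
        if previous is not None and sender != previous:
            run += 1  # the other person replied, the chain continues
        else:
            run = 1   # first message, or the same person wrote twice in a row
        best = max(best, run)
        previous = sender
    return best


def is_significant(senders, min_turns=4):
    """Treat a thread as a real conversation once it has a 4-way chain."""
    return longest_exchange(senders) >= min_turns


print(is_significant(["you", "me", "you", "me"]))  # a full 4-way chain
print(is_significant(["you", "me"]))               # a single reply, e.g. "haha"
```

counting chains rather than raw replies filters out one-off responses like "not interested".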
so we were doing this kind of thing to start, thinking the sky is the limit, we'll do this for a couple of years and figure it all out
the math is not hard
the hard part is human nature
the math needs to intersect human nature, you need both sides
a lot of people ask me how to do data science
should I learn X or Y, use this database, that tool
the most important thing is to understand what's actually happening, understand users & their motivation
we used SQL and excel to start, and that worked for us
we had a long runway because we were self-funded, and we've done a pretty good job at this point
so that's my lesson, get inside people's heads and hearts
Future of Modern Software
back to Lew Cirne
this conference is called FutureStack, but what is the "Future Stack"?
I knew Solomon Hykes since he was CTO of dotCloud, which was one of the first PaaS's that let you mix and match components (language, framework, database, etc.)
he executed one of the greatest pivots I've seen in software, moving out of the PaaS business and open sourcing the technology behind dotCloud
so with that, Solomon
Building the future stack: Docker
Solomon Hykes, Docker
docker is an open source project
it's grown faster than we ever thought possible
1000 pull requests in its first year, up to 8000 now
Docker is not something you build your app with, it's one layer below that
why did docker become so popular, so quickly?
we were in the right place at the right time, when something important was happening
Where is my app running?
the answer to this question was, for a long time: "that machine over there"
today: mobile & cloud, distributed applications
distributed is what people expect now, apps need to be native to the internet and to the device
your app needs to run on any device, it needs to be elastic, it needs to work offline & online
those kinds of applications are hard to build
there's a tooling problem
the "platform" is always expanding
we need independence between the service and the platform
but... for the past few years, we've made it work, somehow
if you are a tech company with a lot of resources, like Google, you've already solved these problems, a long time ago
large tech shops have already solved this problem, but everyone else hasn't
we're cobbling together open source tools and doing the best we can
we don't yet have a cohesive answer to "where is my app running?"
so that's what we wanted to build
Docker
docker is a toolkit for building distributed applications
it's a set of tools, but as a whole it's greater than the sum of the parts
Packaging & distribution
the goal of packaging & distribution is to be everywhere and nowhere
there's an explosion of hardware, OS's, running different versions
so we defined a standardized package format, based on LXC
Sandboxed runtimes
this goes hand-in-hand with packaging & distribution
we need a secure way to run applications without affecting other applications on the system
think of the App Store, you can install new apps, and new apps can't break your existing apps
Networking
how do you get these components to talk to each other?
we need consistent network technology between all these components
Clustering & composition
related problems
once you have these components, how do you assemble them together into one whole? and scale out?
Identity
an urgent problem in distributed systems
related to security, reliability, control, auditing
what's running where, who built it, is it supported, for each component
cryptographic proof of identity and ownership
Authorization
goes with identity
once you know who or what something is, you can decide what level of access to give it
authorization is a layer that connects components together based on identity
how do we make this easy for developers?
how do we make this compatible with organizations & enterprises that already have a way of doing this stuff?
The Docker Way
open always wins
we can't do this on our own
even large companies like Google can't create a perfect solution to this problem
this is a problem on the scale of the entire internet
this is a problem not suited to a single commercial entity
it requires a diverse set of engineers to work together and come to agreement, which is hard
when we built dotCloud, it was successful, but it was clear that we couldn't solve the mega-problem
Docker's success was a happy accident, that we got so many experts, duking it out, designing these solutions
it's really a snowball effect at this point, something big is happening, and we're part of it
solve one problem at a time
there's a tendency to solve this type of problem in a monolithic way, someone writes one big ball of mud to solve the entire problem, and everyone's going to use it and it's going to be great
this doesn't work in the real world
everyone has different requirements
so we need loosely coupled systems
scale by composition
this is our guiding principle on scale
combine existing solutions to produce novel solutions to hard problems
enforce standard interfaces
we can't predict the future, this stuff is changing fast
we don't know what hardware and software will come along
how do you accommodate this?
we want to avoid having people rip apart or rewrite their entire stack just because something better came along
so we come up with standard interfaces that have broad agreement
Docker is a set of open, simple tools that each solve one problem, that you can compose together, that communicate through standard interfaces
we're applying Unix design principles to solve the problem of distributed applications
40 years after Unix, the same techniques, on a different problem
please try docker, come say hi in IRC, or find me later
Q&A:
Q: what was it like to pivot from dotCloud to Docker? to totally change the business model? how did you present that to the board?
A: when we started off with dotCloud we made it clear we weren't going to go after the short-term, we asked our investors & board for patience, to trust our hunch, notice how the 4 points I mentioned earlier are all false for dotCloud, because it was monolithic and closed source, so we felt that this wasn't right, we built the first versions of Docker alongside dotCloud, started to get early feedback, and it really just took off from there, so there was no other choice but to pivot
Closing
grand prize winner of the badge contest is..... (trip to any conference anywhere in the world)
also, GoPro winners
also, Alannah, who can not only code, but network as well, she's getting a GoPro as well
this conference is about people, taking pride in your software, making the people that use your software have a joyful and productive experience
thank you New Relic marketing team
closing thoughts: moments matter, thank you for investing 2 days in coming here, I hope it's been worthwhile, I can't wait until next year, we look forward to seeing you at FutureStack15, thank you