Security meetup @ PagerDuty 2014-07-11
http://www.meetup.com/PagerDuty-DevOps-Meetup/events/189658332/
http://www.slideshare.net/yandex/defending-the-bird-justin-collins-alex-smolen-twitter
"Defend the bird!"
- when you think of security, do you think of locks, or of freedom?
- is security about blocking people, or is it about enabling people?
- I think it's about enabling people
- but if that's the case, why do we even need to worry about security?
Growth:
- twitter has had a lot of growth, # of employees and users
- tech stack has changed: rails -> distributed SOA
- we have a lot of frontends now (mobile, web, ads, acquired companies & services)
- we need security to cover everything
Problems:
- for example, high profile accounts get breached
- @AP news breach -> fake terrorism report -> stock market dips
- twitter received an FTC consent decree due to Obama's account being breached
- the consent decree requires twitter to implement and document certain security practices for the next 20 years
- those are some pretty gnarly requirements
- and only ~1 out of every 100 engineers is dedicated to security
- how do we manage this?
How we manage security:
- first, security is not an umbrella term: it can't expand to take up vaguely security related tasks like spam fighting, bots, monitoring, etc.
- look for force multipliers: automation, code review, security features
- automate security: avoid tedious tasks, catch issues early, notify the right people ASAP (the people who can actually fix the problem, the developers)
- Hack Week project: centralized security system
  - analysis (static analysis tools)
  - security metrics
  - notify developers who have affected code
- Brakeman: static analysis tool for rails
  - demo: check out a new branch, introduce a new vulnerability (unprotected redirect), commit & push; the developer gets an automated email notification of the vuln. right away
  - developers are notified ASAP, instant feedback
  - that's an example of automation
- Other tools:
  - CoffeeBreak - brakeman for JS (XSS checks, etc.)
  - PhantomGang - phantomJS fuzzer type thing (?), simulates random user activity on your site (clicking stuff, filling out forms, following links?)
  - all of these tools "report to SADB" (I think SADB is their centralized security thing)
  - commercial tools haven't really helped us too much, but they can help you catch regressions (known bad practices)
- Manual code review
  - this is a very hard task
  - only focus on doing reviews that are flagged by devs as being security sensitive
  - some parts of review can be automated (e.g. homegrown static analysis / code checkers, automatic flagging for review on sensitive areas of the codebase)
  - keep a dashboard of pending reviews, try to do a clean sweep every month
- Design
  - teams can file self-service requests for security design review
  - come up with a standard process for these reviews
  - security (like everything) is easier & cheaper to do in dev vs. fixing it in production
- Multi-factor Auth
  - we started with SMS, since we already have a lot of SMS integration: send users a 6-digit code
  - later, we released native Android/iOS MFA apps
  - native apps use priv/pub key pairs to sign login requests
  - twitter is never in possession of your private key (unlike other MFA systems that rely on a shared secret)
  - give users backup codes
  - MFA is hard, the UX in particular is tricky
  - and how do you handle e.g. @CNN, a shared account?
  - it's hard!
- HTTPS
  - twitter was one of the first large-scale sites to go 100% HTTPS, as of two years ago
  - CRIME, BREACH, Heartbleed, etc. are very scary, but not having HTTPS enabled is even worse
  - we pushed for HSTS in browsers, which forces the browser to use the HTTPS version of the site (protects against MITM SSL stripping attacks)
  - we use certificate pinning in all of our clients & native apps (to reject unauthorized certificates issued by compromised CAs)
  - we use PFS, perfect forward secrecy
- (gem name missing from my notes, probably twitter's secure_headers) this is a ruby gem that will automatically wrap best practice security headers around all HTTP responses
- keybird - secrets for environments (using puppet)
- main takeaway: build tools!
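
To make the "build tools" takeaway concrete, here is a minimal sketch of the notification step described above: take a static analysis report and group its warnings by whoever owns the affected file. It assumes a report shaped like Brakeman's JSON output (a top-level `warnings` list whose entries carry `warning_type`, `message`, `file` and `line`). The `OWNERS` table, `owner_for` and `route_warnings` are made up for illustration. The talk did not show the code of twitter's internal system, so this is not a reconstruction of it.

```python
import json
from collections import defaultdict

# Hypothetical mapping from path prefix to the people who should be notified.
OWNERS = {
    "app/controllers/": "web-team@example.com",
    "app/models/": "data-team@example.com",
}
DEFAULT_OWNER = "security@example.com"


def owner_for(path):
    """Return the owner whose prefix matches the path, else the default."""
    for prefix, owner in OWNERS.items():
        if path.startswith(prefix):
            return owner
    return DEFAULT_OWNER


def route_warnings(report_json):
    """Group warnings from a Brakeman-style JSON report by owner."""
    report = json.loads(report_json)
    by_owner = defaultdict(list)
    for w in report.get("warnings", []):
        summary = "{}: {} ({}:{})".format(
            w["warning_type"], w["message"], w["file"], w["line"])
        by_owner[owner_for(w["file"])].append(summary)
    return dict(by_owner)


# Sample report, trimmed to the fields used above.
SAMPLE = json.dumps({"warnings": [
    {"warning_type": "Redirect", "message": "Possible unprotected redirect",
     "file": "app/controllers/sessions_controller.rb", "line": 12},
    {"warning_type": "SQL Injection", "message": "Possible SQL injection",
     "file": "lib/search.rb", "line": 40},
]})

if __name__ == "__main__":
    for owner, items in route_warnings(SAMPLE).items():
        print(owner)
        for item in items:
            print("  " + item)
```

Anything without a known owner falls back to the security team, so no warning is silently dropped. A real version would send the email and record the result centrally.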
We are a small startup, but we have a big product with a focus on security.
- there are so many problems in security, and not enough time to handle them all
- but you need to start on security ASAP, the earlier you get security practices in place, the better
- come up with policies: general security policies for code review, etc.
Secure defaults:
- file permissions, daemons, config files, logs, application data
- almost nothing needs to be world readable or writable
- run 100% of your daemons as unprivileged users, no daemons as root
- make your secure defaults hard to disable, hard to circumvent
- developers should not need to be making changes to security settings all the time
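
One way to check the "almost nothing needs to be world readable or writable" rule is to audit a directory tree for files that break it. The sketch below is only an illustration of that idea on a POSIX system. The function names are made up, and it is not PagerDuty's tooling.

```python
import os
import stat
import tempfile


def world_accessible(root):
    """Yield (path, flags) for regular files under root that any user can read or write."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices
            flags = []
            if st.st_mode & stat.S_IROTH:
                flags.append("world-readable")
            if st.st_mode & stat.S_IWOTH:
                flags.append("world-writable")
            if flags:
                yield path, flags


def demo():
    """Create one private and one wide-open file, return the basenames that get flagged."""
    with tempfile.TemporaryDirectory() as root:
        private = os.path.join(root, "secret.conf")
        open_file = os.path.join(root, "open.log")
        for p in (private, open_file):
            with open(p, "w") as f:
                f.write("x")
        os.chmod(private, 0o600)
        os.chmod(open_file, 0o666)
        return {os.path.basename(p): flags for p, flags in world_accessible(root)}


if __name__ == "__main__":
    print(demo())
```

Running a check like this from monitoring, rather than once by hand, is what makes the default hard to circumvent quietly.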
Encryption:
- encrypt all traffic by default
- assume a hostile environment, inside & outside the data center
- it's less effort to encrypt everything at the transport layer (VPN, IPSEC, VPC) vs. doing everything on a protocol-by-protocol basis
- sanitize data going outside of your DC (e.g. to SaaS services)
- use FDE (disks get re-used, especially at a shared hosting provider)
- centralized enforcement via code, but distributed to the node level (???)
Firewall:
- we have a chef resource for controlling firewall rules
- it allows us to specify which hosts/roles/environments & ports/services can talk to each other
- not every host needs to talk to every other host on your network!
- SOA helps
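
The rule model described above (which roles may talk to which other roles, on which ports, with everything else denied) can be sketched as a small allow-list. The speakers use a Chef resource for this, which was not shown. The Python below only illustrates the default-deny idea, and `Rule`, `ALLOW` and `is_allowed` are hypothetical names.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    src_role: str
    dst_role: str
    port: int


# Hypothetical allow-list: anything not listed here is denied.
ALLOW = {
    Rule("web", "app", 8080),
    Rule("app", "db", 3306),
    Rule("monitoring", "web", 443),
}


def is_allowed(src_role, dst_role, port, env_src="prod", env_dst="prod"):
    """Default-deny check: same environment and an explicit rule are both required."""
    if env_src != env_dst:
        return False
    return Rule(src_role, dst_role, port) in ALLOW


if __name__ == "__main__":
    print(is_allowed("web", "app", 8080))   # explicit rule exists
    print(is_allowed("web", "db", 3306))    # web may not reach the db directly
```

Keeping the rules as data means the same table can drive both the firewall config and the automated checks that verify it.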
Monitoring:
- automated checks should verify your security rules are working (e.g. firewall rules)
- your monitoring / instrumentation should act as the "attacker", verify that outsider attackers can't get in
- http://gauntlt.org/
- we use IPSEC, so we can calculate a ratio of IPSEC vs. non-IPSEC traffic and look for any big jumps in insecure communications
- we also look at # of ISAKMP states (not sure what that is or why that's good)
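
The IPSEC vs. non-IPSEC ratio check above could look roughly like this. It is a sketch under assumptions: the byte counters are presumed to come from whatever traffic accounting is already in place, and the 5-point threshold and function names are invented for the example.

```python
def insecure_ratio(ipsec_bytes, plain_bytes):
    """Fraction of traffic that was not carried over IPsec (0.0 when idle)."""
    total = ipsec_bytes + plain_bytes
    return plain_bytes / total if total else 0.0


def find_jumps(samples, max_increase=0.05):
    """Return the indexes where the insecure ratio rose by more than
    max_increase compared with the previous sample.

    samples is a list of (ipsec_bytes, plain_bytes) tuples, oldest first.
    """
    ratios = [insecure_ratio(i, p) for i, p in samples]
    return [n for n in range(1, len(ratios))
            if ratios[n] - ratios[n - 1] > max_increase]


# Made-up counters: the third sample has a burst of unencrypted traffic.
SAMPLES = [(990, 10), (985, 15), (700, 300), (980, 20)]

if __name__ == "__main__":
    print(find_jumps(SAMPLES))
```

Comparing against the previous sample keeps the check simple. A rolling baseline would be less noisy on bursty links.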
Logging, IPS, IDS:
- logs must be shipped, you cannot depend on the logs being secure when they are stored on the machine itself
- syslog is great
- once you have centralized logging, you can start alerting on strange log occurrences (foreign IPs, failed attempts, user creation, suspicious patterns)
- "Active Response" = poor man's IPS
- use a HIDS: store & compare files to known good checksums, search logs, etc.
- again, set up alerting for all of these and take action when alerts come in
- Fail2Ban is a great active response system for logins, use it! but don't stop there..
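
To show the kind of log alerting described above, here is a minimal sketch that counts failed SSH logins per source address in centralized syslog lines. It assumes the common OpenSSH "Failed password" message format. The threshold and function name are made up. Fail2Ban itself does much more (time windows, ban actions), so this is only the counting step.

```python
import re
from collections import Counter

# Matches the common OpenSSH "Failed password" syslog message.
FAILED = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")


def offenders(lines, threshold=3):
    """Return the source addresses with at least `threshold` failed logins."""
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            counts[m.group(2)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


# Made-up log lines using documentation-only IP ranges.
SAMPLE = [
    "Jul 11 19:02:01 web1 sshd[311]: Failed password for root from 203.0.113.9 port 4242 ssh2",
    "Jul 11 19:02:03 web1 sshd[311]: Failed password for invalid user admin from 203.0.113.9 port 4243 ssh2",
    "Jul 11 19:02:05 web1 sshd[311]: Failed password for root from 203.0.113.9 port 4244 ssh2",
    "Jul 11 19:03:10 web2 sshd[512]: Failed password for deploy from 198.51.100.7 port 5151 ssh2",
    "Jul 11 19:03:15 web2 sshd[512]: Accepted publickey for deploy from 198.51.100.7 port 5152 ssh2",
]

if __name__ == "__main__":
    print(offenders(SAMPLE))
```

Because the lines come from the central log store, the count spans hosts: an attacker spreading attempts across many machines still crosses the threshold.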
What's next for us?
- Automated Abuse Reports?
- STONITH for hosts with security alerts? (since we have centralized logging & redundancy, we can do offline analysis, the host doesn't need to be up)
- we have some additional writing on security on the pagerduty blog
We've heard a lot about ops today, but what about sales & marketing?
- what about the rest of the enterprise?
- they need security too
Old school IT:
- everything is on-premise
- applications, HR, AD/LDAP, SAP, etc.
- then came SaaS
- SSO helps with SaaS
- then came automated provisioning (e.g. sales guys get access to all sales SaaS apps)
- self-service workflow
- what about remote / WFH workers?
- they need VPN, RSA keychains, access via RADIUS
- what about partners? what about service providers? identity providers?
Today:
- this was the landscape up until about 2010 or so
- today we have all of the old stuff
- plus more people working remotely
- entire pieces of infrastructure and entire applications are being moved to IaaS/PaaS/SaaS
- documents are no longer stored on a file server, they're in Google Docs, Dropbox, Box, DocuSign, Office365
- we have things like Salesforce
- and most recently: proliferation of mobile devices, personal & company issued
How do we deal with all this?
- look at the Identity Lifecycle, employee hire -> exit
- HR -> IT -> AD/LDAP -> Apps/SaaS/etc. -> Devices
- what happens between each of those steps?
- MFA, auditing, security, roles, departments, fine-grained auth, HR/IT systems must be in sync with each other
- user experience is important
- but having systems in place for security, auditing, compliance, change management, etc. is essential (e.g. required by law)
- also, we want to throw all this information into a BI system for analytics (determining efficiency, usage, etc.)
- all of these goals probably match up with the goals of your applications, too
- e.g. user management, SSO, authorization, provisioning, auditing & compliance
- but users have a different mindset vs. employees vs. partners
- (I don't understand what he means here.. is he saying you can also use Okta for your user-facing web app? e.g. user-facing sites like Twitter/reddit/Pets.com could have their account system powered by Okta? like Stormpath?)
- for external users, everything has to be JIT / automated, you can't manually provision that many users
- social sign on (FB, Google, etc.)
- concerns for user-facing apps: branding, scale, performance, security, future proofing (new devices, new services, new auth. methods)
- this makes up an Identity Platform
- once you have a few hundred users you need to build this kind of system
- the old way will not scale
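
The JIT provisioning point above (external users get an account automatically the first time they sign in, instead of being set up by hand) can be sketched like this. Everything here is hypothetical: the `Directory` class, the claim names (`issuer`, `subject`, `email`) and the default role are invented for illustration and are not Okta's API. The sketch also assumes the identity assertion was already verified before it reaches this code.

```python
class Directory:
    """In-memory stand-in for a user store; a real one would be a database."""

    def __init__(self):
        self.users = {}

    def get_or_provision(self, claims):
        """Return (user, created) for a verified identity assertion,
        creating the account the first time this identity is seen."""
        key = (claims["issuer"], claims["subject"])
        user = self.users.get(key)
        created = user is None
        if created:
            user = {"email": claims.get("email"), "roles": ["user"], "logins": 0}
            self.users[key] = user
        user["logins"] += 1
        return user, created


if __name__ == "__main__":
    directory = Directory()
    claims = {"issuer": "social.example", "subject": "12345",
              "email": "sam@example.com"}
    print(directory.get_or_provision(claims))  # first login creates the account
    print(directory.get_or_provision(claims))  # second login reuses it
```

Keying on issuer plus subject, rather than on email, keeps identities from different providers from colliding.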
- Q: If we want to make security part of our culture, how do you do that? How do you communicate this? Educate people?
- A: (twitter dude) avoid antagonistic relationships: instead of saying "No", ask "How can we help you get this done?". For example, our automated security reports (Brakeman, etc.) have a large "Bullshit!" button that people can use to report false positives. This builds trust & communication between the two sides, and once you have that trust, people will naturally start coming to you
- Q: will these talks be available online?
- A: (pagerduty person) yes, see our blog
- Q: something something about heartbleed
- A: (pagerdude) we weren't affected except for one internal system, but we fixed it right away
- Q: did you guys do key rotation?
- A: (pagerdude) yes, we did that, manually