toddgrotenhuis/Security Thinking for Big Data.md

## Security Thinking for Big Data.md

Security Objectives


Understand and reduce manageable risks
Prepare for problems and quickly recover from harm
Adapt our practices based on the changing context

Project Qualities

Dangerous


Obfuscation & opacity
Scales poorly
Causes harm
Dangerous feedback loops

Better


Clarity & accountability
Designed for sustainability
Mitigates harm & offers redress
Adapts based on results & context

Security Questions to Ask of Your Big Data Project

Assessing the Risk


What are our objectives, priorities, and assumptions?
What data & metadata do we have?
How sensitive is the data? How dangerous is the data? Is it replaceable?
How valuable is the data? To whom?
What legal standards apply?
What rate of false positives is acceptable? Of false negatives?
What can’t we measure? What are we leaving out?
What tradeoffs are we making?
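
The question about acceptable false positive and false negative rates can be made concrete by measuring both on labeled data and comparing them to limits agreed in advance. The following is a minimal sketch, assuming binary labels where 1 is the positive class; the function names and the thresholds are illustrative assumptions, not part of the original checklist.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    negatives = sum(1 for t, _ in pairs if t == 0)
    positives = sum(1 for t, _ in pairs if t == 1)
    # Guard against division by zero when a class is absent from the sample.
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def within_tolerance(y_true, y_pred, max_fpr, max_fnr):
    """True only if both error rates stay at or below the agreed limits."""
    fpr, fnr = error_rates(y_true, y_pred)
    return fpr <= max_fpr and fnr <= max_fnr


# Hypothetical evaluation data: 4 true negatives-class rows, 4 positive-class rows.
labels      = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 0, 0]
fpr, fnr = error_rates(labels, predictions)
```

Writing the limits down as explicit parameters forces the tradeoff discussion: lowering one rate usually raises the other, and the acceptable balance depends on who bears the cost of each kind of error.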

Modeling Threats


Have we codified biases, injustices, or faulty assumptions into our model?
Is there noise, malicious activity, or false information in our training data?
Are we using obfuscation to hide sloppy data, processes, or proxies?
Who will want access to the data? What will they try in order to get it?
How does our system fail? Safe? Secure? Fair?
Who or what makes the final decision in the model? Is there a safety lever?
Are the incentives we are creating aligned with our objectives?
Will abuse be possible for those who gain insider knowledge?
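
One way to think about "how does our system fail" and "is there a safety lever" is a decision wrapper that falls back to human review whenever the model is uncertain, errors out, or has been switched off by an operator. This is only a sketch under stated assumptions: `decide`, `Decision`, the score range of 0 to 1, and the two thresholds are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    outcome: str  # "approve", "deny", or "needs_review"
    reason: str


def decide(score_fn, record, approve_at=0.9, deny_at=0.1, kill_switch=False):
    """Automate only confident decisions; everything else goes to a person."""
    if kill_switch:
        # The safety lever: an operator can disable automation entirely.
        return Decision("needs_review", "automation disabled by operator")
    try:
        score = float(score_fn(record))
    except Exception as exc:
        # Fail safe: a broken model never silently approves or denies.
        return Decision("needs_review", f"model error: {exc}")
    if not 0.0 <= score <= 1.0:
        # NaN also fails this comparison, so it is routed to review too.
        return Decision("needs_review", "score out of expected range")
    if score >= approve_at:
        return Decision("approve", f"score {score:.2f} at or above {approve_at}")
    if score <= deny_at:
        return Decision("deny", f"score {score:.2f} at or below {deny_at}")
    return Decision("needs_review", "score in uncertain band")
```

The width of the uncertain band is itself a tradeoff: a wider band sends more cases to people, which costs more but limits the harm an automated mistake can cause.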

Implementing Defenses


How are we providing routine maintenance, updates, and cleanup?
How are we separating, storing, and transmitting information?
How do we determine permissions? Are they granular enough?
Are we generating logs & alerts to detect failures and misuse?
Are we routinely testing our models and systems for reliability and safety?
Do we have an end-of-life process for our systems and data?
Do we have a challenge, redress, and/or opt-out process?
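
The questions on granular permissions and on logs and alerts can be illustrated together: grant access per dataset and per action rather than per system, and record every check so that denials can feed alerting. The sketch below uses only the standard library; the role names, dataset names, and the `PERMISSIONS` table are hypothetical examples, not a recommended policy.

```python
import logging

logger = logging.getLogger("data_access")

# Hypothetical policy: each role maps to explicit (dataset, action) pairs.
PERMISSIONS = {
    "analyst": {("clickstream", "read")},
    "steward": {
        ("clickstream", "read"),
        ("clickstream", "delete"),
        ("customers", "read"),
    },
}


def is_allowed(role, dataset, action):
    """Deny by default and log every decision for later review."""
    allowed = (dataset, action) in PERMISSIONS.get(role, set())
    if allowed:
        logger.info("ALLOW role=%s dataset=%s action=%s", role, dataset, action)
    else:
        # Denials are logged at WARNING so they can drive alerts on misuse.
        logger.warning("DENY role=%s dataset=%s action=%s", role, dataset, action)
    return allowed
```

Unknown roles fall through to an empty permission set, so the default answer is "no". Whether this level of granularity is enough depends on the sensitivity of the data; row-level or field-level rules may be needed for more dangerous datasets.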

Auditing Your Project


Can we validate how a specific decision was made?
Can we validate groupings, classifications, segmentation, etc.?
Can we validate our assumptions? Expected outcomes? Predictions?
What unintended consequences have we observed?
Has this created any perverse incentives? Feedback loops? Echo chambers?
How are people attempting to game the system? Are we catching them?
Have we used any proxies to get around legal challenges?
Is this funded at an appropriate level to keep it safe?
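
Validating how a specific decision was made is much easier if each decision is stored with its inputs, the model version, and a digest of the record. The following is a rough sketch, assuming JSON-serializable inputs; `audit_record` and `verify_record` are hypothetical helper names. A plain SHA-256 digest only detects accidental corruption or naive edits; defending against a determined insider would need something stronger, such as a keyed HMAC or an append-only store.

```python
import hashlib
import json


def _digest(body):
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(model_version, inputs, output):
    """Capture what is needed to re-examine one decision later."""
    record = {"model_version": model_version, "inputs": inputs, "output": output}
    record["digest"] = _digest(record)
    return record


def verify_record(record, score_fn=None):
    """Check the record is intact and, optionally, that it can be reproduced."""
    body = {k: v for k, v in record.items() if k != "digest"}
    if _digest(body) != record.get("digest"):
        return False
    if score_fn is not None:
        # Replay the stored inputs through the (hypothetical) model function.
        return score_fn(record["inputs"]) == record["output"]
    return True
```

Replaying stored inputs through the recorded model version is one practical test of whether a decision can actually be explained after the fact.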

Reminder: Adapt based on the auditing results, and repeat the cycle whenever the project changes.

Additional Material


Algorithmic Justice League
"Your Data Is Being Manipulated"
Threat Modeling: Designing for Security
Haunted by Data
Weapons of Math Destruction
The Financial Modelers' Manifesto/Modelers' Hippocratic Oath
I Am the Cavalry
AI Security Resources
Stop Data Mining Me: Opt-Out List
How Adversarial Attacks Work
The Problem with Building a "Fair" System
Q: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible? A: Because Keynote Speakers Make Bad Life Decisions and Are Poor Role Models
An Ethics Checklist for Data Scientists
A Practical Way to Include an Ethics Review in Your Development Processes
Ethical Explorer Pack