@toddgrotenhuis
Last active September 16, 2020 18:55
Security Thinking for Big Data

Security Objectives

  • Understand and reduce manageable risks
  • Prepare for problems and quickly recover from harm
  • Adapt our practices based on the changing context

Project Qualities

Dangerous

  • Obfuscation & opacity
  • Scales poorly
  • Causes harm
  • Dangerous feedback loops

Better

  • Clarity & accountability
  • Designed for sustainability
  • Mitigates harm & offers redress
  • Adapts based on results & context

Security Questions to Ask of Your Big Data Project

Assess the Risk

  • What are our objectives, priorities, and assumptions?
  • What data & metadata do we have?
  • How sensitive is the data? How dangerous is the data? Is it replaceable?
  • How valuable is the data? To whom?
  • What legal standards apply?
  • What rate of false positives is acceptable? Of false negatives?
  • What can’t we measure? What are we leaving out?
  • What tradeoffs are we making?
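The false positive and false negative question is easier to answer once the rates are written down and compared against an agreed tolerance. A minimal Python sketch, assuming binary labels where 1 means the system flagged a record; the function name `error_rates`, the sample labels, and the `MAX_FPR`/`MAX_FNR` tolerances are hypothetical, for illustration only:

```python
from typing import Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for binary labels.

    Assumes labels are 0 or 1, where 1 means "flagged" (e.g. predicted fraud).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual not in (0, 1) or predicted not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1  # flagged someone who should not have been
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1  # missed someone who should have been flagged
    # Guard against division by zero when a class is absent from the data.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical labels and tolerances, for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
MAX_FPR = 0.10  # assumed tolerance agreed on during risk assessment
MAX_FNR = 0.25

fpr, fnr = error_rates(y_true, y_pred)
print(f"FPR={fpr:.2f} (limit {MAX_FPR}), FNR={fnr:.2f} (limit {MAX_FNR})")
print("within tolerance" if fpr <= MAX_FPR and fnr <= MAX_FNR else "outside tolerance")
```

Here one of five true negatives is wrongly flagged (FPR 0.20) and one of three true positives is missed (FNR 0.33), so both exceed the assumed limits. Lowering one rate usually raises the other, which is exactly the tradeoff the questions above ask you to make explicit.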

Modeling Threats

  • Have we codified biases, injustices, or faulty assumptions into our model?
  • Is there noise, malicious activity, or false information in our training data?
  • Are we using obfuscation to hide sloppy data, processes, or proxies?
  • Who will want access to the data? How will they try to get it?
  • How does our system fail? Does it fail safely, securely, and fairly?
  • Who or what makes the final decision in the model? Is there a safety lever?
  • Are the incentives we are creating aligned with our objectives?
  • Will abuse be possible for those who gain insider knowledge?

Implementing Defenses

  • How are we providing routine maintenance, updates, and cleanup?
  • How are we separating, storing, and transmitting information?
  • How do we determine permissions? Are they granular enough?
  • Are we generating logs & alerts to detect failures and misuse?
  • Are we routinely testing our models and systems for reliability and safety?
  • Do we have an end-of-life process for our systems and data?
  • Do we have a challenge, redress, and/or opt-out process?

Auditing Your Project

  • Can we validate how a specific decision was made?
  • Can we validate groupings, classifications, segmentation, etc.?
  • Can we validate our assumptions? Expected outcomes? Predictions?
  • What unintended consequences have we observed?
  • Has this created any perverse incentives? Feedback loops? Echo chambers?
  • How are people attempting to game the system? Are we catching them?
  • Have we used any proxies to get around legal challenges?
  • Is this funded at an appropriate level to keep it safe?

Reminder: Adapt based on the audit results, and repeat the cycle whenever the project changes.

Additional Material
