@szeitlin
Last active March 3, 2020 00:44
Women in Data Science at LLNL 3/2/2020

First speaker: Marisol

Did outreach at UC Merced on workforce development; a cybersecurity course (CECORE) based on a Stanford class

She spent 20 years in CS, working on satellite image analysis.

Originally from a rural area of New Mexico. Was invited to join a 5-person team for a Los Alamos challenge when she was age 15. Said it took her a year to learn how to connect using a modem, but once she was in she realized she could make a computer do anything she wanted.

Mentoring:

  1. Ignite the spark
  2. Encourage each other
  3. Share knowledge

Women's empowerment lunch every 2 weeks


Daphne Koller - insitro, Stanford

Eroom's Law: there's an exponential decrease in pharmaceutical R&D productivity; it now takes about $2.5B and 15 years per approved new drug

Binary classification: is there a bear here?

Multiclass classification: what is in this image?

Example of how models tend to latch onto subtle artifacts: a study to detect broken bones in x-ray images ended up learning differences between the x-ray machines. Once they corrected for that, the ability to detect fractures was actually no better than a coin toss.


Panel on ethics: FAITHE (Fairness, Accountability, Integrity, Transparency, Honesty, Equity)

Compounding risks: black box solutions + biased data + biased humans

Don't just want virtue-signalling

Big data as a source of power inequity

Women are especially aware since we've been on the receiving end of inequity for so long

What's public, what's private, and who decides


Alyson Fox - Skynet

Collaborative Autonomy in energy program (edge computing)

Cyber and infrastructure resilience to protect the power grid

Want reliable computing from unreliable hardware, like in multicellular biology (idea that losing one agent in a cluster doesn't take the whole thing down)

Decentralized averaging: each agent has a piece of data

Want techniques that are robust to:

  • device failure
  • network failure
  • calculation error
  • source data problems

Who does the calculation (which agent in the cluster)?

Can the method be network-agnostic, or does it need to know the topology of the graph?

Broadcast back: how does the device know when it has all the data and it's time to calculate?

Ex: Push-Sum Consensus, see Olshevsky et al. 2018

  1. Each agent has a value and a weight
  2. Wait and then communicate
  3. Update the estimate
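The steps above can be sketched as a small simulation. This is a minimal illustration of the push-sum idea (each agent halves its value/weight pair and pushes one half to a random neighbor), not the Olshevsky et al. algorithm verbatim:

```python
import random

def push_sum(values, neighbors, steps=200):
    """Push-sum consensus: each agent keeps a running (value, weight)
    pair, repeatedly halves it and pushes one half to a random neighbor.
    The ratio value/weight at every agent converges to the global mean."""
    n = len(values)
    x = list(values)      # step 1: each agent has a value...
    w = [1.0] * n         # ...and a weight
    for _ in range(steps):
        x_new = [0.0] * n
        w_new = [0.0] * n
        for i in range(n):                    # step 2: communicate
            j = random.choice(neighbors[i])
            x_new[i] += x[i] / 2; w_new[i] += w[i] / 2
            x_new[j] += x[i] / 2; w_new[j] += w[i] / 2
        x, w = x_new, w_new
    return [xi / wi for xi, wi in zip(x, w)]  # step 3: update estimates

random.seed(0)
# ring of 5 agents, each holding one data point; the true average is 4.0
vals = [1.0, 2.0, 3.0, 4.0, 10.0]
ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
estimates = push_sum(vals, ring)
```

Note that the total value and total weight in the network are conserved at every step, which is what makes the per-agent ratios converge to the average even though no agent ever sees all the data.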

Each node is k physical agents - all the machines on the node are checking each other's calculations

Need historical context and domain context

example of a geometric mean with arcsin that is robust to outliers

Creating robust, iterative linear solvers

ex. Synchronous Jacobi: has to wait for all the data for each update

Actually want something asynchronous, e.g. Asynchronous Redundant Jacobi

testing: 30 agents where 1 is 10x slower than all the others

  • too much redundancy slows things down, so there's a sweet spot
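A minimal sketch of the synchronous version (not LLNL's asynchronous redundant variant), showing the "wait for all the data" bottleneck: every component of the new iterate is built from the complete previous iterate.

```python
import numpy as np

def jacobi(A, b, iters=100):
    """Synchronous Jacobi: every component of x is updated from the
    *previous* iterate, so each sweep has to wait for all agents' data
    before the next update can start."""
    x = np.zeros_like(b, dtype=float)
    D = np.diag(A)            # diagonal entries
    R = A - np.diagflat(D)    # off-diagonal remainder
    for _ in range(iters):
        x = (b - R @ x) / D   # new x built entirely from the old x
    return x

# small diagonally dominant system, for which Jacobi is guaranteed to converge
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b)
```

An asynchronous variant would let each agent update its own component whenever new neighbor data arrives, so one slow agent (like the 10x-slower one in the test above) doesn't stall every sweep.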

Fanny Chevalier - U Toronto

Decision making!

ex. 16 shark attacks per year vs. 1730 vending machine attacks per year

Mental shortcuts:

  • affect (emotion)
  • assumptions
  • anecdotes
  • expectations
  • perspective

Interactive visualization tools

see Doccurate: Sultanum et al. VAST 2018
Phenolines.org: Glueck et al. VAST 2017

ex. Florida Dept. of Law Enforcement chart on the stand-your-ground law with the y-axis upside-down, which is misleading

see Ritchie et al. CHI 2019

  1. Make the data relatable
     ex. use soccer fields, not hectares; see the Climate Change Coloring Book
     instead of 39g sugar in a can of coke, say it's 10 sugar cubes

  2. Engage with the data
     see the Dear Data book
     DataInk - personal visualization tool, collaboration with MSFT; Xia et al. CHI 2018 https://www.microsoft.com/en-us/research/uploads/prod/2018/05/dataink.pdf
     Data Quilt: Zhang et al. CHI 2018

  3. Teach data
     C'est la Vis - tool for kids, see Alper et al.

Katie Schmidt - LLNL

Sensitivity Analysis

UQ - Uncertainty Quantification

  • her thesis advisor wrote one of the key textbooks on the topic

connection to Model Calibration

  • assume error is normally distributed to start with

  • seems to be describing Frequentist as using "fixed parameters" (as if we don't update models??)

big vs. small change in output when we change a parameter

when doing dimensional reduction, if parameters are non-identifiable, the optimal parameter set may not be unique (there may be more than one set that is optimal)

local sensitivity: do partial derivatives

global sensitivity: Sobol Decomposition

decompose the variance to get Sobol indices: Si

and interaction sensitivities: S1-2, S2-3 etc.

total sensitivity: add up all the components
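The variance decomposition can be made concrete with a pick-and-freeze Monte Carlo estimate of the first-order Sobol indices. This is an illustration on a toy linear model with known answers, not the speaker's code:

```python
import numpy as np

def first_order_sobol(f, dim, n=100_000, rng=None):
    """Pick-and-freeze Monte Carlo estimate of first-order Sobol indices
    S_i = Var(E[f | x_i]) / Var(f), assuming independent inputs uniform
    on [0, 1]."""
    rng = np.random.default_rng(rng)
    A = rng.random((n, dim))
    B = rng.random((n, dim))
    fA = f(A)
    mean, var = fA.mean(), fA.var()
    S = np.empty(dim)
    for i in range(dim):
        ABi = B.copy()
        ABi[:, i] = A[:, i]   # "freeze" input i at its value in A
        # f(A) and f(ABi) share only x_i, so their covariance
        # estimates Var(E[f | x_i])
        S[i] = (np.mean(fA * f(ABi)) - mean**2) / var
    return S

# toy model: y = x1 + 2*x2 (no interactions, so the indices sum to ~1)
f = lambda X: X[:, 0] + 2.0 * X[:, 1]
S = first_order_sobol(f, dim=2, n=200_000, rng=0)
# analytically S1 = (1/12)/(5/12) = 0.2 and S2 = (4/12)/(5/12) = 0.8
```

For models with interactions, the total sensitivity of input i also picks up the interaction terms (S_1-2, etc.), so it exceeds the first-order index.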


Kelli Humbird - Design Physicist at LLNL

Inertial Confinement Fusion (ICF)

shoot a laser at a 2mm capsule of deuterium and tritium

traditionally start with simulations because experiments are $$$

ML trained on a mapping of inputs and outputs, so use NNs w/transfer learning

30k low-fidelity sims --> low fidelity NN --> add 23 high fidelity sims and do a high-fidelity NN --> add 23 experiments and do your final NN

low fidelity here == simpler physics models (fewer parameters, less computationally expensive)
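A minimal numpy sketch of the multi-fidelity transfer idea. Everything here is a hypothetical stand-in: `low_fi`/`high_fi` are toy functions, not ICF simulations, and the "NN" is reduced to a fixed random-feature layer whose output weights are refit on the scarce high-fidelity data, regularized toward the low-fidelity solution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the simulators: "low fidelity" is a cheap
# approximation that misses some physics captured by "high fidelity".
def low_fi(x):  return np.sin(3 * x)
def high_fi(x): return np.sin(3 * x) + 0.3 * x

# Fixed random cosine features play the role of the shared hidden layers.
W = rng.normal(0.0, 3.0, (1, 50))
b = rng.uniform(0.0, 2.0 * np.pi, 50)
def features(x): return np.cos(x[:, None] * W + b)

def ridge_fit(F, y, lam=1e-6, prior=None):
    """Fit output weights by ridge regression; `prior` regularizes the
    solution toward previously learned weights (the transfer step)."""
    if prior is None:
        prior = np.zeros(F.shape[1])
    d = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]),
                        F.T @ (y - F @ prior))
    return prior + d

# Stage 1: plenty of cheap low-fidelity samples.
x_lo = rng.uniform(-1, 1, 3000)
w_lo = ridge_fit(features(x_lo), low_fi(x_lo))

# Stage 2: only 23 expensive high-fidelity samples; fine-tune the output
# weights starting from the low-fidelity solution.
x_hi = rng.uniform(-1, 1, 23)
w_hi = ridge_fit(features(x_hi), high_fi(x_hi), lam=1e-2, prior=w_lo)

# Evaluate both surrogates against the high-fidelity ground truth.
x_test = np.linspace(-1, 1, 200)
err_lo = np.mean((features(x_test) @ w_lo - high_fi(x_test)) ** 2)
err_hi = np.mean((features(x_test) @ w_hi - high_fi(x_test)) ** 2)
```

The point is the data economics: the few high-fidelity samples only have to teach the model the *correction* to the cheap surrogate, not the whole response surface.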

looking to maximize a figure of merit, e.g. yield × (areal density)²

  1. naive: high power laser, 1D assumptions, just blast it
  2. slightly better: account for laser degradation at high power, use a thinner capsule and lower power
  3. after transfer learning: use a longer pulse at lower power, account for 3D effects

see github.com/llnl/djinn & I downloaded her IEEE paper.


Tsu-Jae King Liu, dean of engineering at Berkeley (she's an EE)

** her slides were an especially good, though depressing, overview of current statistics on women in tech
