@kliph
Last active June 14, 2018 13:30
Notes from #datajawn 2018

DataJawn <2018-06-13 Wed>

#datajawn on twitter

Sponsors

Drexel

CompassRed

Data Science Sense and Sensibility

@vboykis

Douglas Adams: the Silastic Armorfiends of Striterax

“A story called potatoes”

Hyped data problems

Things we worry about but shouldn’t

  • The end of human decision making(!)
  • Outsourcing of jobs and technology
    Hype
    “We should stop training radiologists” - Geoffrey Hinton
    Reality
    Radiology is still evolving. It’s not a solved problem.
  • Self-driving cars
    Hype
    What do we do with the trash that accumulates when people leave the car?
    Reality
    Training a self-driving car model takes a lot of data and a long time.
    • So we use CAPTCHA. Humans are in the loop.
  • Robots
    Hype
    Robots will kill us
    Reality
    Artificial general intelligence is a long way away

Real data problems

  • People on the ground are worried about provisioning S3 buckets, not robot ethics, or linked data, or killer robots
Problem
You are a large brick-and-mortar book retailer trying to compete against Amazon
  • You have a website built on-premises and it doesn’t do recommendations
Solution
You heard Google uses deep learning to recommend YouTube videos. So use AI!
  • Hari Botter sidebar recommends books you may also enjoy on your website
  • A scalable cloud-based application that interfaces with your existing webapp …
  • New problems
    • Moving data to the cloud
      • Study cloud-native architectures
      • AWS Glue
    • Security and data privacy
      • Data breaches and GDPR
      • The less data you store, the easier security and compliance can be
      • Keep windows of history then send to cold storage
      Differential privacy
      Adding noise to a sample of user data, for example tweaking the age or adding more parameters. Your model will still read through the noise.
      • Synthetic data creation: GANs
    • Model interpretability: Are your recommendations good?
      • The Barnes Foundation’s model recommended paintings that were too similar
      • A self-driving car tried to classify an object several times before deciding to stop, and hit a person that it thought was a bike or a stop sign
      • Use varied training data, get users to test and validate your data
      • Use simpler models that are easier to interpret
  • Migrating to the cloud will take at least 15 months
    • Citation needed
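
The differential privacy note above (adding noise to user data such as age) can be sketched roughly as follows. This is one common formalization, the Laplace mechanism applied to an average age, and not necessarily what the talk had in mind. The function names, the 0 to 100 age bounds, and the `epsilon` parameter are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean_age(ages, epsilon, lower=0.0, upper=100.0, rng=None):
    """Return a noisy mean of `ages` (hypothetical helper, assumed bounds).

    Ages are clipped to [lower, upper] so that changing one person's record
    can move the mean by at most (upper - lower) / n, which is the
    sensitivity used to scale the noise.
    """
    rng = rng or random.Random()
    clipped = [min(max(a, lower), upper) for a in ages]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    # Smaller epsilon means more noise and stronger privacy.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 38]
    print(private_mean_age(ages, epsilon=0.5, rng=random.Random(0)))
```

With many users the noise shrinks relative to the signal, which is the sense in which a model can still "read through the noise".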

Data science in nonprofits

Sam Chenkin TechImpact

Nonprofits are a good application of data science

  • Rich datasets
  • Need to know how to apply limited resources

Small organizations trying to do large amounts of work

  • larger companies can afford to have data science on staff

How can data science help?

  • Prove value of technology
  • Understand impact
  • Avert crises

Case study

  • Documentation is only for compliance
  • No one trusts technology
  • Funders may require data collection but it’s not driven by nonprofit’s needs

Your job as an expert is to help people understand rather than to create tools or models

Accelerating Drug Discovery through Data Science

Bonnie Kruft GlaxoSmithKline

Divide into teams

  • Data Build
  • Data Use
  • Data Strategy
  • Index
    • Catalog where all the data is
  • Build
    • Infrastructure for storing, searching, and computing that data

Complex data ecosystem

  • Genomics data is predicted to match or overtake data from any other domain by 2025

How do you measure value?

  • Easy to access data
  • Is the data usable
  • Can you make decisions
  • Do you save time and money

From Predictions to Decisions

Corey Chivers Penn Medicine @cjbayesian

A Data Scientist is a device for turning coffee and data into better decisions

ROC curves and confusion matrices

Don’t tell you whether you should use a model.

All models are wrong but some are useful

You are often in a situation where you don’t know the outcome of events

Hard to test true negatives, false positives, etc.

Predict probability of outcome then try to maximize estimated goodness of outcomes

Need to compare against alternatives

Evaluate the cost of potential outcomes

  • Treat everyone
  • Treat no one
  • Some other strategy

May decide to tune sensitivity and specificity

May decide to treat all people

This approach is called decision theory
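
The comparison of strategies described above can be sketched as an expected-cost calculation. This is a minimal illustration under stated assumptions, not the speaker's actual method: the function names are hypothetical, treatment is assumed to fully avert the bad outcome at a fixed cost, and the predicted probabilities are assumed to be well calibrated.

```python
def expected_cost(probs, treat, cost_treat, cost_missed):
    """Average expected cost per patient for one treat/don't-treat plan.

    probs: predicted probability of the bad outcome for each patient.
    treat: one boolean per patient, True if that patient is treated.
    """
    total = 0.0
    for p, t in zip(probs, treat):
        # Treated patients cost a fixed amount; untreated patients cost
        # the outcome's cost weighted by how likely it is.
        total += cost_treat if t else p * cost_missed
    return total / len(probs)


def compare_strategies(probs, cost_treat, cost_missed):
    """Expected cost of treat-everyone, treat-no-one, and a threshold rule."""
    # Treating is worth it when p * cost_missed exceeds cost_treat.
    threshold = cost_treat / cost_missed
    strategies = {
        "treat everyone": [True] * len(probs),
        "treat no one": [False] * len(probs),
        "treat if p > threshold": [p > threshold for p in probs],
    }
    return {
        name: expected_cost(probs, plan, cost_treat, cost_missed)
        for name, plan in strategies.items()
    }


if __name__ == "__main__":
    costs = compare_strategies([0.05, 0.2, 0.6, 0.9], cost_treat=1.0, cost_missed=4.0)
    for name, cost in costs.items():
        print(f"{name}: {cost:.2f}")
```

When treatment is cheap relative to the missed outcome, the threshold drops toward zero and the rule collapses into "treat all people", which matches the note above.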

Earth observation data

Andrew Pawloski Element 84 small engineering firm

Earth data is a national asset

Billions of dollars in public funding

NASA data is free

Worldview

Dataimprint

Data visualization for social change

Philadelphia’s Clef Club

Ben Garvey

Quantifying Negadelphia

Hitchhiking robot

  • Destroyed in Philadelphia

Philly fans booed Santa

1776 tweetstorm

Is Philadelphia an inherently negative place?

Analyze geotagged tweets for sentiment from the 13 most populous cities

Randy Zwitch

Your clothes have thousands of threads, your analytics should too

MapD

You should use GPUs for analytics

Useful for operations dashboard type stuff

Why HR should treat employees as customers

Bruce Marable EmployeeCycle

Treat Customer and Talent Acquisition as similar endeavors

  • Recruiting and Sales follow the same lifecycle

Net Promoter Score

This is how companies track how much customers love them
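
For reference, the standard Net Promoter Score calculation takes 0 to 10 answers to a "how likely are you to recommend us?" question, counts 9s and 10s as promoters and 0 through 6 as detractors, and subtracts the detractor percentage from the promoter percentage. A minimal sketch follows; the function name is an assumption, and the notes do not say how the speaker computes it.

```python
def net_promoter_score(ratings):
    """Return NPS in the range -100 to 100 from a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)    # scores of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)   # scores of 0 through 6
    # Passives (7 or 8) only count toward the total number of responses.
    return 100.0 * (promoters - detractors) / len(ratings)


if __name__ == "__main__":
    print(net_promoter_score([10, 10, 9, 3]))
```

The same function works unchanged on employee survey responses, which is the parallel the talk draws between customers and talent.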

Pulse surveys

Track the pulse of how your customer/talent is feeling

Track churn and turnover data

Data Takers and Data Makers

How do we get insights?
