

@misho-kr
Last active February 14, 2021 23:51
Summary of "How Google does Machine Learning" from Coursera.org

What is machine learning, and what kinds of problems can it solve? Google thinks about machine learning slightly differently: as being about logic, rather than just data. We talk about why such a framing is useful for data scientists when thinking about building a pipeline of machine learning models.

Then, we discuss the five phases of converting a candidate use case to be driven by machine learning, and consider why it is important that the phases not be skipped. We end by recognizing the biases that machine learning can amplify and discussing how to detect them.

Intro to Specialization

  • In 5 years (2012-2017), Google built and deployed over 4000 ML models
  • Google Cloud offers great tools and services for deploying ML models to production
  • Goals
    • ML with TensorFlow
    • Improving ML accuracy
    • ML at scale
    • Specialized ML models

Refs: Graffiti Artist Classifier, Pose-Estimator with Move Mirror

Labs and demos: Training Data Analyst

What it means to be AI-first

  • Artificial Intelligence is a discipline; machine learning is a specific way of solving AI problems
  • In ML, machines learn. They don't start out intelligent; they become intelligent.
  • Train an ML model with examples, then predict with the trained model
  • Neural networks are one important technology we use
  • ML replaces heuristics; it converts examples into knowledge
  • Many ML projects fail because of training-serving skew
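A minimal sketch of the train-then-predict loop in plain Python. The feature name (`delay_minutes`), the helper names, and the one-variable least-squares model are hypothetical choices for illustration, not code from the course. The point it shows is that training and serving both call the same `preprocess()` function, which is one simple guard against training-serving skew.

```python
from __future__ import annotations

from statistics import mean


def preprocess(raw: dict) -> float:
    """Hypothetical feature transform shared by training and serving.

    Converting minutes to hours in exactly one place means both paths see
    the same feature scale; re-implementing it separately at serving time
    is a classic source of training-serving skew.
    """
    return raw["delay_minutes"] / 60.0


def train(examples: list[tuple[dict, float]]) -> tuple[float, float]:
    """Fit y = w * x + b by ordinary least squares on preprocessed inputs."""
    xs = [preprocess(raw) for raw, _ in examples]
    ys = [label for _, label in examples]
    x_bar, y_bar = mean(xs), mean(ys)
    w = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = y_bar - w * x_bar
    return w, b


def predict(model: tuple[float, float], raw: dict) -> float:
    """Serve a prediction using the same preprocess() as training."""
    w, b = model
    return w * preprocess(raw) + b


if __name__ == "__main__":
    # Made-up labeled examples: (raw input, label)
    examples = [
        ({"delay_minutes": 0}, 1.0),
        ({"delay_minutes": 60}, 3.0),
        ({"delay_minutes": 120}, 5.0),
    ]
    model = train(examples)
    print(predict(model, {"delay_minutes": 90}))  # → 4.0
```

If serving code instead divided by 100 (or forgot to convert at all), the model would silently receive inputs on a different scale than it was trained on, even though the model itself is unchanged.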

How Google does ML

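  • Google infuses ML into almost all its products
  • The ML surprise: collecting data and building infrastructure take far more effort than optimizing the ML algorithm, which takes less than most people expect
    • Defining KPIs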
    • Collecting data
    • Building infrastructure
    • Optimizing ML algorithm
    • Integration
  • Path to ML: The 5 phases
    • Individual contributor
    • Delegation
    • Digitization
    • Big Data and Analytics
    • Machine learning

Inclusive ML

  • Machine learning and human bias
  • The confusion matrix leads to evaluation metric insights
    • True positives
    • False positives -- Type I error
    • False negatives -- Type II error
    • True negatives
  • Depending on the use case, false negatives can be more acceptable than false positives (or vice versa)
  • The Equality of Opportunity approach strives to give individuals an equal chance of the desired outcome
    • Simulating decisions with no constraints can lead to an unequal distribution of outcomes across groups
    • Simulating group-unaware decisions (one threshold for all groups) holds everyone to the same standard, which can be unfair to some groups
  • How to find errors in your dataset using Facets
    • Gives users a quick understanding of the distribution of values across features of their datasets
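The confusion-matrix counts and an equality-of-opportunity check can be sketched in a few lines of Python. This is an illustration rather than the course's code: labels are assumed to be 0/1, the group names and data are made up, and equality of opportunity is formalized here as equal true positive rates (recall) across groups, which is one common definition of the idea.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["TP"] += 1
        elif pred == 1 and truth == 0:
            counts["FP"] += 1  # Type I error
        elif pred == 0 and truth == 1:
            counts["FN"] += 1  # Type II error
        else:
            counts["TN"] += 1
    return counts


def true_positive_rate(counts):
    """Recall: share of actual positives the model caught, TP / (TP + FN)."""
    positives = counts["TP"] + counts["FN"]
    return counts["TP"] / positives if positives else float("nan")


def tpr_by_group(y_true, y_pred, groups):
    """Equality of opportunity (as formalized here) asks for similar TPRs per group."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        counts = confusion_counts([y_true[i] for i in idx], [y_pred[i] for i in idx])
        rates[g] = true_positive_rate(counts)
    return rates


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(dict(confusion_counts(y_true, y_pred)))
    print(tpr_by_group(y_true, y_pred, groups))  # a: 0.5, b: about 0.667
```

A gap between the per-group rates means qualified members of one group are missed more often than those of another, even if overall accuracy looks fine.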

Python notebooks in the cloud

  • AI Platform Notebooks (formerly Cloud Datalab) are a fully hosted version of the popular JupyterLab notebook environment
  • Compute Engine and Cloud Storage
    • Customizable machine types and flexible compute options
    • Control latency and availability with zones and regions
```
$ datalab create my-datalab-vm --machine-type n1-highmem-8 --zone us-central1-a
```

Lab: Geographic data in Datalab

  • Analyzing data using AI Platform Notebooks and BigQuery

Lab: Data Analysis using Datalab and BigQuery (notebook)

```python
from google.cloud import bigquery
import pandas as pd

# For each departure delay, count flights and compute deciles of arrival delay
query = """
SELECT
  departure_delay,
  COUNT(1) AS num_flights,
  APPROX_QUANTILES(arrival_delay, 10) AS arrival_delay_deciles
FROM
  `bigquery-samples.airline_ontime_data.flights`
GROUP BY
  departure_delay
HAVING
  num_flights > 100
ORDER BY
  departure_delay ASC
"""

# Run the query and load the results into a pandas DataFrame
df = bigquery.Client().query(query).to_dataframe()
df.head()

# Expand the 11-element deciles array into columns named 0%, 10%, ..., 100%
percentiles = df['arrival_delay_deciles'].apply(pd.Series)
percentiles = percentiles.rename(columns=lambda x: str(x * 10) + "%")
df = pd.concat([df['departure_delay'], percentiles], axis=1)
df.head()

# Drop the extreme (min/max) columns and plot the remaining deciles
without_extremes = df.drop(columns=['0%', '100%'])
without_extremes.plot(x='departure_delay', xlim=(-30, 50), ylim=(-50, 50));
```
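To experiment with the decile reshaping without BigQuery access, the query's logic can be approximated locally with pandas. `delay_deciles` is a hypothetical helper and the data in the usage section is synthetic. Note that pandas computes exact, linearly interpolated quantiles, so its numbers can differ slightly from BigQuery's `APPROX_QUANTILES`.

```python
import numpy as np
import pandas as pd


def delay_deciles(flights: pd.DataFrame, min_flights: int = 100) -> pd.DataFrame:
    """Local stand-in for the BigQuery query (hypothetical helper).

    Groups flights by departure_delay, keeps groups with more than
    min_flights rows (like HAVING num_flights > 100), and returns the
    0%, 10%, ..., 100% quantiles of arrival_delay as columns.
    """
    sizes = flights.groupby("departure_delay").size()
    keep = sizes[sizes > min_flights].index
    probs = np.linspace(0, 1, 11)
    deciles = (
        flights[flights["departure_delay"].isin(keep)]
        .groupby("departure_delay")["arrival_delay"]
        .quantile(probs)
        .unstack()  # move the quantile level into columns
    )
    deciles.columns = [f"{round(p * 100)}%" for p in probs]
    return deciles.reset_index().sort_values("departure_delay")


if __name__ == "__main__":
    # Synthetic flights: arrival delay roughly tracks departure delay plus noise
    rng = np.random.default_rng(0)
    dep = rng.integers(-10, 30, size=20_000)
    flights = pd.DataFrame(
        {"departure_delay": dep, "arrival_delay": dep + rng.normal(0, 10, size=dep.size)}
    )
    print(delay_deciles(flights).head())
```

The output has the same shape as the reshaped `df` in the lab, so the same `drop` and `plot` calls apply to it.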
  • Machine Learning APIs
    • Cloud Vision - Complex image detection with a simple REST request
    • Cloud Video Intelligence - Understands entities in your videos at the shot, frame, or video level
    • Cloud Speech - Speech-to-text transcription in 100+ languages
    • Cloud Translation - Translate text into 100+ languages
    • Cloud Natural Language - Understand text with a simple REST API request
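These APIs are called with plain JSON over HTTPS. The sketch below builds request bodies for Cloud Vision label detection and Cloud Natural Language sentiment analysis, and posts them with an API key using only the Python standard library. The environment variable name `GOOGLE_API_KEY` is an assumption, and the official client libraries are the more usual choice in production.

```python
import base64
import json
import os
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"
LANGUAGE_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def vision_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Body for a Cloud Vision LABEL_DETECTION request on inline image bytes."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def sentiment_request(text: str) -> dict:
    """Body for a Cloud Natural Language analyzeSentiment request."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def post_json(url: str, body: dict, api_key: str) -> dict:
    """POST a JSON body with an API key and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{url}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    key = os.environ.get("GOOGLE_API_KEY")  # assumed variable name
    if key:
        result = post_json(LANGUAGE_URL, sentiment_request("I love this course!"), key)
        print(result.get("documentSentiment"))  # score and magnitude
```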

Lab: Invoking Machine Learning APIs (notebook)

Summary
