@jaceklaskowski
Last active August 7, 2017 06:29
Notes about TensorFlow (before settling on Apache Beam and Databricks' TensorFrames)

TensorFlow

What is TensorFlow?

  • Google's TensorFlow is an open-source machine learning library for deep neural networks
    • Grew out of DistBelief from the Google Brain project (TensorFlow is effectively DistBelief v2)
  • Aims to simplify deployment of large-scale machine learning models to a variety of hardware (thousands of servers in datacenters, smartphones, GPUs).
  • Much like Theano - a popular deep learning framework.
  • Data Flow Graph (aka Computational Graph or TensorFlow Graph of Computation): nodes represent data or operations, and edges represent the flow of data (tensors) between nodes.
  • A tensor is a multi-dimensional array that flows between nodes.
    • A tensor has a rank (its number of dimensions).
  • No built-in hyperparameter configuration, so use Keras, a deep learning library that runs on top of Theano and TensorFlow
  • C++ and Python APIs
    • Other languages developed by the community using SWIG -- a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.
  • Distributed TensorFlow
    • Data Parallelism
    • Model Parallelism
  • OpenCL - a standard for GPU computing
    • CUDA GPU supported already
  • TensorBoard - a visualisation tool for network design
    • tensorboard command-line tool to view a graph
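
The data flow graph and tensor rank described above can be illustrated with a toy sketch in plain Python. This is not the TensorFlow API: the names Node, constant, add, scale and rank are hypothetical helpers invented for the illustration, and tensors are modelled as nested lists. The point it tries to show is that building the graph only wires nodes together, and values flow along the edges when the graph is run.

```python
# Toy data flow graph in plain Python (NOT the TensorFlow API).
# Nodes are operations; edges carry tensors, modelled here as nested lists.

def rank(tensor):
    """Number of dimensions: 0 for a scalar, 1 for a vector, 2 for a matrix."""
    r = 0
    while isinstance(tensor, list):
        r += 1
        if not tensor:
            break
        tensor = tensor[0]
    return r


class Node:
    """An operation plus the upstream nodes whose outputs feed it."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self):
        # Evaluate upstream nodes first, then apply this node's operation.
        return self.op(*(node.run() for node in self.inputs))


def constant(value):
    return Node(lambda: value)


def add(a, b):
    return Node(lambda x, y: [xi + yi for xi, yi in zip(x, y)], a, b)


def scale(a, factor):
    return Node(lambda x: [factor * xi for xi in x], a)


# Building the graph computes nothing yet.
x = constant([1.0, 2.0, 3.0])
y = constant([10.0, 20.0, 30.0])
z = scale(add(x, y), 2.0)

result = z.run()  # evaluation happens only here
print(result, rank(result))
```

In this sketch a scalar has rank 0, a vector rank 1, and a matrix rank 2, which matches the usual meaning of tensor rank as the number of dimensions.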

TensorFlow Computation Model

TensorFlow and Apache BEAM and Databricks' TensorFrames

TODO

Topics to Explore

  • Dataset API and Queues
  • TensorBoard - Visualisation of Learning
  • TensorFlow Parallelism
  • Building Image Classifier
  • TensorFlow's Use Cases - Where to Use TensorFlow?

Recurrent Neural Networks (RNNs)

A neural network is a function that learns the expected output for a given input from training datasets.

A bias represents the threshold to determine whether or not a neuron is activated by the inputs.

An artificial neuron just classifies a data point into one of two kinds by examining input values with weights and bias.

You let the computer determine the parameters (weights and bias) by learning from training datasets.
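
As a minimal sketch of the ideas above, the following plain-Python example trains a single artificial neuron with the classic perceptron rule. The training set (logical AND), the learning rate, and the function names predict and train are assumptions chosen for illustration; real networks use many neurons and gradient-based optimizers.

```python
def predict(weights, bias, inputs):
    """Fire (1) when the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, learning_rate=0.5):
    """Perceptron rule: nudge weights and bias after each wrong prediction."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy training set (an assumption for this sketch): logical AND.
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(predictions)
```

The bias plays the role of the threshold: the neuron fires only when the weighted inputs outweigh it. Nothing in the code hard-codes the AND rule; the weights and bias are found from the samples alone.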

Three big challenges for neural networks:

  • training deep neural networks requires a lot of computation power
  • training requires large training data sets
  • getting the best training results takes a lot of trial and error with many combinations of network designs and algorithms

Examples

To build a neural network that recognizes images of a cat, you train the network with a lot of sample cat images.

To build a neural network that outputs which users have a high probability of conversion, you train the network on input such as user activity logs from gaming servers.

To train a single neuron as a classifier, you have it sort a set of images into "images of number 8" or "other images."

TensorFlow on Spark

  • The TensorFlow library can be installed on Spark clusters as a regular Python library.
  • The model is distributed to the workers of the cluster as a Spark broadcast variable.
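
The broadcast pattern above can be simulated without Spark. The sketch below uses a thread pool in place of a cluster and a plain dictionary in place of a trained model; MODEL, score_partition and the sample partitions are all hypothetical names and values, and no Spark or TensorFlow API is used. It only illustrates the shape of the idea: one read-only model shared by every worker, with each worker scoring its own partition of the data.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a trained model (hypothetical values). On a real cluster this
# object would be shipped to each worker once, as a broadcast variable.
MODEL = {"weights": [0.5, -0.25], "bias": 0.125}


def score_partition(model, rows):
    """Apply the shared, read-only model to every row in one partition."""
    return [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]


# The data set is split into partitions, one per simulated worker.
partitions = [
    [[1.0, 2.0], [2.0, 0.0]],
    [[0.0, 4.0], [4.0, 4.0]],
]

with ThreadPoolExecutor(max_workers=2) as pool:
    per_partition = list(
        pool.map(lambda rows: score_partition(MODEL, rows), partitions)
    )

# Flatten the per-partition results back into one list, in input order.
scores = [s for part in per_partition for s in part]
print(scores)
```

Because the model is only read and never modified by the workers, it can be copied once per worker rather than once per row, which is the motivation for using a broadcast variable.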

References

Examples

Apache Beam

  1. Introduction to Apache Beam