@omochi
Created July 10, 2019 15:35

slidenumber: true autoscale: true

Swift for TensorFlow is a completely new paradigm for machine learning

by @omochimetaru from Qoncept, Inc

Swift for TensorFlow meetup #1


About me

  • @omochimetaru on Twitter

  • iOS software developer at Qoncept, Inc

  • Swift compiler learner (see Speaker Deck)

    • Compile time computation in Swift
    • ABI Stability and Library Evolution
    • Introduction of Opaque Result Type
    • Implementation of Opaque Result Type
    • Runtime representation of function in Swift
    • Implementation of Generics and Protocol in swift
    • Swift runtime library and Value Witness Table
    • Metatype of Generic types
    • Exploring generics in swift from compiler source
    • Debugging of swiftc
    • IRGen for Type Metadata
    • Mangling and Substitution in Swift

About our company

  • Qoncept, Inc.

  • Research in computer vision

  • Develop mobile apps



Sorry 🙏

  • I am not an ML expert at all

  • But I can share what I know about S4TF (Swift for TensorFlow)

  • Please give me advice during the talk if you can


What is ML?

(I will talk only about the currently popular approach)


Data points


Decide on a function

  • Two parameters: α, β

Start with random initial parameters


How do we reduce the error?

  • Differentiation tells us the relationship between each parameter and the result

Differentiation of a function

Each point has a gradient

  • Current: α = 0, β = 0.5
  • Update: α += 0.5, β += 0.5

  • Current: α = 0.5, β = 1.0
  • Update: α += 0.5, β unchanged


Wrap up

  • Decide the shape of the function

  • Adjust the function's parameters bit by bit

    • Using the derivative of the function
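
The wrap-up above can be sketched in plain Python. This is a minimal illustration, not code from the talk: the data points, the mean-squared-error loss, the learning rate, and the step count are all assumptions chosen for the example. The starting values α = 0, β = 0.5 mirror the earlier slide.

```python
# Fit y = alpha * x + beta to data points by gradient descent.
# The data, loss, learning rate, and step count are made up for illustration.

def predict(alpha, beta, x):
    return alpha * x + beta

def loss(alpha, beta, points):
    # Mean squared error between predictions and observed y values.
    return sum((predict(alpha, beta, x) - y) ** 2 for x, y in points) / len(points)

def gradients(alpha, beta, points):
    # Partial derivatives of the mean squared error, derived by hand.
    n = len(points)
    d_alpha = sum(2 * (predict(alpha, beta, x) - y) * x for x, y in points) / n
    d_beta = sum(2 * (predict(alpha, beta, x) - y) for x, y in points) / n
    return d_alpha, d_beta

def fit(points, alpha=0.0, beta=0.5, learning_rate=0.1, steps=2000):
    # Move each parameter a small step against its gradient, many times.
    for _ in range(steps):
        d_alpha, d_beta = gradients(alpha, beta, points)
        alpha -= learning_rate * d_alpha
        beta -= learning_rate * d_beta
    return alpha, beta

points = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]  # lie on y = x + 1
alpha, beta = fit(points)
print(alpha, beta, loss(alpha, beta, points))
```

Here the gradients are written by hand; the rest of the talk is about getting them automatically.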

What is Deep Learning?


Considering a more realistic task

  • What kind of shape does this function have?


Neural Network [1]

  • A lot of computation (multiplication, addition, ReLU, etc.)
  • Many parameters
  • So it can express a wide variety of complex functions
  • Stack many layers → "Deep"
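
To make the bullets above concrete, here is a tiny two-layer network in plain Python. It is only a sketch: the helper names `dense` and `network` and every weight value are invented for the example, and a real network would have far more parameters and layers.

```python
# A two-layer network written with plain lists; all weights are made-up values.

def relu(v):
    # ReLU keeps positive values and replaces negative ones with zero.
    return [max(0.0, x) for x in v]

def dense(weights, bias, v):
    # One layer: multiply by a weight matrix, then add a bias vector.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def network(v):
    # Layer 1 (2 inputs -> 2 hidden units) followed by ReLU.
    hidden = relu(dense([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], v))
    # Layer 2 (2 hidden units -> 1 output).
    return dense([[2.0, 1.0]], [0.5], hidden)

print(network([3.0, 1.0]))  # [5.5]
```

Every number inside `network` is a parameter that training would adjust.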

What is needed for ML?


  • Programming, to put the computer to work and automate the process

  • A way to express a function symbolically

  • Differentiation of that function

  • A library that does all the rest


Automatic differentiation

  • The derivative can be derived automatically from the definition of a function.

  • Recall how you learned to differentiate at school (the chain rule and so on).

  • It is a small set of primitive rewrite rules on expressions, applied recursively.

  • In the end, the only remaining problem is how to express the function in code.
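
The "primitive rules applied recursively" idea can be sketched as follows. The tuple-based expression format and the function names `diff` and `evaluate` are hypothetical, chosen only to keep the example short. It covers the sum rule, the product rule, and one instance of the chain rule (squaring).

```python
# Expressions are nested tuples (a hypothetical format for this sketch):
# ("const", c), ("var", name), ("add", a, b), ("mul", a, b), ("square", a).

def diff(expr, name):
    kind = expr[0]
    if kind == "const":
        return ("const", 0.0)
    if kind == "var":
        return ("const", 1.0 if expr[1] == name else 0.0)
    if kind == "add":      # (a + b)' = a' + b'
        return ("add", diff(expr[1], name), diff(expr[2], name))
    if kind == "mul":      # (a * b)' = a' * b + a * b'
        a, b = expr[1], expr[2]
        return ("add", ("mul", diff(a, name), b), ("mul", a, diff(b, name)))
    if kind == "square":   # chain rule: (a^2)' = 2 * a * a'
        a = expr[1]
        return ("mul", ("mul", ("const", 2.0), a), diff(a, name))
    raise ValueError(f"unknown node: {kind}")

def evaluate(expr, env):
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if kind == "mul":
        return evaluate(expr[1], env) * evaluate(expr[2], env)
    if kind == "square":
        return evaluate(expr[1], env) ** 2
    raise ValueError(f"unknown node: {kind}")

# f(x) = (3x + 1)^2, so f'(x) = 2 * (3x + 1) * 3
x = ("var", "x")
f = ("square", ("add", ("mul", ("const", 3.0), x), ("const", 1.0)))
df = diff(f, "x")
print(evaluate(f, {"x": 2.0}), evaluate(df, {"x": 2.0}))  # 49.0 42.0
```

Note that `diff` needs the function as data, not as an opaque callable, which is exactly the problem the next slides discuss.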


There are many such libraries in the Python world


Problem

How do we express a function?


Two styles of expression

  • Define and run

    • Explicit graph building
    • TensorFlow 1.X
  • Define by run

    • Eager execution
    • TensorFlow 2.X

TensorFlow 1.X [2]

import tensorflow as tf  # TensorFlow 1.x
from tensorflow.examples.tutorials.mnist import input_data

# Assumed data source: the MNIST loader from the TF 1.x tutorials, one-hot labels
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})

Define and run

[.code-highlight: 1-7]

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})

The first seven lines only define the computation; nothing is calculated yet.


Define and run

[.code-highlight: 8-14]

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})

The remaining lines run the graph; the loop repeatedly updates the parameters W and b.


Implementation technique

tf.matmul(x, W) + b
  • This code looks like it performs a calculation, but it doesn't.
  • Instead it builds a data structure that represents the expression itself.
  • This is similar to an AST in a compiler.

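An imitation in Swift

class Node {}

// A binary operation only stores references to its two operand nodes.
class MatMulNode : Node {
    let lhs: Node
    let rhs: Node
    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

class AddNode : Node {
    let lhs: Node
    let rhs: Node
    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

// Calling these builds graph nodes; no arithmetic happens here.
func matmul(_ a: Node, _ b: Node) -> Node {
    return MatMulNode(a, b)
}

func +(_ a: Node, _ b: Node) -> Node {
    return AddNode(a, b)
}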

Explicit graph building style

  • User code builds a graph that represents the expression itself.

  • The library can learn the complete structure of the computation from this graph.


  • ✅ Good performance thanks to optimization.

  • ❌ Confusing. Ordinary code and meta code are mixed.

  • ❌ Apart from the outermost loop, the training process is hidden inside the runtime library, so it is hard to inspect intermediate values along the way.
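
The same idea can be sketched in Python to show where the "define" and "run" phases separate. `Node`, `Placeholder`, and `run` are hypothetical names for this example and are not TensorFlow API; scalars stand in for tensors.

```python
# "Define and run" in miniature: operators build a graph, run() evaluates it.
# Node, Placeholder, and run are hypothetical names, not TensorFlow API.

class Node:
    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)

class Placeholder(Node):
    def __init__(self, name):
        self.name = name

class Add(Node):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

class Mul(Node):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

def run(node, feed):
    # Walk the finished graph; only now do real numbers appear.
    if isinstance(node, Placeholder):
        return feed[node.name]
    if isinstance(node, Add):
        return run(node.lhs, feed) + run(node.rhs, feed)
    if isinstance(node, Mul):
        return run(node.lhs, feed) * run(node.rhs, feed)
    raise TypeError(f"unknown node: {node!r}")

x, w, b = Placeholder("x"), Placeholder("w"), Placeholder("b")
y = x * w + b                 # builds Add(Mul(x, w), b); nothing is computed yet
print(type(y).__name__)       # Add
print(run(y, {"x": 2.0, "w": 3.0, "b": 1.0}))  # 7.0
```

Printing `y` before `run` shows only a node object, which is the "hard to see intermediate values" drawback in miniature.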


TensorFlow 2.X [3]

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m)) # => hello, [[4.]]

  • The computation actually runs right away, like ordinary code.

Define by run

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad) # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)

  • The GradientTape scope tells the library to watch and record the computation executed inside it.
  • Execution has a side effect: it defines the computation graph.
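
A toy version of the tape idea is sketched below. It assumes scalar values and only two operations; `Tape` and `Value` are hypothetical names and do not reflect how TensorFlow implements `GradientTape`.

```python
# A toy gradient tape: values are computed eagerly, and each operation also
# appends a record that lets us walk backwards afterwards.
# Tape and Value are hypothetical names, not TensorFlow API.

class Tape:
    def __init__(self):
        self.records = []  # entries: (output, [(input, local_derivative), ...])

    def gradient(self, target, source):
        # Replay the records in reverse, accumulating d(target)/d(value).
        grads = {id(target): 1.0}
        for out, inputs in reversed(self.records):
            g = grads.get(id(out), 0.0)
            for inp, local in inputs:
                grads[id(inp)] = grads.get(id(inp), 0.0) + g * local
        return grads.get(id(source), 0.0)

class Value:
    def __init__(self, data, tape):
        self.data, self.tape = data, tape

    def __mul__(self, other):
        out = Value(self.data * other.data, self.tape)  # computed immediately
        self.tape.records.append((out, [(self, other.data), (other, self.data)]))
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, self.tape)
        self.tape.records.append((out, [(self, 1.0), (other, 1.0)]))
        return out

tape = Tape()
w = Value(3.0, tape)
loss = w * w + w                 # runs like ordinary arithmetic
print(loss.data)                 # 12.0
print(tape.gradient(loss, w))    # 7.0, since d/dw (w*w + w) = 2w + 1
```

The result is available as soon as each line runs, and the graph exists only as a by-product of that execution.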

Another example [4]

class Model(object):
  def __init__(self):
    # Initialize variable to (5.0, 0.0)
    # In practice, these should be initialized to random values.
    self.W = tf.Variable(5.0)
    self.b = tf.Variable(0.0)

  def __call__(self, x):
    return self.W * x + self.b

model = Model()

def loss(predicted_y, desired_y):
  return tf.reduce_mean(tf.square(predicted_y - desired_y))

def train(model, inputs, outputs, learning_rate):
  with tf.GradientTape() as t:
    current_loss = loss(model(inputs), outputs)
  dW, db = t.gradient(current_loss, [model.W, model.b])
  model.W.assign_sub(learning_rate * dW)
  model.b.assign_sub(learning_rate * db)

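# Synthetic training data (an assumption; the slide omits it):
# noisy points around the line y = 3x + 2.
inputs = tf.random.normal(shape=[1000])
outputs = inputs * 3.0 + 2.0 + tf.random.normal(shape=[1000])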

# Collect the history of W-values and b-values to plot later
Ws, bs = [], []
epochs = range(10)
for epoch in epochs:
  Ws.append(model.W.numpy())
  bs.append(model.b.numpy())
  current_loss = loss(model(inputs), outputs)

  train(model, inputs, outputs, learning_rate=0.1)
  print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %
        (epoch, Ws[-1], bs[-1], current_loss))

Eager execution style

  • User code simply computes the expression, like ordinary programming.

  • The library observes the structure of the computation during execution.


  • ✅ Easy to understand what happens.

  • ✅ Execution is controlled just like an ordinary program, so intermediate values are easy to inspect.

  • ❌ Optimization is limited.

  • ❌ Overhead from executing each step in the host language.


Tradeoff

  • Define and run

    • ✅ Performance
    • ❌ Usability
  • Define by run

    • ✅ Usability
    • ❌ Performance

Can this be solved? 🤔


Rethink

tf.matmul(x, W) + b
  • The question is whether this code expresses the structure of the computation or just executes it.

  • Graph building is similar to building an AST in a compiler.


We are Swift programmers


From our point of view

tf.matmul(x, W) + b
  • At compile time, the compiler sees this code as an AST, and an AST is exactly a graph of the computation.

  • At runtime, the same code executes the expression.
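
Python's standard `ast` module can illustrate this dual view, as a rough analogy only: the Swift compiler is a different system. The scalar `matmul` stand-in and the input values below are assumptions made to keep the sketch self-contained.

```python
import ast

source = "matmul(x, W) + b"

# Compile-time view: the expression is a tree that a compiler can inspect.
tree = ast.parse(source, mode="eval")
root = tree.body
print(type(root).__name__, type(root.op).__name__)   # BinOp Add
print(type(root.left).__name__, root.left.func.id)   # Call matmul

# Runtime view: the very same tree is compiled and executed.
def matmul(a, b):
    # Scalar stand-in for matrix multiplication, to keep the sketch small.
    return a * b

code = compile(tree, filename="<slide>", mode="eval")
result = eval(code, {"matmul": matmul, "x": 2.0, "W": 3.0, "b": 1.0})
print(result)  # 7.0
```

One piece of source text yields both the graph and the executed result, with no separate graph-building API for the user to learn.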


Compiler support can solve the problem


Swift for TensorFlow [5]

for epoch in 1...epochCount {
    var epochLoss: Float = 0
    var epochAccuracy: Float = 0
    var batchCount: Int = 0
    for batch in trainDataset {
        let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
            let logits = model(batch.features)
            return softmaxCrossEntropy(logits: logits, labels: batch.labels)
        }
        optimizer.update(&model.allDifferentiableVariables, along: grad)
        
        let logits = model(batch.features)
        epochAccuracy += accuracy(predictions: logits.argmax(squeezingAxis: 1), truths: batch.labels)
        epochLoss += loss.scalarized()
        batchCount += 1
    }
    epochAccuracy /= Float(batchCount)
    epochLoss /= Float(batchCount)
    trainAccuracyResults.append(epochAccuracy)
    trainLossResults.append(epochLoss)
    if epoch % 50 == 0 {
        print("Epoch \(epoch): Loss: \(epochLoss), Accuracy: \(epochAccuracy)")
    }
}

let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
    let logits = model(batch.features)
    return softmaxCrossEntropy(logits: logits, labels: batch.labels)
}

extension Differentiable {
    func valueWithGradient<R : Differentiable>(
        in f: @differentiable (Self) -> R
    ) -> (value: R, gradient: CotangentVector)
        where R : FloatingPoint, R.CotangentVector == R
}

(reference) [6]


Completely new paradigm

  • ✅ The code is just ordinary computation.
  • ✅ The compiler generates the derivative of a function wherever it is needed, at compile time.
  • ✅ The compiler can optimize the whole execution: not only the expression but also the surrounding control code.

It's a really natural and sensible approach.


Swift

Best language in the world today

  • Type inference provides safety and makes it easy to write.
  • Null safety, which fixes the billion-dollar mistake. [7]
  • The compiler reports mistakes such as typos to the programmer.
  • Less debugging time and more productivity.

New challenge for Swift

  • In practice, Swift is used almost only for iOS development. 😢

  • Swift for TF may spread its use into the massive ML world.


Why does almost everyone use Python for ML now?


  • Because Python is the current winner.

  • Many researchers already use Python, so new researchers pick it up to follow them easily.

  • Many papers and libraries exist, and these are great assets.


  • 💡 But the language itself has no particular advantage for ML.

  • People have no reason to use the Python language itself; what they want is Python's assets.

  • In short, they want NumPy.


Python interoperability in Swift

let np = Python.import("numpy")
print(np)
// => <module 'numpy' from '/usr/local/lib/python3.6/
//      dist-packages/numpy/__init__.py'>

let ones = np.ones([2, 3])
print(ones)
// => [[1. 1. 1.]
//     [1. 1. 1.]]

  • Python really runs behind the Swift code.
  • The syntax is Swift, but what executes is Python.
  • This is very similar to using an Objective-C class from Swift.

Let's grow Swift for TF

  • Some companies have both an ML division and Swift experts.

  • They are the best placed to dive into this whole new generation of ML.

  • Of course, our company has that potential too.


I found the last missing piece

  • Python libraries already exist, and they can be used from Swift.

  • But existing Python code cannot.

  • Users will want to copy and paste existing code when they start working.

  • So a transpiler from Python to "Python in Swift" is needed.

  • I suspect Google may be developing one below the surface.


Summary

  • Swift for TF solves a big problem in ML libraries with a new approach made possible by compiler technology.

  • Swift for TF is ready to replace Python for ML researchers.

  • Swift for TF is also a big project for mainline Swift.

Footnotes

  1. https://jp.mathworks.com/discovery/neural-network.html

  2. https://qiita.com/uramonk/items/c207c948ccb6cd0a1346

  3. https://www.tensorflow.org/tutorials/eager/eager_basics

  4. https://www.tensorflow.org/tutorials/eager/custom_training

  5. https://www.tensorflow.org/swift/tutorials/model_training_walkthrough

  6. https://www.tensorflow.org/swift/api_docs/Protocols/Differentiable.html#valuewithgradientin:

  7. https://en.wikipedia.org/wiki/Tony_Hoare
