
## swift-for-tensorflow-talk.md

      
slidenumber: true
autoscale: true
Swift for TensorFlow is a completely new paradigm for machine learning

by @omochimetaru from Qoncept, Inc

Swift for TensorFlow meetup #1


About me


omochimetaru @ twitter 


iOS software developer at Qoncept, Inc


Swift compiler learner (see Speaker Deck)

Compile time computation in Swift
ABI Stability and Library Evolution
Introduction of Opaque Result Type
Implementation of Opaque Result Type
Runtime representation of function in Swift
Implementation of Generics and Protocol in swift
Swift runtime library and Value Witness Table
Metatype of Generic types
Exploring generics in swift from compiler source
Debugging of swiftc
IRGen for Type Metadata
Mangling and Substitution in Swift


About our company


Qoncept, Inc. 


Research in computer vision


Develop mobile apps


https://twitter.com/hayashi/status/1120571069000241152

Our researcher built it with TensorFlow
Before ARKit 3


https://twitter.com/yp1_ydct/status/1108691148141297664

Search "アイスコープ" on Twitter


more: https://qoncept.co.jp


Sorry 🙏


I am not an ML expert at all


But I can share what I know about S4TF


Please give me advice during my talk if you can


What is ML

(I will only talk about the currently popular approach)


Data points


Decide on a function


Two parameters: α, β


Start with random initial parameters


How to fix errors?


Differentiation tells us the relationship between the parameters and the result


Differentiation of function


Each point has a gradient


Current: α = 0, β = 0.5
Update: α += 0.5, β += 0.5


Current: α = 0.5, β = 1.0
Update: α += 0.5, β = keep


Wrap up


Decide the shape of the function


Adjust the parameters of the function bit by bit

Using the derivative of the function
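The loop described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the data points, the learning rate, and the step count are made up, and the gradient is written out by hand rather than derived automatically. It starts from α = 0, β = 0.5 and moves both parameters a little at a time in the direction that reduces the error.

```python
# Fit y = alpha * x + beta to a few data points by gradient descent.
# The data points and the learning rate are made up for illustration.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # all lie on y = 2x + 1


def loss(alpha, beta):
    """Mean squared error between the line and the data points."""
    return sum((alpha * x + beta - y) ** 2 for x, y in points) / len(points)


def gradient(alpha, beta):
    """Partial derivatives of the loss with respect to alpha and beta."""
    n = len(points)
    d_alpha = sum(2 * (alpha * x + beta - y) * x for x, y in points) / n
    d_beta = sum(2 * (alpha * x + beta - y) for x, y in points) / n
    return d_alpha, d_beta


def fit(alpha=0.0, beta=0.5, learning_rate=0.05, steps=2000):
    """Move the parameters a little against the gradient at every step."""
    for _ in range(steps):
        d_alpha, d_beta = gradient(alpha, beta)
        alpha -= learning_rate * d_alpha
        beta -= learning_rate * d_beta
    return alpha, beta


alpha, beta = fit()
print(f"alpha={alpha:.3f}, beta={beta:.3f}, loss={loss(alpha, beta):.6f}")
```

The only ML-specific part is `gradient`; everything else is an ordinary loop.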


What is Deep Learning


Considering a more realistic task


What kind of shape does this function have?


Neural Network¹


A lot of computation (multiplication, addition, ReLU, etc.)
Many parameters
So it can express various complex functions
Give it many layers → Deep
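As a rough sketch of what "multiplication, addition, ReLU" means in code, here is a two-layer network written with plain Python lists. The weights `W1`, `b1`, `W2`, `b2` are hypothetical values chosen for illustration; in a real model they are the parameters that training adjusts.

```python
def relu(values):
    """ReLU keeps positive values and replaces negative ones with zero."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, inputs):
    """One layer: multiply the inputs by a weight matrix, then add a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical parameters; training would normally find these.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[1.0, 2.0]]
b2 = [0.5]


def network(inputs):
    """Two layers stacked; adding more layers makes the network deeper."""
    hidden = relu(dense(W1, b1, inputs))
    return dense(W2, b2, hidden)


print(network([2.0, 1.0]))
```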


What is needed for ML


Programming, to use computers and automate the work


A way to express a symbolic representation of a function


Differentiation of the function


A library that does all the rest


Automatic differentiation


The derivative can be derived automatically from the function definition.


Remember the process of differentiation that you learned at school (the chain rule).


It is a set of primitive rewriting rules for expressions, applied recursively.


In the end, the only remaining problem is how to express the function in code.
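To make "primitive rules applied recursively" concrete, here is a minimal sketch of symbolic differentiation. The tuple encoding of expressions (`"const"`, `"var"`, `"add"`, `"mul"`, `"sin"`, `"cos"`) is an assumption made for this example and is not how any particular library represents functions.

```python
import math


def diff(expr, var):
    """Differentiate a tuple-encoded expression with respect to `var`."""
    kind = expr[0]
    if kind == "const":
        return ("const", 0.0)
    if kind == "var":
        return ("const", 1.0 if expr[1] == var else 0.0)
    if kind == "add":  # (a + b)' = a' + b'
        return ("add", diff(expr[1], var), diff(expr[2], var))
    if kind == "mul":  # (a * b)' = a' * b + a * b'
        a, b = expr[1], expr[2]
        return ("add", ("mul", diff(a, var), b), ("mul", a, diff(b, var)))
    if kind == "sin":  # chain rule: sin(a)' = cos(a) * a'
        return ("mul", ("cos", expr[1]), diff(expr[1], var))
    if kind == "cos":  # chain rule: cos(a)' = -sin(a) * a'
        minus_sin = ("mul", ("const", -1.0), ("sin", expr[1]))
        return ("mul", minus_sin, diff(expr[1], var))
    raise ValueError(f"unknown expression kind: {kind}")


def evaluate(expr, env):
    """Compute the numeric value of an expression for given variable values."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if kind == "mul":
        return evaluate(expr[1], env) * evaluate(expr[2], env)
    if kind == "sin":
        return math.sin(evaluate(expr[1], env))
    if kind == "cos":
        return math.cos(evaluate(expr[1], env))
    raise ValueError(f"unknown expression kind: {kind}")


x = ("var", "x")
f = ("sin", ("mul", x, x))  # f(x) = sin(x * x)
df = diff(f, "x")           # cos(x * x) * (1 * x + x * 1)
print(evaluate(df, {"x": 1.0}))
```

Once a function is available as data like this, its derivative follows mechanically, which is why the remaining question is how the function gets expressed.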


There are many such libraries in the Python world


Problem

How to express function?


Two expression styles


Define and run

Explicit graph building
TensorFlow 1.X


Define by run

Eager execution
TensorFlow 2.X


TensorFlow 1.X ²

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})

Define and run

[.code-highlight: 1-7]
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})
The computation is defined here.

Define and run

[.code-highlight: 8-14]
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})
This loop performs the parameter estimation.

Technique in implementation

tf.matmul(x, W) + b

This code looks like it performs a calculation, but it doesn't.
It builds a computation structure that represents the expression itself.
It is similar to an AST in a compiler.


An imitation in Swift

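class Node {}

class MatMulNode: Node {
    let lhs: Node
    let rhs: Node

    // Memberwise initializers are not synthesized for classes.
    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

class AddNode: Node {
    let lhs: Node
    let rhs: Node

    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

// Builds a graph node instead of multiplying anything.
func matmul(_ a: Node, _ b: Node) -> Node {
    return MatMulNode(a, b)
}

// Builds a graph node instead of adding anything.
func + (a: Node, b: Node) -> Node {
    return AddNode(a, b)
}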

Explicit graph building style


User code builds a graph that represents the expression itself.


The library can learn the complete computation structure from this graph.


✅ Good performance through optimization.


❌ Confusing. Normal code and meta code are mixed.


❌ The training process is hidden inside the runtime library, except for the outermost loop.
It is hard to inspect intermediate values during these steps.
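The same idea can be sketched in Python, where operator overloading makes the "meta code" look like normal arithmetic. The class names and the `run` function are hypothetical and scalar-only; they imitate the define-and-run style and are not TensorFlow's implementation.

```python
class Node:
    """Base class: arithmetic on nodes builds more nodes, it computes nothing."""

    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)


class Placeholder(Node):
    """A named input whose value is supplied only when the graph is run."""

    def __init__(self, name):
        self.name = name


class Add(Node):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs


class Mul(Node):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs


def run(node, feed):
    """Walk the finished graph and compute its value from the fed inputs."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    if isinstance(node, Add):
        return run(node.lhs, feed) + run(node.rhs, feed)
    if isinstance(node, Mul):
        return run(node.lhs, feed) * run(node.rhs, feed)
    raise TypeError(f"unknown node: {node!r}")


# Define: this line only builds a graph.
x, w, b = Placeholder("x"), Placeholder("w"), Placeholder("b")
y = x * w + b

# Run: the actual arithmetic happens here, inside the library function.
print(run(y, {"x": 3.0, "w": 2.0, "b": 1.0}))
```

Note that `y` is an `Add` node, not a number, which is exactly the confusion the slide describes.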


TensorFlow 2.X ³

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m)) # => hello, [[4.]]

The computation is actually executed, like ordinary code.


Define by run

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad) # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)

The GradientTape scope tells the library to watch and record the computation executed inside it.
Execution has the side effect of defining the computation graph.
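A toy version of this recording mechanism fits in plain Python. The `Tape` and `Value` classes below are hypothetical and handle scalars only; they are a sketch of the define-by-run idea, not of how `tf.GradientTape` is implemented.

```python
class Tape:
    """Records every operation executed while the `with` block is active."""

    current = None

    def __init__(self):
        # Each record is (output, [(input, d_output / d_input), ...]).
        self.records = []

    def __enter__(self):
        Tape.current = self
        return self

    def __exit__(self, *exc_info):
        Tape.current = None
        return False

    def gradient(self, target, source):
        """Replay the records backwards, accumulating gradients (chain rule)."""
        grads = {id(target): 1.0}
        for output, inputs in reversed(self.records):
            upstream = grads.get(id(output), 0.0)
            for value, local in inputs:
                grads[id(value)] = grads.get(id(value), 0.0) + upstream * local
        return grads.get(id(source), 0.0)


class Value:
    """A number that computes eagerly and reports each operation to the tape."""

    def __init__(self, data):
        self.data = data

    def _record(self, out, inputs):
        if Tape.current is not None:
            Tape.current.records.append((out, inputs))
        return out

    def __add__(self, other):
        out = Value(self.data + other.data)
        return self._record(out, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        out = Value(self.data * other.data)
        return self._record(out, [(self, other.data), (other, self.data)])


w = Value(1.0)
with Tape() as tape:
    loss = w * w  # computed immediately; the tape notes it as a side effect

print(loss.data, tape.gradient(loss, w))
```

The result of `w * w` is available immediately, and the graph exists only as a log of what happened to run.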


Another example ⁴

class Model(object):
  def __init__(self):
    # Initialize variable to (5.0, 0.0)
    # In practice, these should be initialized to random values.
    self.W = tf.Variable(5.0)
    self.b = tf.Variable(0.0)

  def __call__(self, x):
    return self.W * x + self.b

model = Model()

def loss(predicted_y, desired_y):
  return tf.reduce_mean(tf.square(predicted_y - desired_y))

def train(model, inputs, outputs, learning_rate):
  with tf.GradientTape() as t:
    current_loss = loss(model(inputs), outputs)
  dW, db = t.gradient(current_loss, [model.W, model.b])
  model.W.assign_sub(learning_rate * dW)
  model.b.assign_sub(learning_rate * db)

model = Model()

# Collect the history of W-values and b-values to plot later
Ws, bs = [], []
epochs = range(10)
for epoch in epochs:
  Ws.append(model.W.numpy())
  bs.append(model.b.numpy())
  current_loss = loss(model(inputs), outputs)

  train(model, inputs, outputs, learning_rate=0.1)
  print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %
        (epoch, Ws[-1], bs[-1], current_loss))

Eager execution style


User code simply computes the expression, as in ordinary programming.


The library observes the computation structure during execution.


✅ Easy to understand what happens.


✅ Execution is controlled the same way as in an ordinary program.
It is easy to see intermediate values.


❌ Optimization is limited.


❌ Overhead from executing each step in the host language.


Tradeoff


Define and run

✅ Performance
❌ Usability


Define by run

✅ Usability
❌ Performance


Can this be solved? 🤔


Rethink

tf.matmul(x, W) + b


The question is whether this code expresses the structure of a computation or just executes it.


Graph building is similar to an AST in a compiler.


We are Swift programmers


From our point of view

tf.matmul(x, W) + b


At compile time, the compiler sees this code as an AST.
The AST is exactly a graph of the program's computation.


At runtime, this code executes the expression.
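To see that a compiler front end already holds this graph, we can ask a parser for it. The sketch below uses Python's standard `ast` module as a stand-in, since the expression happens to be valid Python too; the Swift compiler has its own AST, which is not shown here.

```python
import ast

# Parse the expression without executing it.
tree = ast.parse("matmul(x, W) + b", mode="eval")
root = tree.body

print(type(root).__name__)                 # BinOp: the "+" at the top
print(type(root.op).__name__)              # Add
print(type(root.left).__name__)            # Call: matmul(...)
print(root.left.func.id)                   # matmul
print([arg.id for arg in root.left.args])  # ['x', 'W']
print(root.right.id)                       # b
```

This is the same shape as the hand-built `AddNode(MatMulNode(x, W), b)` graph, obtained without the user writing any graph-building code.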


Compiler support can solve the problem


Swift for TensorFlow ⁵

for epoch in 1...epochCount {
    var epochLoss: Float = 0
    var epochAccuracy: Float = 0
    var batchCount: Int = 0
    for batch in trainDataset {
        let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
            let logits = model(batch.features)
            return softmaxCrossEntropy(logits: logits, labels: batch.labels)
        }
        optimizer.update(&model.allDifferentiableVariables, along: grad)
        
        let logits = model(batch.features)
        epochAccuracy += accuracy(predictions: logits.argmax(squeezingAxis: 1), truths: batch.labels)
        epochLoss += loss.scalarized()
        batchCount += 1
    }
    epochAccuracy /= Float(batchCount)
    epochLoss /= Float(batchCount)
    trainAccuracyResults.append(epochAccuracy)
    trainLossResults.append(epochLoss)
    if epoch % 50 == 0 {
        print("Epoch \(epoch): Loss: \(epochLoss), Accuracy: \(epochAccuracy)")
    }
}

let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
    let logits = model(batch.features)
    return softmaxCrossEntropy(logits: logits, labels: batch.labels)
}

extension Differentiable {
    func valueWithGradient<R : Differentiable>(
        in f: @differentiable (Self) -> R
    ) -> (value: R, gradient: CotangentVector)
        where R : FloatingPoint, R.CotangentVector == R
}
(reference) ⁶

Completely new paradigm


✅ Code is just ordinary computation.
✅ The compiler provides the derivative of a function where it is needed, at compile time.
✅ The compiler can optimize the whole execution: not only the expression but also the surrounding control code.
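As a toy illustration of generating a derivative from source code before anything runs, the sketch below transforms a Python AST and compiles the result into an ordinary function. It supports only `+`, `-`, and `*`, the helper names are hypothetical, and it uses Python's `ast` module as a stand-in; it makes no claim about how the Swift compiler actually implements differentiation.

```python
import ast


def derivative_ast(node, var):
    """Return an AST for d(node)/d(var), for +, -, * over names and numbers."""
    if isinstance(node, ast.Constant):
        return ast.Constant(0.0)
    if isinstance(node, ast.Name):
        return ast.Constant(1.0 if node.id == var else 0.0)
    if isinstance(node, ast.BinOp):
        left, right = node.left, node.right
        d_left = derivative_ast(left, var)
        d_right = derivative_ast(right, var)
        if isinstance(node.op, (ast.Add, ast.Sub)):
            return ast.BinOp(d_left, node.op, d_right)
        if isinstance(node.op, ast.Mult):  # product rule
            return ast.BinOp(
                ast.BinOp(d_left, ast.Mult(), right),
                ast.Add(),
                ast.BinOp(left, ast.Mult(), d_right),
            )
    raise NotImplementedError(ast.dump(node))


def differentiate(source, var):
    """Compile `source` and its derivative into two ordinary functions."""
    body = ast.parse(source, mode="eval").body

    def build(expr_body):
        tree = ast.fix_missing_locations(ast.Expression(expr_body))
        code = compile(tree, "<generated>", "eval")
        return lambda **env: eval(code, {"__builtins__": {}}, env)

    return build(body), build(derivative_ast(body, var))


# The user writes only the ordinary expression; the derivative is generated.
f, df = differentiate("w * w * x + b", "w")
print(f(w=3.0, x=2.0, b=1.0), df(w=3.0, x=2.0, b=1.0))
```

Both `f` and `df` are plain compiled code afterwards, with no tape and no graph object at run time.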

It's a really natural and sensible approach.

Swift

Best language in the world today

Type inference provides safety and makes code easy to write.
Null safety fixes the billion-dollar mistake.⁷
The compiler reports mistakes such as typos to the programmer.
Less debugging time and more productivity.


New challenge for Swift


In practice, Swift is used only for iOS development. 😢


Swift for TF may spread its use into the massive ML world.


Why does almost everyone use Python for ML now?


Because Python is the current winner


Many researchers already use Python, so new researchers use it too, to follow them easily.


Many papers and libraries exist, and these are great assets.


💡 But the language itself has no particular advantage for ML.


People have no reason to use the Python language itself; they want Python's assets.


In short, they want NumPy.


Python interoperability of Swift

let np = Python.import("numpy")
print(np)
// => <module 'numpy' from '/usr/local/lib/python3.6/
//      dist-packages/numpy/__init__.py'>

let ones = np.ones([2, 3])
print(ones)
// => [[1. 1. 1.]
//     [1. 1. 1.]]


It really runs Python behind Swift.
It is syntactically Swift, but actually Python.
It is very similar to using an Objective-C class from Swift.
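Conceptually, a call such as `np.ones([2, 3])` written in Swift ends up as a name lookup followed by a call on the Python side. The sketch below shows that lookup-then-call sequence from Python's own point of view. The helper names are hypothetical, and the standard `math` module is used instead of NumPy so the example has no dependencies.

```python
import importlib


def import_module(name):
    """Look a module up by its name, given as a string."""
    return importlib.import_module(name)


def member(obj, name):
    """Resolve a member by name at run time, like a dynamic `obj.name`."""
    return getattr(obj, name)


def call(function, *args):
    """Invoke whatever object the lookup produced."""
    return function(*args)


math_module = import_module("math")
result = call(member(math_module, "sqrt"), 16.0)
print(result)
```

Everything is resolved by name at run time, which is why the Swift compiler cannot check that such members exist.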


Let's grow Swift for TF


There are some companies that have both an ML department and Swift experts.


They are the best players to dive into the whole new ML generation.


Of course, our company also has the potential.


I found the last missing part


Python libraries that can be used from Swift already exist.


But code does not.


Users will want to copy and paste when they start working.


So a transpiler from Python to Python-in-Swift is needed.


I think Google might be developing one below the surface.


Summary


Swift for TF solves a big problem in ML libraries with a new approach made possible by compiler technology.


Swift for TF is prepared to replace Python for ML researchers.


Swift for TF is also a big project for mainline Swift.


Footnotes


¹ https://jp.mathworks.com/discovery/neural-network.html

² https://qiita.com/uramonk/items/c207c948ccb6cd0a1346

³ https://www.tensorflow.org/tutorials/eager/eager_basics

⁴ https://www.tensorflow.org/tutorials/eager/custom_training

⁵ https://www.tensorflow.org/swift/tutorials/model_training_walkthrough

⁶ https://www.tensorflow.org/swift/api_docs/Protocols/Differentiable.html#valuewithgradientin:

⁷ https://en.wikipedia.org/wiki/Tony_Hoare