slidenumber: true autoscale: true

Swift for TensorFlow is a completely new paradigm for machine learning

by @omochimetaru from Qoncept, Inc

Swift for TensorFlow meetup #1


About me

  • omochimetaru @ twitter

  • iOS software developer at Qoncept, Inc

  • Swift compiler learner (see Speaker Deck)

    • Compile time computation in Swift
    • ABI Stability and Library Evolution
    • Introduction of Opaque Result Type
    • Implementation of Opaque Result Type
    • Runtime representation of function in Swift
    • Implementation of Generics and Protocol in swift
    • Swift runtime library and Value Witness Table
    • Metatype of Generic types
    • Exploring generics in swift from compiler source
    • Debugging of swiftc
    • IRGen for Type Metadata
    • Mangling and Substitution in Swift

About our company

  • Qoncept, Inc.

  • Research in computer vision

  • Develop mobile apps



Sorry 🙏

  • I am not an expert in ML at all

  • But I can share my knowledge about S4TF

  • Please give me advice during my talk if you can


What is ML

(I will talk only about the currently popular approach)


Data points



Decide function


  • Two parameters: α, β

Start with random initial parameters



How to fix errors?

  • Differentiation tells us the relationship between the parameters and the result

Differentiation of function



Each point has a gradient


  • Current: α = 0, β = 0.5
  • Update: α += 0.5, β += 0.5


  • Current: α = 0.5, β = 1.0
  • Update: α += 0.5, keep β unchanged



Wrap up

  • Decide the shape of the function

  • Adjust the parameters of the function bit by bit

    • Using the differentiation of the function (see the sketch below)
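
As a concrete illustration of this loop, here is a minimal sketch in plain Swift. The data points, learning rate, and iteration count are made up for illustration; it fits y = αx + β by gradient descent on the squared error.

// Fit y = α*x + β by nudging the parameters along the gradient of the error.
let xs: [Double] = [0.0, 1.0, 2.0, 3.0]
let ys: [Double] = [1.0, 3.0, 5.0, 7.0]   // made-up data, the true line is y = 2x + 1

var alpha = 0.0
var beta = 0.5
let learningRate = 0.05

for _ in 0..<1000 {
    var gradAlpha = 0.0
    var gradBeta = 0.0
    for (x, y) in zip(xs, ys) {
        let error = (alpha * x + beta) - y
        // d(error^2)/dα = 2 * error * x,  d(error^2)/dβ = 2 * error
        gradAlpha += 2 * error * x
        gradBeta += 2 * error
    }
    alpha -= learningRate * gradAlpha / Double(xs.count)
    beta -= learningRate * gradBeta / Double(xs.count)
}

print(alpha, beta) // ≈ 2.0 and 1.0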

What is Deep Learning


Considering a more realistic task

  • What kind of shape does this function have?



Neural Network^1



  • A lot of computation (multiplication, addition, ReLU, etc.)
  • Many parameters
  • So it can express various complex functions
  • Stack many layers → Deep (see the sketch below)
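
To make the idea concrete, here is a minimal sketch in plain Swift of one dense layer followed by ReLU. The weights, bias, and input are made up for illustration; real networks use a tensor library instead of nested loops.

// ReLU: clamp negative values to zero.
func relu(_ x: [Double]) -> [Double] {
    return x.map { max(0, $0) }
}

// output[j] = Σ_i input[i] * weights[i][j] + bias[j]
func dense(_ input: [Double], weights: [[Double]], bias: [Double]) -> [Double] {
    var output = bias
    for i in 0..<input.count {
        for j in 0..<bias.count {
            output[j] += input[i] * weights[i][j]
        }
    }
    return output
}

let input = [1.0, 2.0]
let weights = [[0.1, -0.3], [0.4, 0.2]]
let bias = [0.0, 0.1]
let hidden = relu(dense(input, weights: weights, bias: bias))
// Stacking many such layers is what makes a network "deep".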

What is needed for ML


  • Programming, for using computers and automation

  • A way to express a symbolic representation of a function

  • Differentiation of the function

  • A library that does all the rest


Automatic differentiation

  • Differentiation can be derived automatically from the function definition.

  • Remember the differentiation process you learned at school (the chain rule).

  • It is a set of primitive rewrites of expressions and their recursive application (see the sketch below).

  • In the end, the only remaining problem is how to express the function in code.
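
For example, here is a minimal sketch in plain Swift of forward-mode automatic differentiation with dual numbers: every value carries its own derivative, each operator applies its differentiation rule, and the chain rule falls out of the recursive application.

// A value together with its derivative.
struct Dual {
    var value: Double
    var derivative: Double
}

func + (a: Dual, b: Dual) -> Dual {
    return Dual(value: a.value + b.value, derivative: a.derivative + b.derivative)
}

func * (a: Dual, b: Dual) -> Dual {
    // product rule: (ab)' = a'b + ab'
    return Dual(value: a.value * b.value,
                derivative: a.derivative * b.value + a.value * b.derivative)
}

// f(x) = x * x + x, written as ordinary code.
func f(_ x: Dual) -> Dual {
    return x * x + x
}

let x = Dual(value: 3.0, derivative: 1.0)  // seed with dx/dx = 1
print(f(x).derivative)                     // => 7.0, i.e. f'(3) = 2 * 3 + 1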


There are many such libraries in the Python world


Problem

How to express the function?


Two expression styles

  • Define and run

    • Explicit graph building
    • TensorFlow 1.X
  • Define by run

    • Eager execution
    • TensorFlow 2.X

TensorFlow 1.X ^2

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})

Define and run

[.code-highlight: 1-7]

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})

This part defines the computation.


Define and run

[.code-highlight: 8-14]

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_:batch_ys})

This loop performs the parameter estimation process.


Technique in implementation

tf.matmul(x, W) + b
  • This code looks like it performs a calculation, but it doesn't.
  • It builds a computation structure that represents the expression itself.
  • It is similar to an AST in a compiler.

Imitated example in Swift

class Node {}

class MatMulNode : Node {
    let lhs: Node
    let rhs: Node

    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

class AddNode : Node {
    let lhs: Node
    let rhs: Node

    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

func matmul(_ a: Node, _ b: Node) -> Node {
    return MatMulNode(a, b)
}

func + (_ a: Node, _ b: Node) -> Node {
    return AddNode(a, b)
}

Explicit graph building style

  • User code performs graph building that represents the expression itself.

  • The library can know the complete computation structure from this graph (see the sketch below).


  • Good performance through optimization.

  • Confusing: normal code and meta code are mixed.

  • The learning execution process is hidden behind the runtime library, except for the outermost loop. It is hard to inspect the intermediate values in these steps.
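
Continuing the imitated Node example from before (a sketch, not the real TensorFlow implementation), the library could walk the recorded graph and see the whole structure before executing anything:

// A recursive walk over the hypothetical Node graph.
// Real TensorFlow does much more (shape inference, optimization, kernel fusion, ...),
// but the point is that the whole structure is available before execution.
func describe(_ node: Node) -> String {
    switch node {
    case let n as MatMulNode:
        return "matmul(\(describe(n.lhs)), \(describe(n.rhs)))"
    case let n as AddNode:
        return "add(\(describe(n.lhs)), \(describe(n.rhs)))"
    default:
        return "leaf"
    }
}

let x = Node(), W = Node(), b = Node()
print(describe(matmul(x, W) + b)) // => add(matmul(leaf, leaf), leaf)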


TensorFlow 2.X ^3

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m)) # => hello, [[4.]]
  • The actual computation is executed right away, like ordinary code.

Define by run

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad) # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)
  • The GradientTape scope tells the library to watch and record the computation executed inside it.
  • Execution has side effects that define the computation graph.

Another example ^4

class Model(object):
  def __init__(self):
    # Initialize variable to (5.0, 0.0)
    # In practice, these should be initialized to random values.
    self.W = tf.Variable(5.0)
    self.b = tf.Variable(0.0)

  def __call__(self, x):
    return self.W * x + self.b

model = Model()

def loss(predicted_y, desired_y):
  return tf.reduce_mean(tf.square(predicted_y - desired_y))

def train(model, inputs, outputs, learning_rate):
  with tf.GradientTape() as t:
    current_loss = loss(model(inputs), outputs)
  dW, db = t.gradient(current_loss, [model.W, model.b])
  model.W.assign_sub(learning_rate * dW)
  model.b.assign_sub(learning_rate * db)

model = Model()

# Collect the history of W-values and b-values to plot later
Ws, bs = [], []
epochs = range(10)
for epoch in epochs:
  Ws.append(model.W.numpy())
  bs.append(model.b.numpy())
  current_loss = loss(model(inputs), outputs)

  train(model, inputs, outputs, learning_rate=0.1)
  print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %
        (epoch, Ws[-1], bs[-1], current_loss))

Eager execution style

  • User code just computes the expression, like ordinary programming.

  • The library sees the computation structure during execution.


  • Easy to understand what happens.

  • The execution process is controlled the same as an ordinary program. It is easy to see intermediate values.

  • Optimization is limited.

  • Overhead from the host language's execution steps.


Tradeoff

  • Define and run

    • Good performance
    • Poor usability
  • Define by run

    • Good usability
    • Poor performance

Can this be solved? 🤔


Rethink

tf.matmul(x, W) + b
  • The problem is whether this code expresses the structure of the computation or just executes it.

  • Graph building is similar to an AST in a compiler.


We are Swift programmers


From our point of view

tf.matmul(x, W) + b
  • At compile time, the compiler recognizes this code as an AST. An AST is exactly a graph of the program's computation.

  • At runtime, this code executes the expression.


Compiler support can solve the problem




Swift for TensorFlow ^5

for epoch in 1...epochCount {
    var epochLoss: Float = 0
    var epochAccuracy: Float = 0
    var batchCount: Int = 0
    for batch in trainDataset {
        let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
            let logits = model(batch.features)
            return softmaxCrossEntropy(logits: logits, labels: batch.labels)
        }
        optimizer.update(&model.allDifferentiableVariables, along: grad)
        
        let logits = model(batch.features)
        epochAccuracy += accuracy(predictions: logits.argmax(squeezingAxis: 1), truths: batch.labels)
        epochLoss += loss.scalarized()
        batchCount += 1
    }
    epochAccuracy /= Float(batchCount)
    epochLoss /= Float(batchCount)
    trainAccuracyResults.append(epochAccuracy)
    trainLossResults.append(epochLoss)
    if epoch % 50 == 0 {
        print("Epoch \(epoch): Loss: \(epochLoss), Accuracy: \(epochAccuracy)")
    }
}

let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
    let logits = model(batch.features)
    return softmaxCrossEntropy(logits: logits, labels: batch.labels)
}

extension Differentiable {
    func valueWithGradient<R : Differentiable>(
        in f: @differentiable (Self) -> R
    ) -> (value: R, gradient: CotangentVector)
        where R : FloatingPoint, R.CotangentVector == R
}

(reference) ^6


Completely new paradigm

  • Code just means ordinary computation.
  • The compiler provides differentiation of functions where it is needed, at compile time (see the sketch below).
  • The compiler can optimize the whole execution, not only the expressions but also the surrounding control code.

It's a really natural and sensible approach.
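
For instance, here is a minimal sketch assuming the gradient(at:in:) entry point that Swift for TensorFlow provided around this time (the exact API has changed between toolchains): the function is ordinary Swift code, and the compiler derives its gradient.

import TensorFlow

@differentiable
func f(_ x: Float) -> Float {
    return x * x + 3 * x
}

// The compiler generates the derivative; no tape, no graph-building code.
let dfdx = gradient(at: 2, in: f)
print(dfdx) // => 7.0, i.e. f'(2) = 2 * 2 + 3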


Swift

Best language in the world today

  • Type inference provides safety and makes it easy to write.
  • Null safety, which fixes the billion-dollar mistake.^7
  • The compiler tells the programmer about mistakes such as typos.
  • Less debugging time and more productivity.

New challenge for Swift

  • In practice, Swift is only used for iOS development. 😢

  • Swift for TF may spread its use into the massive ML world.


Why does almost everyone use Python for ML now?


  • Because Python is the current winner

  • Many researchers already use Python, so new researchers use it too, to follow them easily.

  • Many papers and libraries exist, and these are great assets.


  • 💡But the language itself has no particular superiority for ML.

  • People don't have a reason to use the Python language itself; they just want Python's assets.

  • In short, they want numpy.


Python interoperability of Swift

let np = Python.import("numpy")
print(np)
// => <module 'numpy' from '/usr/local/lib/python3.6/
//      dist-packages/numpy/__init__.py'>

let ones = np.ones([2, 3])
print(ones)
// => [[1. 1. 1.]
//     [1. 1. 1.]]

  • It really runs Python behind Swift.
  • It is syntactically Swift, but actually Python (one more sketch below).
  • It is very similar to using an Objective-C class from Swift.
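
One more hedged sketch, assuming the same Python interop module: values coming back from Python are PythonObjects, and they can be converted to Swift types when needed.

let np = Python.import("numpy")

let a = np.array([1.0, 2.0, 3.0])
let s = np.sum(a)          // s is a PythonObject wrapping a numpy scalar
print(s)                   // => 6.0

// Converting back to a Swift value uses a failable initializer.
if let total = Double(s) {
    print(total + 1)       // => 7.0
}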

Let's grow Swift for TF

  • There are some companies that have both an ML section and Swift experts.

  • They are the best players to dive into the whole new ML generation.

  • Of course, our company also has that potential.


I found the last missing part

  • Python libraries already exist and can be used from Swift.

  • But existing code does not.

  • Users will want to copy and paste existing code when they start working.

  • So a Python-to-"Python in Swift" transpiler is needed.

  • I think Google might be developing one behind the scenes.


Summary

  • Swift for TF solves a big problem in ML libraries with a new approach provided by compiler technology.

  • Swift for TF is prepared to replace Python for ML researchers.

  • Swift for TF is also a big project for mainline Swift.
