
# omochi/swift-for-tensorflow-talk.md Created Jul 10, 2019

slidenumber: true
autoscale: true

## Swift for TensorFlow is a completely new paradigm for machine learning

### Swift for TensorFlow meetup #1

• iOS software developer at Qoncept, Inc

• Swift compiler learner (see Speaker Deck)

• Compile time computation in Swift
• ABI Stability and Library Evolution
• Introduction of Opaque Result Type
• Implementation of Opaque Result Type
• Runtime representation of function in Swift
• Implementation of Generics and Protocols in Swift
• Swift runtime library and Value Witness Table
• Metatype of Generic types
• Exploring generics in Swift from compiler source
• Debugging of swiftc
• Mangling and Substitution in Swift

• Qoncept, Inc.

• Research in computer vision

• Develop mobile apps

# Sorry 🙏

• I am not an expert in ML at all

• But I can share my knowledge about S4TF

• Give me advice if you can during my talk

## Decide function

• Two parameters: α, β

## How to fix errors?

• Differentiation tells us the relationship between the parameters and the result

## Each point has a gradient

• Current: α = 0, β = 0.5
• Update: α += 0.5, β += 0.5

• Current: α = 0.5, β = 1.0
• Update: α += 0.5, β = keep

# Wrap up

• Decide shape of function

• Adjust parameters of function bit by bit

• Using differentiation of function
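The three steps above can be sketched end to end in plain Python. The function shape, the sample data, and the hand-written derivatives here are all assumptions for illustration (a linear model f(x) = α·x + β), not anything from a TF API:

```python
# Minimal gradient-descent sketch: fit f(x) = alpha*x + beta
# to points sampled from the line y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

alpha, beta = 0.0, 0.5   # initial guesses for the two parameters
lr = 0.1                 # learning rate: adjust "bit by bit"

for _ in range(200):
    # Hand-derived gradients of the mean squared error w.r.t. alpha and beta.
    g_alpha = sum(2 * (alpha * x + beta - y) * x for x, y in data) / len(data)
    g_beta = sum(2 * (alpha * x + beta - y) for x, y in data) / len(data)
    # Each point's gradient tells us which direction to move each parameter.
    alpha -= lr * g_alpha
    beta -= lr * g_beta

print(round(alpha, 2), round(beta, 2))  # ≈ 2.0 1.0
```

The "differentiation of the function" step is done by hand here; automating exactly that step is what the rest of the talk is about.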

## Considering a more realistic task

• What kind of shape does this function have?

## Neural Network^1

• A lot of computation (multiplication, addition, ReLU, etc.)
• Many parameters
• So it can express various complex functions
• Make it have many layers → Deep

## What is needed for ML

• Programming, to use computers for automation

• A way to express a symbolic representation of a function

• Differentiation of the function

• A library that does all the rest

## Automatic differentiation

• Differentiation can be derived automatically from function definition.

• Remember the differentiation process you learned at school (the chain rule).

• It is a set of primitive expression transformations, applied recursively.

• In the end, the only remaining problem is how to express the function in code.
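As an illustration of that recursive rule application, here is a minimal forward-mode sketch using dual numbers, one classic AD technique (this is an assumption for exposition, not how TensorFlow implements differentiation):

```python
# Forward-mode automatic differentiation with dual numbers: every value
# carries its derivative, and each primitive operation applies one
# differentiation rule, so derivatives of composed functions fall out
# automatically via the chain rule.
class Dual:
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

def f(x):
    return x * x + x      # f(x) = x^2 + x, so f'(x) = 2x + 1

x = Dual(3.0, 1.0)        # seed the derivative dx/dx = 1
y = f(x)
print(y.value, y.deriv)   # 12.0 7.0
```

Note that the derivative came out of ordinary function definitions; nothing about `f` had to be written specially.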

## Two expression styles

• Define and run
  • Explicit graph building
  • TensorFlow 1.X

• Define by run
  • Eager execution
  • TensorFlow 2.X

## TensorFlow 1.X ^2

```python
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```

## Define and run

[.code-highlight: 1-7]

```python
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```

The computation is defined here.

## Define and run

[.code-highlight: 8-14]

```python
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```

This loop performs the parameter estimation process.

## Technique in implementation

`tf.matmul(x, W) + b`
• This code seems to perform a calculation, but it doesn't.
• It builds a computation structure that represents the expression itself.
• It is similar to an AST in a compiler.

### Imitated example by Swift

```swift
class Node {}

class MatMulNode: Node {
    let lhs: Node
    let rhs: Node
    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

class AddNode: Node {
    let lhs: Node
    let rhs: Node
    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

func matmul(_ a: Node, _ b: Node) -> Node {
    return MatMulNode(a, b)
}

func +(_ a: Node, _ b: Node) -> Node {
    return AddNode(a, b)
}
```

## Explicit graph building style

• User code performs graph building: it describes the expression itself rather than evaluating it.

• The library can know the complete computation structure from this graph.

• Good performance thanks to whole-graph optimization.

• Confusing: normal code and meta code are mixed.

• The training execution process is hidden inside the runtime library, except for the outermost loop. It is hard to inspect the intermediate values of these steps.
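A toy sketch of this style in plain Python (these class names are made up for illustration; this is not the TF API): the expression only builds nodes, and a separate call performs the actual computation, like `session.run()`.

```python
# Toy "define and run": user code builds a graph of nodes first;
# a separate eval step performs the actual computation later.
class Const:
    def __init__(self, v): self.v = v
    def eval(self): return self.v

class Add:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self): return self.a.eval() + self.b.eval()

class Mul:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self): return self.a.eval() * self.b.eval()

# This line only builds the structure of (2 * 3) + 4; nothing runs yet.
graph = Add(Mul(Const(2.0), Const(3.0)), Const(4.0))

# Execution happens in a separate step.
print(graph.eval())  # 10.0
```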

## TensorFlow 2.X ^3

```python
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => hello, [[4.]]
```

• The actual computation is executed immediately, like ordinary code.

## Define by run

```python
w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
    loss = w * w
grad = tape.gradient(loss, w)

print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)
```

• The `GradientTape` scope tells the library to watch and record the computation executed inside it.
• Execution has side effects that define the computation graph.
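A toy sketch of the tape idea in plain Python (the names `Var`, `tape`, and `gradient` are made up for illustration; this is not the TF implementation):

```python
# Toy "define by run": computation executes immediately, while a tape
# records each operation as a side effect so gradients can be replayed later.
tape = []

class Var:
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        out = Var(self.value * other.value)
        tape.append(("mul", self, other, out))  # record as we execute
        return out

def gradient(output, wrt):
    # Walk the tape backwards, accumulating d(output)/d(wrt).
    grads = {output: 1.0}
    for op, a, b, out in reversed(tape):
        g = grads.get(out, 0.0)
        grads[a] = grads.get(a, 0.0) + g * b.value  # d(a*b)/da = b
        grads[b] = grads.get(b, 0.0) + g * a.value  # d(a*b)/db = a
    return grads.get(wrt, 0.0)

w = Var(3.0)
loss = w * w              # runs immediately: loss.value == 9.0
print(gradient(loss, w))  # 6.0, i.e. d(w^2)/dw = 2w at w = 3
```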

### More example ^4

```python
class Model(object):
    def __init__(self):
        # Initialize variables to (5.0, 0.0)
        # In practice, these should be initialized to random values.
        self.W = tf.Variable(5.0)
        self.b = tf.Variable(0.0)

    def __call__(self, x):
        return self.W * x + self.b

model = Model()
```

```python
def loss(predicted_y, desired_y):
    return tf.reduce_mean(tf.square(predicted_y - desired_y))

def train(model, inputs, outputs, learning_rate):
    with tf.GradientTape() as t:
        current_loss = loss(model(inputs), outputs)
    dW, db = t.gradient(current_loss, [model.W, model.b])
    model.W.assign_sub(learning_rate * dW)
    model.b.assign_sub(learning_rate * db)
```

```python
model = Model()

# Collect the history of W-values and b-values to plot later
Ws, bs = [], []
epochs = range(10)
for epoch in epochs:
    Ws.append(model.W.numpy())
    bs.append(model.b.numpy())
    current_loss = loss(model(inputs), outputs)

    train(model, inputs, outputs, learning_rate=0.1)
    print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %
          (epoch, Ws[-1], bs[-1], current_loss))
```

## Eager execution style

• User code just computes expressions, like ordinary programming.

• The library sees the computation structure during execution.

• Easy to understand what happens.

• The execution process is controlled the same way as an ordinary program. It is easy to see intermediate values.

• Optimization is limited.

• Overhead from host language execution steps.

• Define and run
  • Strong: performance
  • Weak: usability

• Define by run
  • Strong: usability
  • Weak: performance

## Rethink

`tf.matmul(x, W) + b`
• The problem is whether this code expresses the structure of a computation or just executes it.

• Graph building is similar to an AST in a compiler.

## In our point of view

`tf.matmul(x, W) + b`
• At compile time, this code is recognized as an AST by the compiler. An AST is exactly a graph of the program's computation.

• At runtime, this code executes the expression.

# Swift for TensorFlow ^5

```swift
for epoch in 1...epochCount {
    var epochLoss: Float = 0
    var epochAccuracy: Float = 0
    var batchCount: Int = 0
    for batch in trainDataset {
        let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
            let logits = model(batch.features)
            return softmaxCrossEntropy(logits: logits, labels: batch.labels)
        }
        optimizer.update(&model.allDifferentiableVariables, along: grad)
        let logits = model(batch.features)
        epochAccuracy += accuracy(predictions: logits.argmax(squeezingAxis: 1), truths: batch.labels)
        epochLoss += loss.scalarized()
        batchCount += 1
    }
    epochAccuracy /= Float(batchCount)
    epochLoss /= Float(batchCount)
    trainAccuracyResults.append(epochAccuracy)
    trainLossResults.append(epochLoss)
    if epoch % 50 == 0 {
        print("Epoch \(epoch): Loss: \(epochLoss), Accuracy: \(epochAccuracy)")
    }
}
```

```swift
let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
    let logits = model(batch.features)
    return softmaxCrossEntropy(logits: logits, labels: batch.labels)
}
```

```swift
extension Differentiable {
    func valueWithGradient<R>(
        in f: @differentiable (Self) -> R
    ) -> (value: R, gradient: CotangentVector)
        where R : FloatingPoint, R.CotangentVector == R
}
```

(reference) ^6

• The code just means ordinary computation.
• The compiler provides differentiation of the function where it is needed, at compile time.
• The compiler can optimize the whole execution: not only expressions but also the surrounding control code.

It's a really natural and sensible approach.

## Swift

Best language in the world today

• Type inference provides safety and makes it easy to write.
• Null safety which fixes the billion-dollar mistake.^7
• The compiler tells the programmer about mistakes such as typos.
• Less debugging time and more productivity.

## New challenge for Swift

• In practice, Swift is used almost exclusively for iOS development. 😢

• Swift for TF may spread out use to the massive ML world.

## Why does almost everyone use Python for ML now?

• Because Python is the current winner.

• Many researchers already use Python, so new researchers also use it to easily follow their work.

• Many papers and libraries exist, and these are great assets.

• 💡But the language itself has no particular superiority for ML.

• People don't need the Python language itself; they want Python's assets.

• In short, they want numpy.

## Python interoperability of Swift

```swift
let np = Python.import("numpy")
print(np)
// => <module 'numpy' from '/usr/local/lib/python3.6/
//      dist-packages/numpy/__init__.py'>

let ones = np.ones([2, 3])
print(ones)
// => [[1. 1. 1.]
//     [1. 1. 1.]]
```

• It really runs Python behind Swift.
• It is syntactically Swift, but actually Python.
• It is very similar to using an Objective-C class from Swift.

## Let's grow Swift for TF

• There are companies that have both ML teams and Swift experts.

• They are the best players to dive into whole new ML generation.

• Of course, our company also has the potential.

## I found the last missing part

• Python libraries already exist and can be used from Swift.

• But existing Python code cannot.

• Users will want to copy and paste code when they start working.

• So a transpiler from Python to Python-in-Swift is needed.

• I think Google might be developing one behind the scenes.

## Summary

• Swift for TF solves a big problem in ML libraries with a new approach provided by compiler technology.

• Swift for TF is prepared to replace Python for ML researchers.

• Swift for TF is also a big project for mainline Swift.