
# omochi/swift-for-tensorflow-talk.md Created Jul 10, 2019

slidenumber: true
autoscale: true

## Swift for TensorFlow is a completely new paradigm for machine learning

### Swift for TensorFlow meetup #1

• iOS software developer at Qoncept, Inc

• Swift compiler learner (see Speaker Deck)

• Compile time computation in Swift
• ABI Stability and Library Evolution
• Introduction of Opaque Result Type
• Implementation of Opaque Result Type
• Runtime representation of function in Swift
• Implementation of Generics and Protocols in Swift
• Swift runtime library and Value Witness Table
• Metatype of Generic types
• Exploring generics in Swift from compiler source
• Debugging of swiftc
• Mangling and Substitution in Swift

• Qoncept, Inc.

• Research in computer vision

• Develop mobile apps

# Sorry 🙏

• I am not an expert in ML at all

• But I can share my knowledge about S4TF

• Give me advice if you can during my talk

## Decide function

• Two parameters: α, β

## How to fix errors?

• Differentiation tells us the relationship between the parameters and the result

## Each point has a gradient

• Current: α = 0, β = 0.5
• Update: α += 0.5, β += 0.5

• Current: α = 0.5, β = 1.0
• Update: α += 0.5, β = keep

# Wrap up

• Decide shape of function

• Adjust parameters of function bit by bit

• Using differentiation of function
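The three steps above can be sketched end to end in plain Python. The function shape, the sample data, and the hand-written derivatives here are all assumptions for illustration (a linear model f(x) = α·x + β), not anything from a TF API:

```python
# Minimal gradient-descent sketch: fit f(x) = alpha*x + beta
# to points sampled from the line y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

alpha, beta = 0.0, 0.5   # initial guesses for the two parameters
lr = 0.1                 # learning rate: adjust "bit by bit"

for _ in range(200):
    # Hand-derived gradients of the mean squared error w.r.t. alpha and beta.
    g_alpha = sum(2 * (alpha * x + beta - y) * x for x, y in data) / len(data)
    g_beta = sum(2 * (alpha * x + beta - y) for x, y in data) / len(data)
    # Each point's gradient tells us which direction to move each parameter.
    alpha -= lr * g_alpha
    beta -= lr * g_beta

print(round(alpha, 2), round(beta, 2))  # ≈ 2.0 1.0
```

The "differentiation of the function" step is done by hand here; automating exactly that step is what the rest of the talk is about.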

## Considering a more realistic task

• What kind of shape does this function have?

## Neural Network^1

• A lot of computation (multiplication, addition, ReLU, etc.)
• Many parameters
• So it can express various complex functions
• Make it have many layers → Deep

## What is needed for ML

• Programming, to use computers for automation

• A way to express a symbolic representation of a function

• Differentiation of the function

• A library that does all the rest

## Automatic differentiation

• Differentiation can be derived automatically from function definition.

• Remember the differentiation process you learned at school (the chain rule).

• It is a set of primitive expression transformations, applied recursively.

• In the end, the only remaining problem is how to express the function in code.
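As an illustration of that recursive rule application, here is a minimal forward-mode sketch using dual numbers, one classic AD technique (this is an assumption for exposition, not how TensorFlow implements differentiation):

```python
# Forward-mode automatic differentiation with dual numbers: every value
# carries its derivative, and each primitive operation applies one
# differentiation rule, so derivatives of composed functions fall out
# automatically via the chain rule.
class Dual:
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

def f(x):
    return x * x + x      # f(x) = x^2 + x, so f'(x) = 2x + 1

x = Dual(3.0, 1.0)        # seed the derivative dx/dx = 1
y = f(x)
print(y.value, y.deriv)   # 12.0 7.0
```

Note that the derivative came out of ordinary function definitions; nothing about `f` had to be written specially.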

## Two expression styles

• Define and run
  • Explicit graph building
  • TensorFlow 1.X

• Define by run
  • Eager execution
  • TensorFlow 2.X

## TensorFlow 1.X ^2

```python
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```

## Define and run

[.code-highlight: 1-7]

```python
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```

The computation is defined here.

## Define and run

[.code-highlight: 8-14]

```python
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```

This loop performs the parameter estimation process.

## Technique in implementation

`tf.matmul(x, W) + b`
• This code seems to perform a calculation, but it doesn't.
• It builds a computation structure that represents the expression itself.
• It is similar to an AST in a compiler.

### Imitated example by Swift

```swift
class Node {}

class MatMulNode: Node {
    let lhs: Node
    let rhs: Node
    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

class AddNode: Node {
    let lhs: Node
    let rhs: Node
    init(_ lhs: Node, _ rhs: Node) {
        self.lhs = lhs
        self.rhs = rhs
    }
}

func matmul(_ a: Node, _ b: Node) -> Node {
    return MatMulNode(a, b)
}

func +(_ a: Node, _ b: Node) -> Node {
    return AddNode(a, b)
}
```

## Explicit graph building style

• User code performs graph building: it describes the expression itself rather than evaluating it.

• The library can know the complete computation structure from this graph.

• Good performance thanks to whole-graph optimization.

• Confusing: normal code and meta code are mixed.

• The training execution process is hidden inside the runtime library, except for the outermost loop. It is hard to inspect the intermediate values of these steps.
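A toy sketch of this style in plain Python (these class names are made up for illustration; this is not the TF API): the expression only builds nodes, and a separate call performs the actual computation, like `session.run()`.

```python
# Toy "define and run": user code builds a graph of nodes first;
# a separate eval step performs the actual computation later.
class Const:
    def __init__(self, v): self.v = v
    def eval(self): return self.v

class Add:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self): return self.a.eval() + self.b.eval()

class Mul:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self): return self.a.eval() * self.b.eval()

# This line only builds the structure of (2 * 3) + 4; nothing runs yet.
graph = Add(Mul(Const(2.0), Const(3.0)), Const(4.0))

# Execution happens in a separate step.
print(graph.eval())  # 10.0
```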

## TensorFlow 2.X ^3

```python
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => hello, [[4.]]
```

• The actual computation is executed immediately, like ordinary code.

## Define by run

```python
w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
    loss = w * w
grad = tape.gradient(loss, w)

print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)
```

• The `GradientTape` scope tells the library to watch and record the computation executed inside it.
• Execution has side effects that define the computation graph.
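A toy sketch of the tape idea in plain Python (the names `Var`, `tape`, and `gradient` are made up for illustration; this is not the TF implementation):

```python
# Toy "define by run": computation executes immediately, while a tape
# records each operation as a side effect so gradients can be replayed later.
tape = []

class Var:
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        out = Var(self.value * other.value)
        tape.append(("mul", self, other, out))  # record as we execute
        return out

def gradient(output, wrt):
    # Walk the tape backwards, accumulating d(output)/d(wrt).
    grads = {output: 1.0}
    for op, a, b, out in reversed(tape):
        g = grads.get(out, 0.0)
        grads[a] = grads.get(a, 0.0) + g * b.value  # d(a*b)/da = b
        grads[b] = grads.get(b, 0.0) + g * a.value  # d(a*b)/db = a
    return grads.get(wrt, 0.0)

w = Var(3.0)
loss = w * w              # runs immediately: loss.value == 9.0
print(gradient(loss, w))  # 6.0, i.e. d(w^2)/dw = 2w at w = 3
```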

### More example ^4

```python
class Model(object):
    def __init__(self):
        # Initialize variables to (5.0, 0.0)
        # In practice, these should be initialized to random values.
        self.W = tf.Variable(5.0)
        self.b = tf.Variable(0.0)

    def __call__(self, x):
        return self.W * x + self.b

model = Model()
```

```python
def loss(predicted_y, desired_y):
    return tf.reduce_mean(tf.square(predicted_y - desired_y))

def train(model, inputs, outputs, learning_rate):
    with tf.GradientTape() as t:
        current_loss = loss(model(inputs), outputs)
    dW, db = t.gradient(current_loss, [model.W, model.b])
    model.W.assign_sub(learning_rate * dW)
    model.b.assign_sub(learning_rate * db)
```

```python
model = Model()

# Collect the history of W-values and b-values to plot later
Ws, bs = [], []
epochs = range(10)
for epoch in epochs:
    Ws.append(model.W.numpy())
    bs.append(model.b.numpy())
    current_loss = loss(model(inputs), outputs)

    train(model, inputs, outputs, learning_rate=0.1)
    print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %
          (epoch, Ws[-1], bs[-1], current_loss))
```

## Eager execution style

• User code just computes expressions, like ordinary programming.

• The library sees the computation structure during execution.

• Easy to understand what happens.

• The execution process is controlled the same way as an ordinary program. It is easy to see intermediate values.

• Optimization is limited.

• Overhead from host language execution steps.

• Define and run
  • Strong: performance
  • Weak: usability

• Define by run
  • Strong: usability
  • Weak: performance

## Rethink

`tf.matmul(x, W) + b`
• The problem is whether this code expresses the structure of a computation or just executes it.

• Graph building is similar to an AST in a compiler.

## In our point of view

`tf.matmul(x, W) + b`
• At compile time, this code is recognized as an AST by the compiler. An AST is exactly a graph of the program's computation.

• At runtime, this code executes the expression.

# Swift for TensorFlow ^5

```swift
for epoch in 1...epochCount {
    var epochLoss: Float = 0
    var epochAccuracy: Float = 0
    var batchCount: Int = 0
    for batch in trainDataset {
        let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
            let logits = model(batch.features)
            return softmaxCrossEntropy(logits: logits, labels: batch.labels)
        }
        optimizer.update(&model.allDifferentiableVariables, along: grad)
        let logits = model(batch.features)
        epochAccuracy += accuracy(predictions: logits.argmax(squeezingAxis: 1), truths: batch.labels)
        epochLoss += loss.scalarized()
        batchCount += 1
    }
    epochAccuracy /= Float(batchCount)
    epochLoss /= Float(batchCount)
    trainAccuracyResults.append(epochAccuracy)
    trainLossResults.append(epochLoss)
    if epoch % 50 == 0 {
        print("Epoch \(epoch): Loss: \(epochLoss), Accuracy: \(epochAccuracy)")
    }
}
```

```swift
let (loss, grad) = model.valueWithGradient { (model: IrisModel) -> Tensor<Float> in
    let logits = model(batch.features)
    return softmaxCrossEntropy(logits: logits, labels: batch.labels)
}
```

```swift
extension Differentiable {
    func valueWithGradient<R>(
        in f: @differentiable (Self) -> R
    ) -> (value: R, gradient: CotangentVector)
        where R : FloatingPoint, R.CotangentVector == R
}
```

(reference) ^6

• The code just means ordinary computation.
• The compiler provides differentiation of the function where it is needed, at compile time.
• The compiler can optimize the whole execution: not only expressions but also the surrounding control code.

It's a really natural and sensible approach.

## Swift

Best language in the world today

• Type inference provides safety and makes it easy to write.
• Null safety which fixes the billion-dollar mistake.^7
• The compiler tells the programmer about mistakes such as typos.
• Less debugging time and more productivity.

## New challenge for Swift

• In practice, Swift is used almost exclusively for iOS development. 😢

• Swift for TF may spread out use to the massive ML world.

## Why does almost everyone use Python for ML now?

• Because Python is the current winner.

• Many researchers already use Python, so new researchers also use it to easily follow their work.

• Many papers and libraries exist, and these are great assets.

• 💡But the language itself has no particular superiority for ML.

• People don't need the Python language itself; they want Python's assets.

• In short, they want numpy.

## Python interoperability of Swift

```swift
let np = Python.import("numpy")
print(np)
// => <module 'numpy' from '/usr/local/lib/python3.6/
//      dist-packages/numpy/__init__.py'>

let ones = np.ones([2, 3])
print(ones)
// => [[1. 1. 1.]
//     [1. 1. 1.]]
```

• It really runs Python behind Swift.
• It is syntactically Swift, but actually Python.
• It is very similar to using an Objective-C class from Swift.

## Let's grow Swift for TF

• There are companies that have both ML teams and Swift experts.

• They are the best players to dive into whole new ML generation.

• Of course, our company also has the potential.

## I found the last missing part

• Python libraries already exist and can be used from Swift.

• But existing Python code cannot.

• Users will want to copy and paste code when they start working.

• So a transpiler from Python to Python-in-Swift is needed.

• I think Google might be developing one behind the scenes.

## Summary

• Swift for TF solves a big problem in ML libraries with a new approach provided by compiler technology.

• Swift for TF is prepared to replace Python for ML researchers.

• Swift for TF is also a big project for mainline Swift.