Adam optimizer example

Optimizers often use per-parameter auxiliary variables (e.g. Adam maintains first and second moments for each parameter). Such auxiliary variables are naturally represented as an instance of a ParameterAggregate-conforming type, because they have one value for each parameter. Complex iteration and mutation of parameters, alongside gradients and auxiliary variables, is enabled using key paths, as shown below.

Learn more about key paths here: https://github.com/apple/swift-evolution/blob/master/proposals/0161-key-paths.md
Key path API documentation: https://developer.apple.com/documentation/swift/swift_standard_library/key_path_expressions
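Before the full Adam example, here is a minimal sketch of how writable key paths enable generic iteration and mutation over a parameter aggregate. It uses plain Float stored properties and a hand-written allKeyPaths (rather than Tensor<Float> and the compiler-synthesized property), so it runs in plain Swift:

struct Params {
    var w1: Float = 0.1
    var w2: Float = -0.2

    // Hand-written stand-in for the compiler-synthesized `allKeyPaths`.
    static var allKeyPaths: [WritableKeyPath<Params, Float>] {
        return [\Params.w1, \Params.w2]
    }
}

var params = Params()
let gradients = Params(w1: 0.5, w2: -0.5)

// A plain SGD step, applied uniformly to every parameter via key paths.
for kp in Params.allKeyPaths {
    params[keyPath: kp] -= 0.01 * gradients[keyPath: kp]
}

The Adam optimizer below follows the same pattern, but updates per-parameter first and second moments before applying the step.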

struct MNISTParameters : ParameterAggregate {
    var w1 = Tensor<Float>(randomNormal: [784, 30])
    var w2 = Tensor<Float>(randomNormal: [30, 10])

    // Compiler-synthesized:
    // static var allKeyPaths: [WritableKeyPath<MNISTParameters, Tensor<Float>>] {
    //     return [\MNISTParameters.w1, \MNISTParameters.w2]
    // }
    // Learn more about key paths here: https://github.com/apple/swift-evolution/blob/master/proposals/0161-key-paths.md
}

struct AdamOptimizer {
    typealias Scalar = Float

    var learningRate: Scalar
    var beta1: Scalar
    var beta2: Scalar
    var epsilon: Scalar

    init(learningRate: Scalar = 0.001, beta1: Scalar = 0.9, beta2: Scalar = 0.999, epsilon: Scalar = 1e-8) {
        self.learningRate = learningRate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
    }

    var step: Float = 0
    var firstMoments: MNISTParameters? = nil
    var secondMoments: MNISTParameters? = nil

    // `fitParameters` can be generalized to work with any `ParameterAggregate`-conforming type when such types
    // define a zero initializer. There are multiple ways to enable this (e.g. conforming `ParameterAggregate` to
    // `VectorNumeric`).
    mutating func fitParameters(
        _ parameters: inout MNISTParameters,
        withGradients gradients: MNISTParameters
    ) {
        func initializeWithZerosIfNeeded(_ x: MNISTParameters?) -> MNISTParameters {
            return x ?? MNISTParameters(
                w1: Tensor(0).broadcast(like: parameters.w1),
                w2: Tensor(0).broadcast(like: parameters.w2)
            )
        }
        var firstMoments = initializeWithZerosIfNeeded(self.firstMoments)
        var secondMoments = initializeWithZerosIfNeeded(self.secondMoments)
        step += 1

        // Iterating over `allKeyPaths` and applying key paths currently produces sends/receives.
        // It should be possible to eliminate sends/receives eventually, by fully unrolling the loop at compile time
        // and implementing compile-time evaluation of key path initialization and application.
        // Read the key path design for more information.
        for kp in MNISTParameters.allKeyPaths {
            // Update biased first and second moment estimates.
            firstMoments[keyPath: kp] =
                firstMoments[keyPath: kp] * beta1 + (1 - beta1) * gradients[keyPath: kp]
            secondMoments[keyPath: kp] =
                secondMoments[keyPath: kp] * beta2 + (1 - beta2) * gradients[keyPath: kp] * gradients[keyPath: kp]
            // Apply the bias-corrected Adam update.
            let denominator = sqrt(secondMoments[keyPath: kp]) + epsilon
            let biasCorrection1 = 1 - pow(beta1, step)
            let biasCorrection2 = 1 - pow(beta2, step)
            let stepSize = learningRate * sqrt(biasCorrection2) / biasCorrection1
            parameters[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / denominator
        }
        self.firstMoments = firstMoments
        self.secondMoments = secondMoments
    }
}
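
A hypothetical training-loop usage might look like the following. This is a sketch only: Tensor and broadcast(like:) come from the Swift for TensorFlow toolchain, and the gradients here are freshly initialized placeholders rather than the output of a real gradient computation.

// Sketch: driving the optimizer from a training loop.
var parameters = MNISTParameters()
var optimizer = AdamOptimizer(learningRate: 0.001)

for _ in 0..<1000 {
    // Placeholder gradients; in practice these would come from
    // differentiating a loss function with respect to `parameters`.
    let gradients = MNISTParameters()
    optimizer.fitParameters(&parameters, withGradients: gradients)
}

Because firstMoments and secondMoments are optional and lazily zero-initialized on the first call, the optimizer needs no separate setup step before the loop.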