
rxwei / dl-frameworks.rst (created November 7, 2016; forked from bartvm/dl-frameworks.rst)
A comparison of deep learning frameworks

A comparison of Theano with other deep learning frameworks, highlighting a series of low-level design choices in no particular order.

Overview

Differentiation

Symbolic: Theano, CGT; Automatic: Torch, MXNet

Symbolic and automatic differentiation are often confused or used interchangeably, although their implementations are significantly different.
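To make the distinction concrete, here is a minimal Python sketch of the symbolic style (invented names, not Theano's or CGT's actual API): the user first builds an expression graph, and differentiation produces a new graph for the gradient, which can be optimized and compiled before any numeric values are supplied. The tape-based automatic approach is sketched further below.

```python
class Var:
    """A node in a symbolic expression graph."""
    def __init__(self, name=None, op=None, args=()):
        self.name, self.op, self.args = name, op, args

    def __add__(self, other):
        return Var(op="add", args=(self, other))

    def __mul__(self, other):
        return Var(op="mul", args=(self, other))

def grad(expr, wrt):
    """Return a *new* symbolic expression for d(expr)/d(wrt)."""
    if expr is wrt:
        return Var(name="1")
    if expr.op == "add":
        a, b = expr.args
        return grad(a, wrt) + grad(b, wrt)
    if expr.op == "mul":
        a, b = expr.args
        return grad(a, wrt) * b + a * grad(b, wrt)
    return Var(name="0")  # constant or unrelated leaf

x = Var(name="x")
y = x * x + x            # expression graph for x**2 + x
dy_dx = grad(y, x)       # a new graph: 1*x + x*1 + 1 (before simplification)
```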

rxwei / myia.md (created November 7, 2016; forked from bartvm/myia.md)

Myia (Theano 2.0)

Automatic vs. symbolic differentiation

In the literature, the term automatic differentiation (AD) is reserved for a specific technique in which the gradient of a program is calculated by creating an adjoint program, which performs the gradient computation of the given program. Note that this adjoint program includes all control flow statements. There are two approaches to implementing AD: source code transformation (SCT) and operator overloading (OO). With source code transformation we generate the adjoint program in the host language: e.g., given a Python function, we manipulate the abstract syntax tree (AST) directly in order to create a new Python function that performs the gradient computation. Operator overloading, on the other hand, overloads each operator so that it adds an entry to a tape (a Wengert list). Once the function exits, the gradient is calculated by going through the tape in reverse order and applying the gradient operators.
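A minimal sketch of the operator-overloading flavour (hypothetical names, not any particular framework's API): each overloaded operator both computes its result and appends a backward closure to a global tape, and the gradient is obtained by replaying the tape in reverse order.

```python
tape = []  # the Wengert list: one backward closure per executed operation

class Value:
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

    def __add__(self, other):
        out = Value(self.data + other.data)
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        tape.append(backward)
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data)
        def backward():
            self.grad += out.grad * other.data
            other.grad += out.grad * self.data
        tape.append(backward)
        return out

def f(x):
    # Ordinary host-language control flow: the tape records only the
    # operations along the path that was actually executed.
    y = x * x
    if y.data > 1.0:
        y = y + x
    return y

x = Value(3.0)
y = f(x)
y.grad = 1.0                    # seed dy/dy = 1
for backward in reversed(tape): # replay the tape in reverse
    backward()
print(x.grad)                   # 7.0 = d(x**2 + x)/dx at x = 3
```

Because the tape records only the operations that actually ran, host-language control flow (the `if` in `f` above) is handled for free, whereas the adjoint program produced by SCT has to reproduce that control flow explicitly.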

Theano does not employ AD but "[a highly optimized form of