NeuroMem research

CM1K Chip

Each neuron consists of SRAM and a small programmable logic unit. The logic is prewired to run certain types of algorithms. Neurons are interconnected by a small bidirectional bus.

Released in 2007 as a follow-up to the IBM ZISC chip. ZISC refers to an architecture based solely on pattern matching and the absence of micro-instructions. A single ZISC036 holds 36 neurons and implements an RBF network trained with the RCE (or ROI) algorithm.

ZISC employs the Radial Basis Function (RBF) and K-Nearest Neighbor (KNN) algorithms. The ZISC approach is a specialized but cheap chip that does one thing very quickly.

  • RBF: A real-valued function whose value depends only on the distance from the origin. Used as a kernel in support vector classification. An RBF network can be interpreted as a simple single-layer ANN.
  • KNN: Stores all available cases and classifies new cases based on a similarity measure (e.g. a distance function). Used in statistical estimation and pattern recognition. A minimal sketch of both follows this list.
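
A minimal sketch of the two classification schemes, in plain NumPy. The function names and parameter values are illustrative only, not the ZISC/CM1K API.

```python
import numpy as np

def rbf_scores(x, prototypes, gamma=0.1):
    """RBF response of each stored prototype: exp(-gamma * ||x - p||^2)."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    return np.exp(-gamma * d2)

def knn_classify(x, prototypes, labels, k=3):
    """Classify x by majority vote among its k nearest stored prototypes."""
    d = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(d)[:k]
    return np.bincount(labels[nearest]).argmax()

# Toy usage with random prototypes and labels.
prototypes = np.random.rand(10, 4)
labels = np.random.randint(0, 3, size=10)
x = np.random.rand(4)
print(rbf_scores(x, prototypes))
print(knn_classify(x, prototypes, labels))
```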

The chip possesses 1024 neurons, each with its own memory for trained signature storage and a processor for recognition and distance calculations. The memory in each neuron contains 256 elements of 8 bits each, for a total of 256 bytes of information per neuron.
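
A rough software model of that recognition step, matching the 1024 x 256-byte figures above. The L1 (Manhattan) distance used here is an assumption for illustration, not a confirmed detail of the chip.

```python
import numpy as np

NEURONS, COMPONENTS = 1024, 256   # 256 x 8-bit elements = 256 bytes per neuron

# Stored signatures: one 256-element, 8-bit vector per neuron.
prototypes = np.random.randint(0, 256, size=(NEURONS, COMPONENTS), dtype=np.uint8)

def neuron_distances(x):
    """Every neuron compares the input against its stored signature in parallel;
    an L1 (Manhattan) distance is assumed here purely for illustration."""
    return np.abs(prototypes.astype(np.int32) - x.astype(np.int32)).sum(axis=1)

x = np.random.randint(0, 256, size=COMPONENTS, dtype=np.uint8)
winner = int(np.argmin(neuron_distances(x)))   # best-matching neuron
```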

The identical neurons learn and respond to vector inputs in parallel while they incorporate information from all the trained neurons in the network (Q: how does training work?).

Recognition and training can be accomplished simultaneously. HOW??

SRAM serves as the neuron's memory in the integrated circuit. It requires substantial wafer real estate and static power to retain its contents, and because it is volatile, flash memory is required to reload the SRAM when power is removed.

Other similar architectures:

  • TrueNorth (IBM)
  • Zeroth (Qualcomm)

Implements only feed-forward neural networks, with a single hidden layer.

Spiking Neurons vs. Low Precision Tensors

TensorFlow's central data type is the tensor. Tensors are the underlying components of computation and a fundamental data structure in TensorFlow. Informally, a tensor in TensorFlow is a multidimensional numerical array holding a zero- to n-dimensional collection of data, characterized by its rank, shape, and type.

http://www.pythonprogramming.in/learn-tensorflow-series/creating-and-processing-tensors.html
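
A small example of rank, shape, and dtype with the standard TensorFlow API:

```python
import tensorflow as tf

scalar = tf.constant(3.0)                  # rank 0
vector = tf.constant([1.0, 2.0, 3.0])      # rank 1, shape (3,)
matrix = tf.constant([[1, 2], [3, 4]])     # rank 2, shape (2, 2)

print(matrix.shape)   # (2, 2)
print(matrix.dtype)   # <dtype: 'int32'>
```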

The TPU is essentially an 8-bit (integer) matrix multiply ASIC: the multiplies are carried out by a systolic array, and a large on-chip activation buffer serves as memory. It is rather efficient for deep nets.
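
A toy NumPy emulation of the core idea (8-bit integer operands with wide accumulation); this is not TPU code, and the output scale below is a hypothetical value.

```python
import numpy as np

# 8-bit integer operands, 32-bit accumulation to avoid overflow.
A = np.random.randint(-128, 128, size=(4, 8), dtype=np.int8)
B = np.random.randint(-128, 128, size=(8, 4), dtype=np.int8)
acc = A.astype(np.int32) @ B.astype(np.int32)

# Requantize the accumulator back to 8 bits with a (hypothetical) output scale.
scale = 1.0 / 64
out = np.clip(np.round(acc * scale), -128, 127).astype(np.int8)
```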

TrueNorth (and others) adopts a "sea of cores" approach in which each core works independently and has its own memory. It implements spiking neural networks, a different family of neural networks that tends to underperform deep nets. Commentary: is there a reason these have not become popular?

https://www.quora.com/What-is-the-difference-between-Googles-tensor-processing-unit-TPU-and-neural-processing-units-like-IBMs-TrueNorth

Literature study: Asynchronous neurosynaptic chips

Analog implementations are non-deterministic.

  • The TrueNorth Chips

There is a 1:1 mapping requirement between neural nets in software and in hardware. IBM chose an integrate-and-fire spiking neuron with binary output. The advantage of spiking neurons is that you don't need multipliers. But to get good accuracy on a task like ImageNet you need about 8 bits of precision on the neuron states, and getting that precision with spiking neurons requires waiting multiple cycles so that the spikes "average out".
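
A toy sketch of that "average out" point, using simple rate coding: a binary spiking neuron only approximates a fine-grained activation after the spike count has been accumulated over many cycles (roughly 2^8 cycles for 8-bit resolution). Purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
target = 0.73   # activation value to represent, scaled to [0, 1]

# One binary spike per cycle; the spike rate converges toward the target value.
for cycles in (8, 64, 256):
    spikes = rng.random(cycles) < target
    print(cycles, round(spikes.mean(), 3))   # estimate improves with more cycles
```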

Architecture is locally asynchronous and globally synchronous.

http://brain4free.org/wiki/lib/exe/fetch.php/texte:hit_exercise_asynchronous_neurosynaptic-chips.pdf

Comments from Yann LeCun

IBM has an article in Science about the TrueNorth neural net chip. Neural states are binary (spikes). LeCun is skeptical of the approach.

NeuFlow implements convolutional neural networks, which can produce state-of-the-art performance on a number of vision tasks.

TrueNorth implements networks of integrate-and-fire spiking neurons. This type of neural net has never been shown to yield accuracy anywhere close to the state of the art on any task of interest. The advantage is that you don't need multipliers.

The power and performance of NeuFlow are on par with TrueNorth.

Convolutional Neural Networks

Convolutional Neural Networks are similar to ordinary Neural Networks. ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture.

Regular neural nets receive an input and transform it through a series of hidden layers. They do not scale well to full images. Full connectivity is wasteful.

A CNN can constrain the architecture in a more sensible way. Neurons can be arranged in 3 dimensions: width, height, and depth (the activation volume).

A simple ConvNet is a sequence of layers. Every layer of a ConvNet transforms one volume of activations to another through a differentiable function. This can be implemented using matrix multiplication, as sketched below.

http://cs231n.github.io/convolutional-networks/
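
A minimal sketch of a convolutional layer expressed as one matrix multiplication via im2col; single-channel, "valid" padding, NumPy only, with illustrative names.

```python
import numpy as np

def conv2d_as_matmul(image, kernels):
    """Valid 2-D convolution via im2col + matrix multiply.
    image:   (H, W) single-channel input
    kernels: (K, kh, kw) filter bank
    returns: (K, H-kh+1, W-kw+1) activation volume
    """
    K, kh, kw = kernels.shape
    H, W = image.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Gather every kh x kw patch into a column ("im2col").
    cols = np.stack([image[i:i+kh, j:j+kw].ravel()
                     for i in range(oh) for j in range(ow)], axis=1)  # (kh*kw, oh*ow)
    out = kernels.reshape(K, kh * kw) @ cols                          # one matmul
    return out.reshape(K, oh, ow)

image = np.random.rand(8, 8)
kernels = np.random.rand(3, 3, 3)
print(conv2d_as_matmul(image, kernels).shape)   # (3, 6, 6)
```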

TensorFlow, Caffe, etc.

Didier mentioned targeting these for the CM1K, but these frameworks are aimed at machine learning with DNNs. Is there value in writing a backend for basic feed-forward networks?

Notes from convo. w/ Pete

As for spiking neurons: this only seems interesting if it uses some very novel memory technology coupled with a very dense, low-power, low-voltage analog-type implementation to create a primitive low-voltage, low-power "positronic brain". Something that might run off scavenged power for biological implants or the like.

Anything else doesn't stand a chance against standard DL pipeline approaches and the billions of dollars going into them, or against flash-EPROM-based analog computing / current-steering approaches from the likes of Mythic or Microchip/SST.

But it all depends on whether they find some high-volume niche. Mythic, for example, is focused on always-on voice recognition, so they can build a highly tuned low-power, low-precision inference engine to do that. They don't stand a chance in the datacenter, for example, as a general-purpose cloud-based inference engine.

