#research-paper

Stable Tensor Neural Networks for Rapid Deep Learning

Paper source

Summary

  • t-NN framework - a neural network framework for multidimensional tensor data, based on the t-product (a tensor-tensor product in which two tensors are multiplied via circular convolution of their frontal slices)

Pros of using t-NNs

  • quicker learning because of reduced parameter space
  • improved generalizability of stable t-NNs
  • reduced computation cost, memory footprint, friendlier to distributed computation
  • extract more meaningful info when relevant data is limited
  • potential to extract multidimensional correlations, since the data is not vectorized

3rd Order Tensors Background

  • Let A be a real-valued l x m x n tensor
  • frontal slices A^(k) are l x m matrices for k = 1, . . . , n
  • lateral slices A_j are l x n matrices for j = 1, . . . , m
  • tubes a_ij are n x 1 vectors for i = 1, . . . , l and j = 1, . . . , m (see the sketch below)
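
In numpy terms (a small sketch, assuming A is stored as an l x m x n array; names are mine, not the paper's), the three kinds of slices are just different index patterns:

```python
import numpy as np

# A is an l x m x n third-order tensor
l, m, n = 2, 3, 4
A = np.random.randn(l, m, n)

frontal_k = A[:, :, 0]   # a frontal slice: an l x m matrix (one per k = 1..n)
lateral_j = A[:, 0, :]   # a lateral slice: an l x n matrix (one per j = 1..m)
tube_ij   = A[0, 0, :]   # a tube: a length-n vector (one per (i, j) pair)
```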

How parameter space is reduced

Before

  • fully connected layers use parameters inefficiently


t-NN


Suppose we have m samples of two-dimensional data of size n x n. We can vectorize these samples and store them as columns of a matrix A of size n^2 x m, or orient these samples as lateral (from the side) slices stored in a tensor A of size n x m x n.

This efficient parameterization can be even more substantial for higher-dimensional data.
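
A minimal sketch of the two orientations (array names are illustrative, not from the paper):

```python
import numpy as np

n, m = 4, 10
samples = np.random.randn(m, n, n)        # m samples of n x n data

# Matrix orientation: vectorize each sample into a column of an n^2 x m matrix.
A_mat = samples.reshape(m, n * n).T       # shape (16, 10)

# Tensor orientation: keep each sample as a lateral slice of an n x m x n tensor,
# so sample j lives at A_tens[:, j, :].
A_tens = samples.transpose(1, 0, 2)       # shape (4, 10, 4)
```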

Improved Featurization


  • for the same number of parameters, the tensor weights can capture the same features as the matrix weights and additional features from applying circulant shifts of the frontal slices.
  • we are able to extract more features for the same number of learnable parameters

Tensor Algebra

  • tubes are the scalars of our tensor space (e.g., tubes commute under the t-product)

  • Tensors act on lateral slices, so lateral slices are analogous to vectors; data is therefore stored as lateral slices in this framework

bcirc

Given a tensor A of size l x m x n with frontal slices A^(1), . . . , A^(n), bcirc(A) is an ln x mn block circulant matrix of frontal slices: the first block-column stacks A^(1), . . . , A^(n) top to bottom, and each subsequent block-column is a cyclic downward shift of the previous one.
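
A minimal numpy sketch of bcirc, assuming the same l x m x n storage as above (the helper name is mine, not from the paper):

```python
import numpy as np

def bcirc(A):
    """Block circulant matrix of the frontal slices of an l x m x n tensor."""
    l, m, n = A.shape
    # block (i, j) is the frontal slice A^(((i - j) mod n) + 1)
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

A = np.random.randn(2, 3, 4)
print(bcirc(A).shape)   # (8, 12), i.e. ln x mn
```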

unfold

  • unfold(A) is the ln x m matrix formed by stacking the frontal slices of A vertically, i.e., the first block-column of bcirc(A)

fold

  • fold reverses unfold: it takes the ln x m stack of frontal slices and restacks them into the l x m x n tensor A
  • fold(unfold(A)) = A
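
A sketch of unfold/fold for the same storage convention (again, helper names are mine):

```python
import numpy as np

def unfold(A):
    """Stack the frontal slices vertically: the first block-column of bcirc(A)."""
    l, m, n = A.shape
    return np.concatenate([A[:, :, k] for k in range(n)], axis=0)   # ln x m

def fold(M, l, m, n):
    """Inverse of unfold: restack an ln x m matrix into an l x m x n tensor."""
    return np.stack([M[k * l:(k + 1) * l, :] for k in range(n)], axis=2)

A = np.random.randn(2, 3, 4)
assert np.allclose(fold(unfold(A), *A.shape), A)   # fold(unfold(A)) = A
```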

t-product

Given A of size l x m x n and B of size m x p x n, the t-product A * B = fold(bcirc(A) · unfold(B)) is a tensor of size l x p x n.

  • for a frontal slice: (A * B)^(k) is the sum over i = 1, . . . , n of A^(i) B^(j) with j = ((k - i) mod n) + 1, i.e., a circular convolution of the frontal slices along the third dimension
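
A sketch of the t-product, reusing the bcirc/unfold/fold helpers above, plus the equivalent (and much cheaper) computation in the Fourier domain, where the block circulant structure diagonalizes into slice-wise matrix products:

```python
import numpy as np

def t_product(A, B):
    """A * B = fold(bcirc(A) @ unfold(B)); A is l x m x n, B is m x p x n."""
    l, m, n = A.shape
    _, p, _ = B.shape
    return fold(bcirc(A) @ unfold(B), l, p, n)

def t_product_fft(A, B):
    """Same product via the FFT along the tube dimension."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('imk,mpk->ipk', Ah, Bh)   # matrix product per frontal slice
    return np.real(np.fft.ifft(Ch, axis=2))

A = np.random.randn(2, 3, 4)
B = np.random.randn(3, 5, 4)
assert np.allclose(t_product(A, B), t_product_fft(A, B))
```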

transpose

Suppose A is an l x m x n tensor. Then A^T is the m x l x n tensor obtained by taking the transpose of each frontal slice and reversing the order of slices 2 through n.

The transpose is taken so that bcirc(A^T) = bcirc(A)^T, which gives (A * B)^T = B^T * A^T, just as for matrices.

You can think of the transpose as performing the following frontal slice mapping: (A^T)^(1) = (A^(1))^T and (A^T)^(k) = (A^(n+2-k))^T for k = 2, . . . , n.
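
A numpy sketch of the tensor transpose, checking the bcirc identity above (reuses the bcirc helper from earlier):

```python
import numpy as np

def t_transpose(A):
    """Transpose each frontal slice, then reverse the order of slices 2..n."""
    At = A.transpose(1, 0, 2)                                   # m x l x n
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

A = np.random.randn(2, 3, 4)
assert np.allclose(bcirc(t_transpose(A)), bcirc(A).T)           # bcirc(A^T) = bcirc(A)^T
```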

identity

The identity tensor I of size m x m x n is a tensor whose first frontal slice is the m x m identity matrix and whose remaining slices are zero.


  • bcirc(I) is an mn x mn identity matrix
  • An identity tube e_1 is the first standard basis vector oriented along the third dimension

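
A sketch of the identity tensor, checking the two properties above with the t_product and bcirc helpers from earlier:

```python
import numpy as np

def t_identity(m, n):
    """First frontal slice is the m x m identity; remaining slices are zero."""
    I = np.zeros((m, m, n))
    I[:, :, 0] = np.eye(m)
    return I

A = np.random.randn(2, 3, 4)
I = t_identity(2, 4)
assert np.allclose(t_product(I, A), A)        # I * A = A
assert np.allclose(bcirc(I), np.eye(2 * 4))   # bcirc(I) is the mn x mn identity
```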

forward propagation

The usual dense-layer update a_(j+1) = σ(W_j a_j + b_j) is replaced by its tensor analogue a_(j+1) = σ(W_j * a_j + b_j), where the weights W_j and biases b_j are tensors, the activations a_j are lateral slices, and * is the t-product.
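
A hedged sketch of one such layer under that rule (layer sizes, the ReLU choice, and all names are illustrative, not from the paper); it reuses t_product_fft from the sketch above:

```python
import numpy as np

def tnn_layer(W, A, b):
    """One forward-propagation step: activation of (W * A + b) via the t-product."""
    # W: l x m x n weights, A: m x batch x n activations stored as lateral slices,
    # b: l x 1 x n bias, broadcast across the batch dimension.
    return np.maximum(t_product_fft(W, A) + b, 0)   # ReLU as an example activation

W = np.random.randn(5, 4, 3)
A = np.random.randn(4, 10, 3)    # 10 samples stored as lateral slices
b = np.random.randn(5, 1, 3)
print(tnn_layer(W, A, b).shape)  # (5, 10, 3)
```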

steph-notes

Definitions

circulant matrix

  • A circulant matrix has one vector, c, in the first column of C. The remaining columns of C are each cyclic permutations of the vector c with offset equal to the column index. The last row of C is the vector c in reverse order, and the remaining rows are each cyclic permutations of the last row.

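
A small sketch that builds such a matrix from its first column (scipy.linalg.circulant does the same thing):

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c; column k is c cyclically shifted by k."""
    c = np.asarray(c)
    return np.stack([np.roll(c, k) for k in range(len(c))], axis=1)

print(circulant([1, 2, 3]))
# [[1 3 2]
#  [2 1 3]
#  [3 2 1]]   <- last row is c reversed
```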

properties of circulant matrices

  • eigenvectors of an N x N circulant matrix are the Discrete Fourier Transform (DFT) sinusoids for a length-N DFT
    • DFT - computes a discrete frequency spectrum from a discrete-time signal of finite length
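
A quick numerical check of that property, using the circulant() helper above: the columns of the inverse DFT matrix are eigenvectors, with eigenvalues given by the DFT of the first column:

```python
import numpy as np

c = np.array([4.0, 1.0, 2.0, 3.0])
C = circulant(c)

V = np.fft.ifft(np.eye(len(c)), axis=0)   # columns are the DFT sinusoids exp(2*pi*i*j*k/N) / N
lam = np.fft.fft(c)                       # eigenvalues: the DFT of the first column

assert np.allclose(C @ V, V * lam)        # C v_k = lambda_k v_k for every column k
```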

circulant convolution

  • very efficient to compute using the FFT algorithm and the circular convolution theorem
  • commutative
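
A sketch comparing the direct definition with the FFT route (circular convolution theorem):

```python
import numpy as np

def circ_conv_direct(x, h):
    """Circular convolution from the definition: sum over one period."""
    N = len(x)
    return np.array([sum(x[m] * h[(k - m) % N] for m in range(N)) for k in range(N)])

def circ_conv_fft(x, h):
    """Same result via the FFT: pointwise product in the frequency domain."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

x, h = np.random.randn(8), np.random.randn(8)
assert np.allclose(circ_conv_direct(x, h), circ_conv_fft(x, h))
assert np.allclose(circ_conv_fft(x, h), circ_conv_fft(h, x))   # commutative
```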

linear convolution vs circular convolution

  • circular convolution is taken between 2 periodic sequences of period N and is computed over a single period, n = 0 to N-1
  • linear convolution is computed over all relevant values of n from -infinity to infinity
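
A small sketch of the difference: np.convolve gives the length 2N - 1 linear convolution, and wrapping its tail back modulo N recovers the circular convolution:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, -1.0, 0.5])

lin = np.convolve(x, h)                                        # linear: length 2N - 1 = 7
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))     # circular: length N = 4

wrapped = lin[:4].copy()
wrapped[:3] += lin[4:]            # indices >= N wrap around (mod N)
assert np.allclose(wrapped, circ)
```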

tensor

A tensor is often thought of as a generalized matrix. That is, it could be a 1-D matrix (a vector is actually such a tensor), a 3-D matrix (something like a cube of numbers), even a 0-D matrix (a single number), or a higher dimensional structure that is harder to visualize. The dimension of the tensor is called its rank. (source)

If one transforms the entities that the tensor relates in a regular way (e.g., a change of coordinates), then the tensor's entries must obey a related transformation rule.

matrix vs tensor

Matrix: a 2-dim table used to organize information
Tensor: an n-dim table (a generalized matrix)

  • 5x5 matrix <= tensor of rank 2
  • 5x5x5 array <= tensor of rank 3

Not every matrix can be represented as a rank 2 tensor, but any rank 2 tensor can be represented as a matrix.
