t-NN framework
- a NN framework for multidimensional tensor data, based on the t-product
(which multiplies two tensors via circulant convolution)
- quicker learning because of reduced parameter space
- improved generalizability of stable t-NNs
- reduced computation cost, memory footprint, friendlier to distributed computation
- extract more meaningful info when relevant data is limited
- potential to extract multidimensional correlations, given that data is not vectorized
- Let A be a real-valued l x m x n tensor
- its frontal slices A^(k) are l x m matrices, for k = 1, ..., n
- its lateral slices are l x n matrices, for j = 1, ..., m
- its tubes a_ij are n x 1 vectors, for i = 1, ..., l and j = 1, ..., m
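The three kinds of slices map directly onto NumPy indexing; a quick sketch (the sizes l = 4, m = 3, n = 2 are arbitrary):

```python
import numpy as np

# An l x m x n tensor, indexed A[i, j, k] (sizes chosen for illustration).
l, m, n = 4, 3, 2
A = np.arange(l * m * n, dtype=float).reshape(l, m, n)

frontal = A[:, :, 0]  # a frontal slice: an l x m matrix (fix k)
lateral = A[:, 0, :]  # a lateral slice: an l x n matrix (fix j)
tube = A[0, 0, :]     # a tube a_ij: an n-vector along the third mode (fix i, j)

print(frontal.shape, lateral.shape, tube.shape)  # (4, 3) (4, 2) (2,)
```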
- fully connected layers use parameters inefficiently
- Suppose we have m samples of two-dimensional data of size n x n. We can vectorize these samples and store them as columns of a matrix A of size n^2 x m, or orient these samples as lateral (from-the-side) slices stored in a tensor A of size n x m x n.
- This efficient parameterization can be even more substantial for higher-dimensional data.
- for the same number of parameters, the tensor weights can capture the same features as the matrix weights, plus additional features from applying circulant shifts of the frontal slices
- that is, we are able to extract more features for the same number of learnable parameters
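A back-of-the-envelope count makes the parameter savings concrete (the size n = 28 is just an example; the layer shapes are the standard dense-vs-t-product comparison, not taken from a specific implementation):

```python
# Parameters needed to map an n x n image to an n x n output.
n = 28

# Dense layer on vectorized data: a weight matrix W in R^{n^2 x n^2}.
matrix_params = (n ** 2) * (n ** 2)

# t-product layer on laterally oriented data: a weight tensor in R^{n x n x n}.
tensor_params = n ** 3

print(matrix_params, tensor_params)  # 614656 vs 21952 for n = 28
```

The tensor layer uses a factor of n fewer parameters, which is where the reduced memory footprint and faster training come from.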
- tubes are the scalars of our tensor space (e.g., tubes commute under the t-product)
- tensors act on lateral slices, so we consider lateral slices analogous to vectors; hence data is stored as lateral slices in our framework
- Given A in R^(l x m x n), bcirc(A) is the ln x mn block circulant matrix of frontal slices:

  bcirc(A) = [ A^(1)   A^(n)    ...  A^(2)
               A^(2)   A^(1)    ...  A^(3)
               ...     ...      ...  ...
               A^(n)   A^(n-1)  ...  A^(1) ]

- unfold(A) stacks the frontal slices of A into the first block-column of bcirc(A), an ln x m matrix; fold reverses this, merging the slices back together to make A, so fold(unfold(A)) = A
- the t-product A * B = fold(bcirc(A) · unfold(B)) is an algebraic formulation to multiply tensors via circulant convolution
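A minimal NumPy sketch of bcirc, unfold, fold, and the resulting t-product (the function names follow the definitions above; this is an illustrative implementation, not an optimized one):

```python
import numpy as np

def unfold(A):
    # Stack the frontal slices of A vertically: the first block-column
    # of bcirc(A), an ln x m matrix.
    l, m, n = A.shape
    return A.transpose(2, 0, 1).reshape(n * l, m)

def fold(M, shape):
    # Inverse of unfold, so fold(unfold(A), A.shape) == A.
    l, m, n = shape
    return M.reshape(n, l, m).transpose(1, 2, 0)

def bcirc(A):
    # ln x mn block circulant matrix: block (i, j) is frontal slice (i - j) mod n.
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def tprod(A, B):
    # t-product: A * B = fold(bcirc(A) @ unfold(B)).
    l, m, n = A.shape
    _, p, _ = B.shape
    return fold(bcirc(A) @ unfold(B), (l, p, n))

A = np.random.rand(4, 3, 5)
B = np.random.rand(3, 2, 5)
assert np.allclose(fold(unfold(A), A.shape), A)
print(tprod(A, B).shape)  # (4, 2, 5)
```

Because the blocks are arranged circulantly, the same product can be computed with FFTs along the third mode, which is where the computational savings come from.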
- Suppose A is an l x m x n tensor. Then A^T is the m x l x n tensor obtained by taking the transpose of each frontal slice, with slices 2 through n reversed.
- The transpose is defined this way so that bcirc(A^T) = bcirc(A)^T
- You can think of the transpose as performing the following frontal slice mapping: A^(1) -> (A^(1))^T, and A^(k) -> (A^(n+2-k))^T for k = 2, ..., n
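The slice reversal can be checked numerically; a self-contained sketch (re-defining a small bcirc helper, with arbitrary sizes):

```python
import numpy as np

def bcirc(A):
    # ln x mn block circulant matrix of the frontal slices of A.
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def ttranspose(A):
    # Transpose each frontal slice, reversing the order of slices 2 through n:
    # slice 1 stays first; slice k moves to position n + 2 - k for k >= 2.
    return np.concatenate([A[:, :, :1], A[:, :, :0:-1]], axis=2).transpose(1, 0, 2)

A = np.random.rand(4, 3, 5)
# The slice reversal is exactly what makes the block circulant expansion
# commute with the ordinary matrix transpose:
assert np.allclose(bcirc(ttranspose(A)), bcirc(A).T)
```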
The identity tensor I is the m x m x n tensor whose first frontal slice is the m x m identity matrix and whose remaining frontal slices are zero.
- bcirc(I) is the mn x mn identity matrix
- an identity tube e_1 is the first standard basis vector oriented along the third dimension
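The identity property A * I = A can be verified by computing the t-product in the Fourier domain (FFT along the third mode, then slicewise matrix products); the helper names here are my own:

```python
import numpy as np

def tprod(A, B):
    # t-product via the FFT along the third mode: transform, multiply
    # matching frontal slices, transform back.
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)
    return np.fft.ifft(Ch, axis=2).real

def identity_tensor(m, n):
    # First frontal slice is the m x m identity; remaining slices are zero.
    I = np.zeros((m, m, n))
    I[:, :, 0] = np.eye(m)
    return I

A = np.random.rand(4, 3, 5)
I = identity_tensor(3, 5)
assert np.allclose(tprod(A, I), A)  # A * I == A
```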
- A circulant matrix C is determined by a single vector c, which forms the first column of C. The remaining columns of C are cyclic permutations of c with offset equal to the column index. Equivalently, the last row of C is c in reverse order, and the remaining rows are cyclic permutations of the last row.
- the eigenvectors of an N x N circulant matrix are the Discrete Fourier Transform (DFT) sinusoids for a length-N DFT
- the DFT computes a discrete frequency spectrum from a discrete-time signal of finite length
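The eigenvector claim can be checked numerically: a circulant matrix is diagonalized by the DFT matrix, with eigenvalues fft(c). A small sketch (the vector c is arbitrary):

```python
import numpy as np

# The first column c determines the circulant matrix:
# column j is c cyclically shifted down by j.
c = np.array([1.0, 2.0, 3.0, 4.0])
N = len(c)
C = np.column_stack([np.roll(c, j) for j in range(N)])

# The DFT diagonalizes C: C = F^{-1} diag(fft(c)) F, so the DFT sinusoids
# (columns of F^{-1}) are the eigenvectors and fft(c) are the eigenvalues.
F = np.fft.fft(np.eye(N))  # length-N DFT matrix
assert np.allclose(C, np.linalg.inv(F) @ np.diag(np.fft.fft(c)) @ F)
```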
- very efficient to compute using the FFT algorithm and the circular convolution theorem
- commutative
- circular convolution is taken between two periodic sequences of period N, computed over a single period n = 0 to N-1
- linear convolution is computed over all relevant values of n, from -infinity to infinity
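The distinction shows up directly in NumPy (the sequence values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, -1.0, 0.5])
N = len(x)

# Circular convolution over one period, via the circular convolution theorem:
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

# The same thing from the definition: y[n] = sum_m x[m] * h[(n - m) mod N]
direct = np.array([sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)])
assert np.allclose(circ, direct)

# Convolution is commutative:
assert np.allclose(np.convolve(x, h), np.convolve(h, x))

# Linear convolution runs over all overlaps, giving length 2N - 1:
lin = np.convolve(x, h)
assert lin.shape == (2 * N - 1,)
```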
A tensor is often thought of as a generalized matrix. That is, it could be a 1-D matrix (a vector is such a tensor), a 3-D matrix (something like a cube of numbers), even a 0-D matrix (a single number), or a higher-dimensional structure that is harder to visualize. The dimension of the tensor is called its rank.
If one transforms the other entities in the structure in a regular way, then the tensor must obey a related transformation rule.
Matrix | Tensor |
---|---|
2-dim table to organize information | n-dim table (a generalized matrix) |
5x5 matrix | tensor of rank 2 |
5x5x5 "matrix" | tensor of rank 3 |
Not every matrix can be represented as a rank 2 tensor, but any rank 2 tensor can be represented as a matrix.