t-NN framework
- a NN framework for multidimensional tensor data, based on the t-product
(which multiplies two tensors via circulant convolution)
- quicker learning because of reduced parameter space
- improved generalizability of stable t-NNs
- reduced computation cost, memory footprint, friendlier to distributed computation
- extract more meaningful info when relevant data is limited
- potential to extract multidimensional correlations, given that data is not vectorized
- Let A be a real-valued l x m x n tensor
- its frontal slices A^(k) are l x m matrices, for k = 1, ..., n
- its lateral slices are l x n matrices, for j = 1, ..., m
- its tubes a_ij are n x 1 vectors, for i = 1, ..., l and j = 1, ..., m
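The three kinds of slices map directly onto NumPy indexing; a quick sketch (the sizes l = 4, m = 3, n = 2 are arbitrary):

```python
import numpy as np

# An l x m x n tensor, indexed A[i, j, k] (sizes chosen for illustration).
l, m, n = 4, 3, 2
A = np.arange(l * m * n, dtype=float).reshape(l, m, n)

frontal = A[:, :, 0]  # a frontal slice: an l x m matrix (fix k)
lateral = A[:, 0, :]  # a lateral slice: an l x n matrix (fix j)
tube = A[0, 0, :]     # a tube a_ij: an n-vector along the third mode (fix i, j)

print(frontal.shape, lateral.shape, tube.shape)  # (4, 3) (4, 2) (2,)
```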
- fully connected layers use parameters inefficiently
- Suppose we have m samples of two-dimensional data of size n x n. We can vectorize these samples and store them as columns of a matrix A of size n^2 x m, or orient these samples as lateral (from-the-side) slices stored in a tensor A of size n x m x n.
- This efficient parameterization can be even more substantial for higher-dimensional data.
- for the same number of parameters, the tensor weights can capture the same features as the matrix weights, plus additional features from applying circulant shifts of the frontal slices
- that is, we are able to extract more features for the same number of learnable parameters
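A back-of-the-envelope count makes the parameter savings concrete (the size n = 28 is just an example; the layer shapes are the standard dense-vs-t-product comparison, not taken from a specific implementation):

```python
# Parameters needed to map an n x n image to an n x n output.
n = 28

# Dense layer on vectorized data: a weight matrix W in R^{n^2 x n^2}.
matrix_params = (n ** 2) * (n ** 2)

# t-product layer on laterally oriented data: a weight tensor in R^{n x n x n}.
tensor_params = n ** 3

print(matrix_params, tensor_params)  # 614656 vs 21952 for n = 28
```

The tensor layer uses a factor of n fewer parameters, which is where the reduced memory footprint and faster training come from.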
- tubes are the scalars of our tensor space (e.g., tubes commute under the t-product)
- tensors act on lateral slices, so we consider lateral slices analogous to vectors; hence data is stored as lateral slices in our framework
- Given A in R^(l x m x n), bcirc(A) is the ln x mn block circulant matrix of frontal slices:

  bcirc(A) = [ A^(1)   A^(n)    ...  A^(2)
               A^(2)   A^(1)    ...  A^(3)
               ...     ...      ...  ...
               A^(n)   A^(n-1)  ...  A^(1) ]

- unfold(A) stacks the frontal slices of A into the first block-column of bcirc(A), an ln x m matrix; fold reverses this, merging the slices back together to make A, so fold(unfold(A)) = A
- the t-product A * B = fold(bcirc(A) · unfold(B)) is an algebraic formulation to multiply tensors via circulant convolution
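A minimal NumPy sketch of bcirc, unfold, fold, and the resulting t-product (the function names follow the definitions above; this is an illustrative implementation, not an optimized one):

```python
import numpy as np

def unfold(A):
    # Stack the frontal slices of A vertically: the first block-column
    # of bcirc(A), an ln x m matrix.
    l, m, n = A.shape
    return A.transpose(2, 0, 1).reshape(n * l, m)

def fold(M, shape):
    # Inverse of unfold, so fold(unfold(A), A.shape) == A.
    l, m, n = shape
    return M.reshape(n, l, m).transpose(1, 2, 0)

def bcirc(A):
    # ln x mn block circulant matrix: block (i, j) is frontal slice (i - j) mod n.
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def tprod(A, B):
    # t-product: A * B = fold(bcirc(A) @ unfold(B)).
    l, m, n = A.shape
    _, p, _ = B.shape
    return fold(bcirc(A) @ unfold(B), (l, p, n))

A = np.random.rand(4, 3, 5)
B = np.random.rand(3, 2, 5)
assert np.allclose(fold(unfold(A), A.shape), A)
print(tprod(A, B).shape)  # (4, 2, 5)
```

Because the blocks are arranged circulantly, the same product can be computed with FFTs along the third mode, which is where the computational savings come from.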
- Suppose A is an l x m x n tensor. Then A^T is the m x l x n tensor obtained by taking the transpose of each frontal slice, with slices 2 through n reversed.
- The transpose is defined this way so that bcirc(A^T) = bcirc(A)^T
- You can think of the transpose as performing the following frontal slice mapping: A^(1) -> (A^(1))^T, and A^(k) -> (A^(n+2-k))^T for k = 2, ..., n
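The slice reversal can be checked numerically; a self-contained sketch (re-defining a small bcirc helper, with arbitrary sizes):

```python
import numpy as np

def bcirc(A):
    # ln x mn block circulant matrix of the frontal slices of A.
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def ttranspose(A):
    # Transpose each frontal slice, reversing the order of slices 2 through n:
    # slice 1 stays first; slice k moves to position n + 2 - k for k >= 2.
    return np.concatenate([A[:, :, :1], A[:, :, :0:-1]], axis=2).transpose(1, 0, 2)

A = np.random.rand(4, 3, 5)
# The slice reversal is exactly what makes the block circulant expansion
# commute with the ordinary matrix transpose:
assert np.allclose(bcirc(ttranspose(A)), bcirc(A).T)
```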
The identity tensor I is the m x m x n tensor whose first frontal slice is the m x m identity matrix and whose remaining frontal slices are zero.
- bcirc(I) is the mn x mn identity matrix
- an identity tube e_1 is the first standard basis vector oriented along the third dimension
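The identity property A * I = A can be verified by computing the t-product in the Fourier domain (FFT along the third mode, then slicewise matrix products); the helper names here are my own:

```python
import numpy as np

def tprod(A, B):
    # t-product via the FFT along the third mode: transform, multiply
    # matching frontal slices, transform back.
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)
    return np.fft.ifft(Ch, axis=2).real

def identity_tensor(m, n):
    # First frontal slice is the m x m identity; remaining slices are zero.
    I = np.zeros((m, m, n))
    I[:, :, 0] = np.eye(m)
    return I

A = np.random.rand(4, 3, 5)
I = identity_tensor(3, 5)
assert np.allclose(tprod(A, I), A)  # A * I == A
```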
- A circulant matrix C is determined by a single vector c, which forms the first column of C. The remaining columns of C are cyclic permutations of c with offset equal to the column index. Equivalently, the last row of C is c in reverse order, and the remaining rows are cyclic permutations of the last row.
- the eigenvectors of an N x N circulant matrix are the Discrete Fourier Transform (DFT) sinusoids for a length-N DFT
- the DFT computes a discrete frequency spectrum from a discrete-time signal of finite length
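The eigenvector claim can be checked numerically: a circulant matrix is diagonalized by the DFT matrix, with eigenvalues fft(c). A small sketch (the vector c is arbitrary):

```python
import numpy as np

# The first column c determines the circulant matrix:
# column j is c cyclically shifted down by j.
c = np.array([1.0, 2.0, 3.0, 4.0])
N = len(c)
C = np.column_stack([np.roll(c, j) for j in range(N)])

# The DFT diagonalizes C: C = F^{-1} diag(fft(c)) F, so the DFT sinusoids
# (columns of F^{-1}) are the eigenvectors and fft(c) are the eigenvalues.
F = np.fft.fft(np.eye(N))  # length-N DFT matrix
assert np.allclose(C, np.linalg.inv(F) @ np.diag(np.fft.fft(c)) @ F)
```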
- very efficient to compute using the FFT algorithm and the circular convolution theorem
- commutative
- circular convolution is taken between two periodic sequences of period N, computed over a single period n = 0 to N-1
- linear convolution is computed over all relevant values of n, from -infinity to infinity
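The distinction shows up directly in NumPy (the sequence values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, -1.0, 0.5])
N = len(x)

# Circular convolution over one period, via the circular convolution theorem:
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

# The same thing from the definition: y[n] = sum_m x[m] * h[(n - m) mod N]
direct = np.array([sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)])
assert np.allclose(circ, direct)

# Convolution is commutative:
assert np.allclose(np.convolve(x, h), np.convolve(h, x))

# Linear convolution runs over all overlaps, giving length 2N - 1:
lin = np.convolve(x, h)
assert lin.shape == (2 * N - 1,)
```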
A tensor is often thought of as a generalized matrix. That is, it could be a 1-D matrix (a vector is such a tensor), a 3-D matrix (something like a cube of numbers), even a 0-D matrix (a single number), or a higher-dimensional structure that is harder to visualize. The dimension of the tensor is called its rank.
If one transforms the other entities in the structure in a regular way, then the tensor must obey a related transformation rule.
Matrix | Tensor |
---|---|
2-dim table to organize information | n-dim table (a generalized matrix) |
5x5 matrix | tensor of rank 2 |
5x5x5 "matrix" | tensor of rank 3 |
Not every matrix can be represented as a rank 2 tensor, but any rank 2 tensor can be represented as a matrix.