Related tweets
- Today, we’re going to play a game I’m calling “IT’S JUST A LINEAR MODEL” (IJALM)...
- How about deep learning? Super non-linear, right? Well, as a function of some non-linear activations, it's IJALM...
Code is available in both Julia and R, with no dependencies: you can simply run it or play around with it.
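Before the code, the whole trick in one line: a two-layer network is just one linear model, passed through an activation, fed into another linear model. In the 1-dimensional dummy below (where both layers happen to reuse the same β0 and β), the network collapses to

network(x) = β0 + β * max(0, β0 + β * x)

which is exactly what the next block implements.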
A dummy, 1-dimensional architecture of a deep learning network
Using the mathematical notation of linear regression, β0 and β
#' Non-linear activation function e.g. Rectified Linear Unit (RELU)
activation(x) = max(0, x)
#' Layer by layer...the input is transformed using both linear and non-linear functions
β0, β = 1, 2 # Arbitrary numbers
layer_1(x) = activation( β0 + β * x ) # Linear and non-linear
layer_2(x) = β0 + β * x # Linear
network(x) = layer_2(layer_1(x)) # Multi-layer linear and non-linear
output = network(123)
#' Input = 123, output = 495
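To see where 495 comes from, the forward pass can be traced by hand with the β0 = 1 and β = 2 set above (a quick sanity check, reusing the definitions from the block just shown):

layer_1(123) # max(0, 1 + 2 * 123) = 247
layer_2(247) # 1 + 2 * 247 = 495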
A simple architecture of a two-layer deep learning network
Using the conventional mathematical notation of deep learning
input = [1, 2, 3] # Vector of size 3
#' Non-linear activation function e.g. Rectified Linear Unit (RELU)
activation(x) = max(0, x)
#' Weights and biases in matrix form, just like the β0 and β1, β2, ... of a linear model
#' Here, input dimension = 3, intermediate dimension = 5, output dimension = 1
W1, b1 = rand(5, 3), rand(5)
W2, b2 = rand(1, 5), rand(1)
#' Each layer is a simple matrix product (linear), optionally followed by the non-linear activation
layer_1(x) = activation.( W1 * x .+ b1) # Linear and non-linear
layer_2(x) = W2 * x .+ b2 # Linear
network(x) = layer_2(layer_1(x)) # Multi-layer linear and non-linear
input # 3-element Array{Int64,1}: 1 2 3
#' Layer by layer...the input is transformed using both linear and non-linear functions
#' (weights and biases are random, so exact values vary per run)
output = network(input) # 1-element Array{Float64,1}: 4.5
#' Internally
output_1 = layer_1(input) # 5-element Array{Float64,1}: 2.3 2.6 3.0 1.4 3.4
output = layer_2(output_1) # 1-element Array{Float64,1}: 4.5
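To drive the IJALM point home: each hidden unit is its own little linear model. The sketch below (unit_1 is an illustrative name, not part of the original listing) checks that the first entry of output_1 is just a plain β0 + β1*x1 + β2*x2 + β3*x3 passed through the activation:

unit_1 = W1[1, :]' * input + b1[1]  # One row of W1 plus one bias = one linear model
output_1[1] ≈ activation(unit_1)    # true: the network's first hidden unit, recovered by hand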
A dummy, 1-dimensional architecture of a deep learning network
Using the mathematical notation of linear regression, β0 and β
#' Non-linear activation function e.g. Rectified Linear Unit (RELU)
activation <- function(x) max(0, x)
#' Layer by layer...the input is transformed using both linear and non-linear functions
β0 <- 1 ; β <- 2 # Arbitrary numbers
layer_1 <- function(x) activation( β0 + β * x ) # Linear and non-linear
layer_2 <- function(x) β0 + β * x # Linear
network <- function(x) layer_2(layer_1(x)) # Multi-layer linear and non-linear
output <- network(123) # Input = 123, output = 495
A simple architecture of a two-layer deep learning network
Using the conventional mathematical notation of deep learning
input <- c(1, 2, 3) # Vector of size 3
#' Non-linear activation function e.g. Rectified Linear Unit (RELU)
activation <- function(x) sapply(x, max, 0) # Element-wise, so it also works on vectors and matrices
#' Weights and biases in matrix form, just like the β0 and β1, β2, ... of a linear model
#' Here, input dimension = 3, intermediate dimension = 5, output dimension = 1
W1 <- matrix(runif(5*3), 5) ; b1 <- runif(5)
W2 <- matrix(runif(1*5), 1) ; b2 <- runif(1)
#' Each layer is a simple matrix product (linear), optionally followed by the non-linear activation
layer_1 <- function(x) activation( W1 %*% x + b1 ) # Linear and non-linear
layer_2 <- function(x) W2 %*% x + b2 # Linear
network <- function(x) layer_2(layer_1(x)) # Multi-layer linear and non-linear
input # [1] 1 2 3
#' Layer by layer...the input is transformed using both linear and non-linear functions
#' (weights and biases are random, so exact values vary per run)
( output <- network(input) ) # [1] 10.311
#' Internally
( output_1 <- layer_1(input) ) # [1] 1.9659 3.0138 5.3722 3.0259 3.8878
( output <- layer_2(output_1) ) # [1] 10.311