Instantly share code, notes, and snippets.

Created August 10, 2017 04:44
attempt to explain why broadcasting is bae
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 A starter example is centering a matrix ```{python} import numpy as np X = np.random.normal(size=(3, 3)) # random 3 by 3 matrix mu = X.mean(axis = 0) # array (vector) of column means print("Original matrix") print(X) print("\nArray of column means") print(mu) # array effectively copied till vector matches matrix in size print("\nBroadcast subtraction operation") print(X - mu) ``` To do this in R you have to loop or use something like map/apply. This lets you create kernel matrices in a fast and fairly readable way, for example. ```{python} import numpy as np X = np.random.normal(size=(10, 5)) Y = np.random.normal(size=(2, 5)) sigma = 0.1 # calculate all pairwise distances between rows in X and Y # (x - y)^2 = a^2 - 2 ab + b^2 dist_sq = np.square(X).sum(axis=1)[:, np.newaxis] - 2.0 * np.dot(X, Y.T) + np.square(Y).sum(axis=1) # turn this into an RBF kernel (math might be off, but hopefully the point is clear) rbf = np.exp(-np.sqrt(dist_sq) / (2 * sigma ** 2)) print(dist_sq) print(rbf) ``` In R you have to hope someone vectorized the components of the operation that you care about, copy an object along an appropriate dimension, or use loops/applys/maps, which I think make the math harder to think about.

### DavisVaughan commented Aug 10, 2017 • edited

I had way too much fun with this. I'm starting to appreciate broadcasting, so I took a stab at how you might solve these two specific problems with new functions that attempt to mimic basic broadcasting for subtraction and addition (%-% and %+%).

The first bit it me exploring what R does natively, then I create the functions, then test a few things!

I am completely aware the functions would need to be dramatically improved, but I think its kind of neat.

FYI - the most important bit is the Pairwise Distance chunk down at the bottom so you may want to just skip to that first.

```# GENERAL EXPLORATION --------------------------------------------------

# Random 3x3 matrix
X <- matrix(c(-1.07230736, -0.13967851,  0.88094962,
0.69717778, -0.11573738, -0.57242203,
0.1390293 , -1.77457788,  1.45084404),
nrow = 3, byrow = TRUE)

# Col means are returned as a vector, not a matrix
mu <- colMeans(X)

mu
#>  -0.07870009 -0.67666459  0.58645721

# Normal subtraction does this wrong
# The first element of mu is subtracted from the entire first row of X
# Second element of mu subtracted from the second row of X and so on
X - mu
#>            [,1]        [,2]      [,3]
#> [1,] -0.9936073 -0.06097842 0.9596497
#> [2,]  1.3738424  0.56092721 0.1042426
#> [3,] -0.4474279 -2.36103509 0.8643868

# You could exploit that fact like this, but its kinda ugly
t(t(X) - mu)
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

# Or tell R to make a matrix of mu. Telling it 3x3 will extend it automatically
# But this is obviously making copies to extend it :(
X - matrix(mu, nrow=3, ncol=3, byrow = TRUE)
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

# R isnt "smart" enough to know how to broadcast, and just throws an error
# when you have non conformable arrays
mu <- matrix(mu, nrow = 1)
X - mu
#> Error in X - mu: non-conformable arrays

# Loop to prevent copying
# This is interesting. Generalize this?
X_mu <- matrix(nrow = 3, ncol = 3)
for(i in 1:nrow(X)) {
X_mu[i,] <- X[i,] - mu
}

X_mu
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

`%-%` <- function(e1, e2) {

# I know this won't be generalizable
# But it sets up the empty matrix to be filled with the result
dim_row <- max(nrow(e1), nrow(e2))
dim_col <- max(ncol(e1), ncol(e2))
x <- matrix(nrow = dim_row, ncol = dim_col)

# e1 - e2 & 1 has more rows & equal cols
if(nrow(e1) >= nrow(e2) & ncol(e1) == ncol(e2)) {
case <- "1"

# e1 - e2 & 2 has more rows & equal cols
} else if(nrow(e1) < nrow(e2) & ncol(e1) == ncol(e2)) {
case <- "2"

# e1 - e2 & 1 has more cols & equal rows
} else if(ncol(e1) >= ncol(e2) & nrow(e1) == nrow(e2)) {
case <- "3"

# e1 - e2 & 2 has more cols & equal rows
} else if(ncol(e1) < ncol(e2) & nrow(e1) == nrow(e2)) {
case <- "4"

# Fail
} else {
stop("Incorrect dims")
}

switch(case,
"1" = for(i in 1:dim_row) { x[i,] <- e1[i,] - e2     },
"2" = for(i in 1:dim_row) { x[i,] <- e1     - e2[i,] },
"3" = for(i in 1:dim_col) { x[,i] <- e1[,i] - e2     },
"4" = for(i in 1:dim_col) { x[,i] <- e1     - e2[,i] })

x
}

`%+%` <- function(e1, e2) {

# I know this won't be generalizable
# But it sets up the empty matrix to be filled with the result
dim_row <- max(nrow(e1), nrow(e2))
dim_col <- max(ncol(e1), ncol(e2))
x <- matrix(nrow = dim_row, ncol = dim_col)

# e1 - e2 & 1 has more rows & equal cols
if(nrow(e1) >= nrow(e2) & ncol(e1) == ncol(e2)) {
case <- "1"

# e1 - e2 & 2 has more rows & equal cols
} else if(nrow(e1) < nrow(e2) & ncol(e1) == ncol(e2)) {
case <- "2"

# e1 - e2 & 1 has more cols & equal rows
} else if(ncol(e1) >= ncol(e2) & nrow(e1) == nrow(e2)) {
case <- "3"

# e1 - e2 & 2 has more cols & equal rows
} else if(ncol(e1) < ncol(e2) & nrow(e1) == nrow(e2)) {
case <- "4"

# Fail
} else {
stop("Incorrect dims")
}

switch(case,
"1" = for(i in 1:dim_row) { x[i,] <- e1[i,] + e2     },
"2" = for(i in 1:dim_row) { x[i,] <- e1     + e2[i,] },
"3" = for(i in 1:dim_col) { x[,i] <- e1[,i] + e2     },
"4" = for(i in 1:dim_col) { x[,i] <- e1     + e2[,i] })

x
}

# TESTING -------------------------------------------------------------------

# Now try it
X %-% mu
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

# This is working like we want
# Note that mu is a matrix, not a vector when we pass it to %-%

# PAIRWISE DISTANCE ---------------------------------------------------------

X <- matrix(c(1.2547115 , -0.48876422, -0.72856604,  0.71314144, -0.39458829,
1.24672356,  0.5546855 , -0.85967337, -1.44113557, -1.88939305,
2.3343547 ,  0.87549444, -0.84078801, -0.0126508 ,  0.65216383,
-0.4186715 ,  0.28402457, -0.30994525, -0.84642535,  0.67971748,
0.17122195,  0.24889469,  1.06096652,  0.15103784, -0.61650479,
-0.55185275,  1.22362894, -1.13346561, -1.78699116, -0.80449837,
-1.63571732,  0.96625735, -0.229193  , -1.04050556,  0.53947026,
-0.38108634, -1.86069407, -0.65388136, -0.14660911, -1.31124108,
-0.29728826,  0.03291303,  1.97648715,  0.70430292,  0.71027875,
-2.73208985,  0.54960805,  0.56568121, -1.04819229,  0.33661375),
nrow = 10, ncol = 5, byrow = TRUE)

Y <- matrix(c(-1.96300547,  1.87600543, -0.45271843, -0.20048695, -0.18477972,
-0.18735322, -2.27866683, -1.00575347, -0.05794885, -0.95420218),
nrow = 2, ncol = 5, byrow = TRUE)

matrix(rowSums(X^2)) %-% (2 * (X %*% t(Y))) %+% matrix(rowSums(Y^2), nrow = 1)
#>            [,1]       [,2]
#>  [1,] 16.900666  6.2678828
#>  [2,] 16.658775 12.8935885
#>  [3,] 20.354682 18.9174214
#>  [4,]  6.104346 10.3964331
#>  [5,]  9.803614 10.9461899
#>  [6,]  5.781411 15.4272445
#>  [7,]  2.214892 16.4268127
#>  [8,] 17.777676  0.4713851
#>  [9,] 13.692418 17.6007708
#> [10,]  4.378414 19.5910181```

### alexpghayes commented Oct 18, 2017 • edited

Just discovered this, don't know how I hadn't seen your comment yet -- super cool!

FYI if you dig broadcasting, the whole point of `xtensor` is to provide the same Numpy interface and broadcasting niceness from within Rcpp.

Ugh if only that played better with windows.

### kanishkamisra commented Jun 19, 2019

So this is how `rray` came to be hmm 🤔