Skip to content

Instantly share code, notes, and snippets.

@alexpghayes
Created August 10, 2017 04:44
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save alexpghayes/286c2ff2eced494cf96763f19ff408ec to your computer and use it in GitHub Desktop.
Save alexpghayes/286c2ff2eced494cf96763f19ff408ec to your computer and use it in GitHub Desktop.
attempt to explain why broadcasting is bae
A starter example is centering a matrix
```{python}
import numpy as np
X = np.random.normal(size=(3, 3)) # random 3 by 3 matrix
mu = X.mean(axis = 0) # array (vector) of column means
print("Original matrix")
print(X)
print("\nArray of column means")
print(mu)
# array effectively copied till vector matches matrix in size
print("\nBroadcast subtraction operation")
print(X - mu)
```
To do this in R you have to loop or use something like map/apply.
This lets you create kernel matrices in a fast and fairly readable way, for example.
```{python}
import numpy as np
X = np.random.normal(size=(10, 5))
Y = np.random.normal(size=(2, 5))
sigma = 0.1
# calculate all pairwise distances between rows in X and Y
# (x - y)^2 = a^2 - 2 ab + b^2
dist_sq = np.square(X).sum(axis=1)[:, np.newaxis] - 2.0 * np.dot(X, Y.T) + np.square(Y).sum(axis=1)
# turn this into an RBF kernel (math might be off, but hopefully the point is clear)
rbf = np.exp(-np.sqrt(dist_sq) / (2 * sigma ** 2))
print(dist_sq)
print(rbf)
```
In R you have to hope someone vectorized the components of the operation that
you care about, copy an object along an appropriate dimension, or use
loops/applys/maps, which I think make the math harder to think about.
@DavisVaughan
Copy link

DavisVaughan commented Aug 10, 2017

I had way too much fun with this. I'm starting to appreciate broadcasting, so I took a stab at how you might solve these two specific problems with new functions that attempt to mimic basic broadcasting for subtraction and addition (%-% and %+%).

The first bit it me exploring what R does natively, then I create the functions, then test a few things!

I am completely aware the functions would need to be dramatically improved, but I think its kind of neat.

FYI - the most important bit is the Pairwise Distance chunk down at the bottom so you may want to just skip to that first.

# GENERAL EXPLORATION --------------------------------------------------

# Random 3x3 matrix
X <- matrix(c(-1.07230736, -0.13967851,  0.88094962,
               0.69717778, -0.11573738, -0.57242203,
               0.1390293 , -1.77457788,  1.45084404),
            nrow = 3, byrow = TRUE)

# Col means are returned as a vector, not a matrix
mu <- colMeans(X)

mu
#> [1] -0.07870009 -0.67666459  0.58645721

# Normal subtraction does this wrong
# The first element of mu is subtracted from the entire first row of X
# Second element of mu subtracted from the second row of X and so on
X - mu
#>            [,1]        [,2]      [,3]
#> [1,] -0.9936073 -0.06097842 0.9596497
#> [2,]  1.3738424  0.56092721 0.1042426
#> [3,] -0.4474279 -2.36103509 0.8643868

# You could exploit that fact like this, but its kinda ugly
t(t(X) - mu)
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

# Or tell R to make a matrix of mu. Telling it 3x3 will extend it automatically
# But this is obviously making copies to extend it :(
X - matrix(mu, nrow=3, ncol=3, byrow = TRUE)
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

# R isnt "smart" enough to know how to broadcast, and just throws an error
# when you have non conformable arrays
mu <- matrix(mu, nrow = 1)
X - mu
#> Error in X - mu: non-conformable arrays

# Loop to prevent copying
# This is interesting. Generalize this?
X_mu <- matrix(nrow = 3, ncol = 3)
for(i in 1:nrow(X)) {
  X_mu[i,] <- X[i,] - mu 
}

X_mu
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

# BROADCAST FUNCTIONS? -------------------------------------------------------------------

# New broadcast subtraction function?
`%-%` <- function(e1, e2) {
  
  # I know this won't be generalizable
  # But it sets up the empty matrix to be filled with the result
  dim_row <- max(nrow(e1), nrow(e2))
  dim_col <- max(ncol(e1), ncol(e2))
  x <- matrix(nrow = dim_row, ncol = dim_col)
  
  # e1 - e2 & 1 has more rows & equal cols
  if(nrow(e1) >= nrow(e2) & ncol(e1) == ncol(e2)) {
    case <- "1"
    
  # e1 - e2 & 2 has more rows & equal cols
  } else if(nrow(e1) < nrow(e2) & ncol(e1) == ncol(e2)) {
    case <- "2"
    
  # e1 - e2 & 1 has more cols & equal rows
  } else if(ncol(e1) >= ncol(e2) & nrow(e1) == nrow(e2)) {
    case <- "3"
    
  # e1 - e2 & 2 has more cols & equal rows
  } else if(ncol(e1) < ncol(e2) & nrow(e1) == nrow(e2)) {
    case <- "4"
    
  # Fail
  } else {
    stop("Incorrect dims")
  }
  
  switch(case,
         "1" = for(i in 1:dim_row) { x[i,] <- e1[i,] - e2     },
         "2" = for(i in 1:dim_row) { x[i,] <- e1     - e2[i,] },
         "3" = for(i in 1:dim_col) { x[,i] <- e1[,i] - e2     },
         "4" = for(i in 1:dim_col) { x[,i] <- e1     - e2[,i] })
  
  x
}

# New broadcast addition function?
`%+%` <- function(e1, e2) {
  
  # I know this won't be generalizable
  # But it sets up the empty matrix to be filled with the result
  dim_row <- max(nrow(e1), nrow(e2))
  dim_col <- max(ncol(e1), ncol(e2))
  x <- matrix(nrow = dim_row, ncol = dim_col)
  
  # e1 - e2 & 1 has more rows & equal cols
  if(nrow(e1) >= nrow(e2) & ncol(e1) == ncol(e2)) {
    case <- "1"
    
    # e1 - e2 & 2 has more rows & equal cols
  } else if(nrow(e1) < nrow(e2) & ncol(e1) == ncol(e2)) {
    case <- "2"
    
    # e1 - e2 & 1 has more cols & equal rows
  } else if(ncol(e1) >= ncol(e2) & nrow(e1) == nrow(e2)) {
    case <- "3"
    
    # e1 - e2 & 2 has more cols & equal rows
  } else if(ncol(e1) < ncol(e2) & nrow(e1) == nrow(e2)) {
    case <- "4"
    
    # Fail
  } else {
    stop("Incorrect dims")
  }
  
  switch(case,
         "1" = for(i in 1:dim_row) { x[i,] <- e1[i,] + e2     },
         "2" = for(i in 1:dim_row) { x[i,] <- e1     + e2[i,] },
         "3" = for(i in 1:dim_col) { x[,i] <- e1[,i] + e2     },
         "4" = for(i in 1:dim_col) { x[,i] <- e1     + e2[,i] })
  
  x
}

# TESTING -------------------------------------------------------------------

# Now try it
X %-% mu
#>            [,1]       [,2]       [,3]
#> [1,] -0.9936073  0.5369861  0.2944924
#> [2,]  0.7758779  0.5609272 -1.1588792
#> [3,]  0.2177294 -1.0979133  0.8643868

# This is working like we want
# Note that mu is a matrix, not a vector when we pass it to %-%

# PAIRWISE DISTANCE ---------------------------------------------------------

X <- matrix(c(1.2547115 , -0.48876422, -0.72856604,  0.71314144, -0.39458829,
              1.24672356,  0.5546855 , -0.85967337, -1.44113557, -1.88939305,
              2.3343547 ,  0.87549444, -0.84078801, -0.0126508 ,  0.65216383,
             -0.4186715 ,  0.28402457, -0.30994525, -0.84642535,  0.67971748,
              0.17122195,  0.24889469,  1.06096652,  0.15103784, -0.61650479,
             -0.55185275,  1.22362894, -1.13346561, -1.78699116, -0.80449837,
             -1.63571732,  0.96625735, -0.229193  , -1.04050556,  0.53947026,
             -0.38108634, -1.86069407, -0.65388136, -0.14660911, -1.31124108,
             -0.29728826,  0.03291303,  1.97648715,  0.70430292,  0.71027875,
             -2.73208985,  0.54960805,  0.56568121, -1.04819229,  0.33661375),
             nrow = 10, ncol = 5, byrow = TRUE)

Y <- matrix(c(-1.96300547,  1.87600543, -0.45271843, -0.20048695, -0.18477972,
              -0.18735322, -2.27866683, -1.00575347, -0.05794885, -0.95420218),
              nrow = 2, ncol = 5, byrow = TRUE)

# Broadcast subtraction and addition to do pairwise distance!
matrix(rowSums(X^2)) %-% (2 * (X %*% t(Y))) %+% matrix(rowSums(Y^2), nrow = 1)
#>            [,1]       [,2]
#>  [1,] 16.900666  6.2678828
#>  [2,] 16.658775 12.8935885
#>  [3,] 20.354682 18.9174214
#>  [4,]  6.104346 10.3964331
#>  [5,]  9.803614 10.9461899
#>  [6,]  5.781411 15.4272445
#>  [7,]  2.214892 16.4268127
#>  [8,] 17.777676  0.4713851
#>  [9,] 13.692418 17.6007708
#> [10,]  4.378414 19.5910181

@alexpghayes
Copy link
Author

alexpghayes commented Oct 18, 2017

Just discovered this, don't know how I hadn't seen your comment yet -- super cool!

FYI if you dig broadcasting, the whole point of xtensor is to provide the same Numpy interface and broadcasting niceness from within Rcpp.

Ugh if only that played better with windows.

@kanishkamisra
Copy link

So this is how rray came to be hmm 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment