Skip to content

Instantly share code, notes, and snippets.

@mikmart
Last active March 10, 2018 21:13
Show Gist options
  • Save mikmart/85a5118285b1f8e4361399a1a3cb3034 to your computer and use it in GitHub Desktop.
Save mikmart/85a5118285b1f8e4361399a1a3cb3034 to your computer and use it in GitHub Desktop.
Calculating summaries from arrays without loops
set.seed(1)
n <- 1000
matrices <- replicate(n, matrix(runif(24 ^ 2), nrow = 24))
str(matrices)
#> num [1:24, 1:24, 1:1000] 0.266 0.372 0.573 0.908 0.202 ...
f <- function(x) {
sum_x <- sum(x)
if (sum_x == 0)
return(0)
p <- x / sum_x
p[p == 0] <- 1
-sum(p * log2(p))
}
g <- function(x, i) {
apply(x, i, sum) / sum(x) * 2 ^ apply(x, i, f)
}
result <- apply(matrices, 3, g, 2)
str(result)
#> num [1:24, 1:1000] 0.926 0.867 0.849 0.89 0.853 ...
#' Created on 2018-03-06 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
@mikmart
Copy link
Author

mikmart commented Mar 6, 2018

Based on discussion in this StackOverflow question.

@shhashha
Copy link

shhashha commented Mar 9, 2018

I see... wow. Thanks! God exists!
I have applied it to real data and get some NaNs. I have track back the error and It was due to some columns with sum = 0. By applying runif you never get a 0 value, but our data are randomly generated by a Poisson distribution (to test aganist some null models) and sometimes get colSums = 0. These NaNs make final calculations to populate with NaNs. I have solved it with:

result <- apply(matrices, 3, g, 2)
result[is.na(result)] <- 0

And everything is alright.... and:
Previously to this late modification of the function to deal with p == 0, I tried just to replace in the raw data, every 0 with 1e-30. Comparing the results, they are exactly the same. But this way is mathematically correct.
Many thanks again, wow I'm really impressed. I have learned a lot.
JL (shasha as stackoverflow user)

@mikmart
Copy link
Author

mikmart commented Mar 10, 2018

Ah, yeah I see. Like you've done, just replacing the NAs in result with zeroes will produce the correct result; but still, it's probably cleaner to just deal with that in the f function, too, so that we never divide by zero. If we add a check to see if sum(x) == 0 in f and return 0 if so, we get the same result -- and don't have to do any clean up after!

I'm happy to help! It's been interesting refreshing my array manipulations, since I normally just work with data frames and lists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment