Skip to content

Instantly share code, notes, and snippets.

@roblanf
Last active March 16, 2016 22:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save roblanf/02704d5372fa82ce767e to your computer and use it in GitHub Desktop.
Save roblanf/02704d5372fa82ce767e to your computer and use it in GitHub Desktop.
This gist is an attempt to analyse the distribution of the last digits of prime numbers, inspired by this article: http://www.nature.com/news/peculiar-pattern-found-in-random-prime-numbers-1.19550?WT.mc_id=FBK_NatureNews
library(primes)
library(coda)
library(parallel)
library(ggplot2)
# set this to the number of processors on your machine
processors = 4
remainders <- function(divisor, numbers){
# get the remainders of a vector of numbers after dividing by divisor.
ld = numbers %% divisor
return(ld)
}
# get ~4m primes
p = generate_primes(min = 0, max = exp(18))
# if you want to check that the code is sensible, use randomised primes first...
#p = sample(p)
divisors = 3:10
# make a list of remainders with different divisors
p.r = mclapply(divisors, remainders, numbers = p, mc.cores = processors)
# make them into mcmc objects (a bit like a time series)
p.r.mcmc = mclapply(p.r, as.mcmc, mc.cores = processors)
# calculate autocorrelations at lags from 1:50K
p.r.ac = mclapply(p.r.mcmc, autocorr, lags = 1:50000, mc.cores = processors)
# now let's look at the cumulative sum of the autocorrelations
# the reason being that this should be roughly a random walk if
# nothing interesting is happening
p.r.ac.cumsum = lapply(p.r.ac, cumsum)
# now we just build all the data into a big dataframe for plotting
div = rep(divisors, each = length(p.r.ac[[1]]))
lags = rep(1:length(p.r.ac[[1]]), length(divisors))
dat = data.frame('divisor' = div, 'lag' = lags, 'autocorrelation' = unlist(p.r.ac), 'cumulative.ac' = unlist(p.r.ac.cumsum))
# and here are our plots: the same data two ways
ggplot(dat, aes(x = lag, y = cumulative.ac)) + geom_line() + facet_wrap(~divisor)
ggplot(dat, aes(x = lag, y = cumulative.ac, color = factor(divisor))) + geom_line(alpha = 0.5)
@roblanf
Copy link
Author

roblanf commented Mar 16, 2016

Here's a slightly longer description. The article (I didn't read the original, just the news and views) points out that primes with the same last number are not seen next to each other as often as you'd expect if they were randomly distributed.

The article has a quote from the authors of the original article which says:

“Every single person we’ve told this ends up writing their own computer program to check it for themselves,”

So I thought I'd see if I could do it. I don't know if this is right - I'm not a mathematician. But I can make computers do sums for me.

The approach I took was to first generate a bunch of primes, then to look at the remainders as a time series. Specifically, for a given divisor (e.g. 10, which means we're talking about last digits of the primes as in most of the news and views articles), we can easily generate the list of remainders. Given this list, we want to know if a '1' is followed by a '1' more or less often than we expect by chance.

I generalised this a little (assuming I haven't done something silly) by looking at autocorrelations. Specifically, this asks whether pairs of numbers at a given lag (e.g. neighbouring pairs have a lag of 1) are more or less similar than we would expect by chance. If they are more similar, the autocorrelation is positive (this would be the case if 1's were followed by other 1's more often than you'd expect by chance). If they are less similar (which is what the article suggests) the autocorrelation would be negative. The article discusses a lag of 1, but I wondered how many other lags this would apply to. How far away do primes have to be before their final digits (or remainders, more generally) are approximately randomly distributed.

I plot the cumulative sum of the autocorrelations, because this allows me to see easily when the autocorrelations become random. If the autocorrelations at different lags are roughly random, then the plot of the cumulative autocorrelation vs. the lag will level off. If they tend to be negative, the plot will keep going down.

The script above let's us check any number of divisors, and any number of lags. Here's what I get when I look at divisors from 5:20, and lags from 1:10K.

rplot01
rplot02

The plots show clearly not just that the remainders of neighbouring primes are more different than we'd expect by chance, but that the remainders of primes at quite large lags are also negative. In fact, the plots don't really level off until a lag of about 10,000.

My (possibly horribly wrong) interpretation of this is that even pairs of primes which are really very far apart (up to around 10,000 primes apart) have remainders that are more dissimilar than you'd expect by chance. That seems very odd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment