Skip to content

Instantly share code, notes, and snippets.

@jwilk
Created August 22, 2018 15:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jwilk/37e329126f0919b8ddf4066e61c18831 to your computer and use it in GitHub Desktop.
Save jwilk/37e329126f0919b8ddf4066e61c18831 to your computer and use it in GitHub Desktop.

Source: https://johnhw.github.io/umap_primes/index.md.html

What do numbers look like?

One million integers embedded into 2D space with UMAP

This is the first million integers, represented as binary vectors indicating their prime factors, and laid out using the UMAP dimensionality reduction algorithm by Leland Mcinnes. Each integer is represented in a high-dimensional space, and gets squished down to 2D so that numbers with similar prime factorisations are closer together than those with dissimilar factorisations.

A very pretty structure emerges; this might be spurious in that it captures more about the layout algorithm than any "true" structure of numbers. However, the visual effect is very appealling and requires no tricky manipulation to create.

Large 4k version of the image suitable for backgrounds

Video

A short video, showing the numbers being added in sequence. Each frame adds 1000 integers, for a total of 1000 frames.

https://www.youtube.com/embed/nCk8dyU7zUM

Representation

Each integer is represented as a vector, one element for each possible prime factor, where 0=prime factor not present, 1=prime factor present. This provides a simple geometry for the integers, where each number can be seen as being composed of prime basis vectors.

For example, 10 might be represented as [1 0 1 0 0 0 0 ...] as it has factors 2 and 5 (the first and third primes respectively).

These high-dimensional vectors are then squashed down from thousands of dimensions for each number to just two dimensions, and a point plotted at that coordinate. The points are coloured according to the original integer index; brighter colours correspond to larger numbers. The points are laid out so that numbers with similar "angles" to each other are close together. For example, all primes are at 90 degrees to each other, and all powers of a number are identical to each other (0 degree angle).

Algorithm

  • for each i up to n

    • find prime factors of i.
    • For each i, make binary vector with π(n) columns (π(n) = number of primes less than n), where each element has 1=prime factor present, 0=absent.
  • Stack all vectors into a large, sparse matrix

  • Lay out with UMAP, using the cosine distance function (dot product).

I used li(n) as a cheap upper bound on π(n) to estimate the number of columns needed.

For example, for n=11, the matrix looks like:


             2  3  5  7  11       
            [0. 0. 0. 0. 0.] 0
            [0. 0. 0. 0. 0.] 1
            [1. 0. 0. 0. 0.] 2
            [0. 1. 0. 0. 0.] 3
            [1. 0. 0. 0. 0.] 4
            [0. 0. 1. 0. 0.] 5
            [1. 1. 0. 0. 0.] 6
            [0. 0. 0. 1. 0.] 7
            [1. 0. 0. 0. 0.] 8
            [0. 1. 0. 0. 0.] 9
            [1. 0. 1. 0. 0.] 10
            [0. 0. 0. 0. 1.] 11

The image above is reduced from a 1000000x78628 (!) binary matrix to 1000000x2. This only works because of the excellent sparse matrix support in UMAP.

What are we seeing?

The points can be filtered according to whatever criteria we might be interested in. For example, the parity of the numbers (even=green, odd=orange) Coloured by parity of the points

or alternatively by the whether the points are prime or not (white=prime, blue=composite). Coloured by primeness of the points

As might be expected, the primes form the vague, diffuse cluster near the centre, as they are all orthogonal to each other.

Extending this idea, we can colour the points by number of unique factors: Coloured by number of unique factors

We can colour the points according to the prime factors they contain, where the log of the factor is mapped to hue, with redder=smaller, bluer=larger factor. This generalises the parity plot above. This plot has just a touch of spatial noise added to try and separate out the colours more clearly, as each factor pass is rendered as a separate layer. Coloured by factors present, hue mapped to prime factor

Sequences

We can look at where common sequences of numbers lie. This isn't particularly informative, but there is perhaps some pattern.

For example, the Fibonacci sequence: Fibonacci sequence numbers marked in yellow

or the squares: Squares marked in yellow

The frontier

It's worth noting that any structure that is not an algorithmic artificat is really the frontier of numbers at some threshold (in this case, 1 million). Every possible finite binary vector corresponds to at least one integer in this representation (actually, infinitely many, as all repeated factors are contracted onto a single binary indicator). Therefore, in the infinite set of integers, all points are present, at every possible angle or distance from each other, and the space would be perfectly evenly filled. However, by cutting off at some specific n, we see the "surface" that we have reached by composing together primes such that their product is less than n.

Code

The python code is available.

Massive ~10k resolution version 16Mb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment