@tjfontaine
Last active December 19, 2015 18:58

For a while, people have been looking for statistics about downloads from nodejs.org, so we started putting the access logs into Manta, which lets us iterate on our questions and get answers quickly.

$ export MANTA_ASSET=/NodeCore/public/blog/2013-07-15/apache.js
$ export PARSER=/assets/$MANTA_ASSET
$ alias mstats="mjob create -qo -s $MANTA_ASSET"
$ mfind --type o /NodeCore/stor/nginx/access_log | sort | tail -30 > logfiles

A quick primer on the environment: mfind and mjob are tools that come with the Manta SDK and are used to interact with the service. apache.js is a small script that parses the access log and prints the fields named on the command line. I made an alias with the common prologue needed for each invocation. Finally, I cache the last 30 log files, which I reuse for each subsequent query.

OK, first things first: what's our most popular platform?

$ cat logfiles | mstats -m "node $PARSER nodePlatform" \
    -r "sort | uniq -c | sort -n"
    977 sunos
  55138 darwin
 102422 linux
 225413 win32
 718126 source
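Each query pairs a map phase, which emits one line per request, with a reduce phase over those lines. For readers less used to the shell idiom, here is a small JavaScript sketch of what `sort | uniq -c | sort -n` computes: a count per distinct line, ordered from least to most frequent. The helper name `countLines` is hypothetical.

```javascript
// Hypothetical sketch of what the reduce phase "sort | uniq -c | sort -n"
// computes: how many times each distinct line occurs, least frequent first.
function countLines(lines) {
  const counts = new Map();
  for (const line of lines) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  // [line, count] pairs ordered by count; ties fall back to comparing the
  // lines themselves, roughly like sort's last-resort whole-line comparison.
  return [...counts].sort(
    (a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0)
  );
}

console.log(countLines(["linux", "win32", "linux", "sunos", "linux", "win32"]));
```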

That's impressive, but is win32 really our most popular binary download? Let's check which user agents are downloading the win32 binaries.

$ cat logfiles | mstats -m "node $PARSER nodePlatform userAgent | grep win32" \
    -r "sort | uniq -c | sort -n" | tail -5
   3490 win32 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
   4338 win32 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
  10302 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.2.9200.0)
  13128 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.1.7601 Service Pack 1)
 153373 win32 Download Master

So that's about 7.8k from browsers, 23k from Azure, and 153k from "Download Master"?! Download Master turns out to be a client that opens multiple connections and makes range requests for each download, which inflates the request count, so let's filter it out.

$ cat logfiles | mstats -m "node $PARSER nodePlatform userAgent |
    grep -v 'Download Master' | cut -f1" -r "sort | uniq -c | sort -n"
    977 sunos
  55138 darwin
  72040 win32
 102421 linux
 718119 source
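The `grep -v 'Download Master' | cut -f1` step drops every map-output line whose user agent mentions Download Master and keeps only the first tab-separated field (the platform). Below is a hypothetical JavaScript equivalent, plus a quick consistency check: subtracting the 153373 Download Master requests from the earlier win32 total of 225413 gives exactly the 72040 win32 downloads shown above.

```javascript
// Hypothetical equivalent of: grep -v 'Download Master' | cut -f1
// Input lines are map output of the form "platform<TAB>userAgent".
function platformsWithoutDownloadMaster(lines) {
  return lines
    .filter((line) => !line.includes("Download Master")) // grep -v
    .map((line) => line.split("\t")[0]);                  // cut -f1
}

// Consistency check using the totals reported above.
const win32Before = 225413;    // win32 rows before filtering
const downloadMaster = 153373; // win32 rows from "Download Master"
const win32After = win32Before - downloadMaster;
console.log(win32After); // 72040, matching the filtered win32 count
```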

Alright, that makes a bit more sense. Now, what are our most popular architectures?

$ cat logfiles | mstats -m "node $PARSER nodePlatform nodeArch userAgent |
    grep -v 'Download Master' | cut -f1,2 | grep -v source" \
    -r "sort | uniq -c | sort -n"
     31 win32   x64
    433 sunos   x86
    544 sunos   x64
   1293 darwin  x86
   2686 linux   arm-pi
  11196 darwin  x64
  11669 linux   x86
  42649 darwin  both
  72009 win32   x86
  88066 linux   x64
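As a sanity check, the per-architecture rows should add back up to the filtered per-platform totals from the previous query. A minimal sketch, with the counts copied from the output above (`totalsByPlatform` is a hypothetical helper):

```javascript
// Arch breakdown copied from the output above: [platform, arch, count].
const archRows = [
  ["win32", "x64", 31], ["sunos", "x86", 433], ["sunos", "x64", 544],
  ["darwin", "x86", 1293], ["linux", "arm-pi", 2686], ["darwin", "x64", 11196],
  ["linux", "x86", 11669], ["darwin", "both", 42649], ["win32", "x86", 72009],
  ["linux", "x64", 88066],
];

// Sum the counts for each platform, ignoring the arch column.
function totalsByPlatform(rows) {
  const totals = {};
  for (const [platform, , count] of rows) {
    totals[platform] = (totals[platform] || 0) + count;
  }
  return totals;
}

console.log(totalsByPlatform(archRows));
// totals: win32 72040, sunos 977, darwin 55138, linux 102421
```

Each sum matches the filtered platform table, so the two queries agree.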

How about popular versions (both binary and source downloads)?

$ cat logfiles | mstats -m "node $PARSER nodeVersion userAgent |
    grep -v 'Download Master' | cut -f1" -r "sort | uniq -c | sort -n" | \
    tail -10
  27159 v0.10.5
  27190 v0.10.10
  31059 v0.8.14
  31070 v0.8.18
  34327 v0.6.20
  51092 v0.8.25
  54024 v0.10.13
  57871 v0.10.11
  69107 v0.8.22
 289688 v0.10.12

That mostly makes sense: v0.10.12 saw a pretty good hit rate, and v0.10.13, the latest stable release, holds a nice position considering it's only 6 days old. But why do we still have 34k downloads of v0.6.20?

$ cat logfiles | mstats -m "node $PARSER nodeVersion nodePlatform userAgent |
    grep -v 'Download Master' | grep v0.6.20" -r "sort | uniq -c | sort -n" | \
    tail -5
    356 v0.6.20 source  curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5
    976 v0.6.20 win32 -
   9332 v0.6.20 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.2.9200.0)
  10236 v0.6.20 source  -
  11717 v0.6.20 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.1.7601 Service Pack 1)

Ah, some Azure configuration is still installing v0.6.20 by default. I spoke with Glenn Block about this at NodeConf, and that's supposed to change soon.

OK, so how many source downloads can actually be attributed to Homebrew?

$ cat logfiles | mstats -m "node $PARSER userAgent | grep 'Homebrew'" -r "wc -l"
34460

So we can adjust the platform totals again, moving those 34460 Homebrew requests from source to darwin:

    977 sunos
  72040 win32
  89598 darwin
 102421 linux
 683659 source
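The adjustment is plain arithmetic: the 34460 Homebrew requests are subtracted from source and added to darwin. The sketch below assumes, as the adjusted table does, that every Homebrew request was a source download made on OS X; `reattribute` is a hypothetical helper.

```javascript
// Filtered platform totals from earlier in the post.
const platforms = { sunos: 977, darwin: 55138, win32: 72040, linux: 102421, source: 718119 };
const homebrew = 34460; // result of the Homebrew "wc -l" query

// Hypothetical helper: move n requests from one bucket to another without
// mutating the input. Assumes every Homebrew request was a source download
// made on OS X (darwin), matching the adjusted table.
function reattribute(totals, from, to, n) {
  return { ...totals, [from]: totals[from] - n, [to]: totals[to] + n };
}

const adjusted = reattribute(platforms, "source", "darwin", homebrew);
console.log(adjusted.darwin, adjusted.source); // 89598 683659
```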

What are the popular versions among Homebrew users?

$ cat logfiles | mstats -m "node $PARSER nodeVersion userAgent |
    grep 'Homebrew' | sed -E 's#(Homebrew).*#\1#'" \
    -r "sort | uniq -c | sort -n" | tail -10
    176 v0.10.3   Homebrew
    227 v0.10.4   Homebrew
    303 v0.10.7   Homebrew
    310 v0.10.8   Homebrew
    315 v0.10.9   Homebrew
    639 v0.10.5   Homebrew
   1824 v0.10.10  Homebrew
   5592 v0.10.13  Homebrew
   5620 v0.10.11  Homebrew
  17295 v0.10.12  Homebrew
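The `sed -E 's#(Homebrew).*#\1#'` in the map phase cuts each line off right after the word Homebrew, so requests whose user agents differ only in what follows it collapse into one key before `uniq -c` counts them. The same transformation in JavaScript, with a made-up sample user agent (`truncateAfterHomebrew` is a hypothetical name):

```javascript
// JavaScript equivalent of the map-phase filter: sed -E 's#(Homebrew).*#\1#'
// Keeps everything up to and including the first "Homebrew" on the line and
// drops the rest, so "version<TAB>userAgent" becomes "version<TAB>Homebrew".
function truncateAfterHomebrew(line) {
  return line.replace(/(Homebrew).*/, "$1");
}

// The user agent below is made up for illustration.
console.log(truncateAfterHomebrew("v0.10.12\tHomebrew 0.9.4 (example details)"));
// prints "v0.10.12<TAB>Homebrew"
```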

That's encouraging: Homebrew users seem to be adopting our releases fairly quickly. Let's take a closer look at the adoption rate; note that v0.10.13 was released on July 9th.

$ cat logfiles | mstats -m "node $PARSER date nodeVersion |
    egrep 'v0.10.1[123]'" -r "sort | uniq -c"
...
    544 2013-07-08  v0.10.11
  16854 2013-07-08  v0.10.12
    596 2013-07-09  v0.10.11
  16727 2013-07-09  v0.10.12
    957 2013-07-09  v0.10.13
    437 2013-07-10  v0.10.11
   7677 2013-07-10  v0.10.12
  13106 2013-07-10  v0.10.13
    541 2013-07-11  v0.10.11
   5455 2013-07-11  v0.10.12
  14583 2013-07-11  v0.10.13
    415 2013-07-12  v0.10.11
   5045 2013-07-12  v0.10.12
  13068 2013-07-12  v0.10.13
    306 2013-07-13  v0.10.11
   2371 2013-07-13  v0.10.12
  11390 2013-07-13  v0.10.13
      9 2013-07-14  v0.10.11
    344 2013-07-14  v0.10.12
    879 2013-07-14  v0.10.13

Excellent: a lot of our downloaders are indeed adopting new releases fairly quickly.
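To put a number on "fairly quickly", the per-day counts can be turned into v0.10.13's share of the downloads of these three versions. This sketch uses only the rows shown in the output (the elided earlier days are left out), and `shareByDay` is a hypothetical helper.

```javascript
// Rows copied from the per-day output: [count, date, version].
const rows = [
  [544, "2013-07-08", "v0.10.11"], [16854, "2013-07-08", "v0.10.12"],
  [596, "2013-07-09", "v0.10.11"], [16727, "2013-07-09", "v0.10.12"], [957, "2013-07-09", "v0.10.13"],
  [437, "2013-07-10", "v0.10.11"], [7677, "2013-07-10", "v0.10.12"], [13106, "2013-07-10", "v0.10.13"],
  [541, "2013-07-11", "v0.10.11"], [5455, "2013-07-11", "v0.10.12"], [14583, "2013-07-11", "v0.10.13"],
  [415, "2013-07-12", "v0.10.11"], [5045, "2013-07-12", "v0.10.12"], [13068, "2013-07-12", "v0.10.13"],
  [306, "2013-07-13", "v0.10.11"], [2371, "2013-07-13", "v0.10.12"], [11390, "2013-07-13", "v0.10.13"],
  [9, "2013-07-14", "v0.10.11"], [344, "2013-07-14", "v0.10.12"], [879, "2013-07-14", "v0.10.13"],
];

// For each date, the fraction of these rows' downloads that were `version`.
function shareByDay(rows, version) {
  const days = new Map(); // date -> { total, hits }
  for (const [count, date, v] of rows) {
    const day = days.get(date) || { total: 0, hits: 0 };
    day.total += count;
    if (v === version) day.hits += count;
    days.set(date, day);
  }
  return [...days].map(([date, day]) => [date, day.hits / day.total]);
}

for (const [date, share] of shareByDay(rows, "v0.10.13")) {
  console.log(date, (share * 100).toFixed(1) + "%");
}
```

By this measure v0.10.13 goes from about 5% of these three versions' downloads on its release day to about 62% the next day, and stays above 70% after that.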

How about we chart that? aggr.js is just a small script that reformats the output into something easier for gnuplot to work with.

Warning: I don't have much gnuplot-fu.

$ export AGGAST=/NodeCore/public/blog/2013-07-15/aggr.js
$ export AGGR=/assets/$AGGAST
$ export GPASSET=/NodeCore/public/blog/2013-07-15/version.plt
$ export VERSIONS=/assets/$GPASSET
$ mget $GPASSET
set terminal svg;
set timefmt "%Y-%m-%d";
set xdata time;
plot "f" u 1:2 t "v0.10.11" w l, \
     "f" u 1:3 t "v0.10.12" w l, \
     "f" u 1:4 t "v0.10.13" w l;
$ cat logfiles | mstats -m "node $PARSER date nodeVersion |
    egrep 'v0.10.1[123]'" -s $AGGAST -s $GPASSET -r "sort | uniq -c |
    node $AGGR > f; gnuplot -p $VERSIONS |
    mpipe /NodeCore/public/blog/2013-07-15/versions.svg"

[Figure: daily downloads of v0.10.11, v0.10.12, and v0.10.13, rendered by gnuplot to versions.svg]
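aggr.js itself isn't shown, but version.plt implies its output: one row per date, with the v0.10.11, v0.10.12, and v0.10.13 counts in columns 2, 3, and 4. Here is a hypothetical sketch of that pivot. It takes `uniq -c` lines and writes 0 where a version has no downloads on a given day, so the columns stay aligned; the sample input assumes a tab between date and version, but splitting on any whitespace handles spaces too.

```javascript
// Hypothetical stand-in for aggr.js (the real script is not shown here).
// Pivots `uniq -c` output lines ("count date version") into one row per date,
// with one count column per version in the order version.plt plots them.
const VERSIONS = ["v0.10.11", "v0.10.12", "v0.10.13"];

function pivot(lines, versions) {
  const byDate = new Map(); // date -> array of counts, one per version
  for (const line of lines) {
    const parts = line.trim().split(/\s+/);
    if (parts.length < 3) continue; // skip blank or malformed lines
    const [count, date, version] = parts;
    const col = versions.indexOf(version);
    if (col === -1) continue; // not a version we plot
    if (!byDate.has(date)) byDate.set(date, versions.map(() => 0));
    byDate.get(date)[col] += Number(count);
  }
  // ISO dates sort correctly as strings; columns are space-separated for gnuplot.
  return [...byDate.keys()]
    .sort()
    .map((date) => [date, ...byDate.get(date)].join(" "));
}

const sample = [
  "    544 2013-07-08\tv0.10.11",
  "  16854 2013-07-08\tv0.10.12",
  "    596 2013-07-09\tv0.10.11",
  "  16727 2013-07-09\tv0.10.12",
  "    957 2013-07-09\tv0.10.13",
];
console.log(pivot(sample, VERSIONS).join("\n"));
// 2013-07-08 544 16854 0
// 2013-07-09 596 16727 957
```

Wrapping `pivot` in a small stdin/stdout loop would give something that could sit between `uniq -c` and `> f` in the reduce phase, though the real aggr.js may well differ.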

It's important to remember that this data only covers users who download from us directly. Plenty of users get node from their platform instead, via package managers like apt, yum, or pkgsrc. Users building node from the git repository aren't represented here either.

Of course, when it comes to which versions of node are still in use, this is only part of the story: these are the versions of node being actively [re]deployed. Keep in mind that some users are content with whatever ancient version they are currently running.

Moving forward, I plan on automatically generating some of these reports and making them available to the public. Hopefully that will help module authors make informed decisions about which versions they need to support.

Stay tuned!
