For a while now, people have been asking for statistics about downloads from nodejs.org. So we started putting the access logs into Manta, which lets us iterate on our questions and get the answers quickly.
$ export MANTA_ASSET=/NodeCore/public/blog/2013-07-15/apache.js
$ export PARSER=/assets/$MANTA_ASSET
$ alias mstats="mjob create -qo -s $MANTA_ASSET"
$ mfind --type o /NodeCore/stor/nginx/access_log | sort | tail -30 > logfiles
Just as a quick primer on the environment: mfind and mjob are tools that come with the Manta SDK and are used to interact with the service. apache.js is a small script that parses the access log and prints the fields named on the command line. I made an alias, mstats, with the common prologue needed for each invocation (creating a job and making apache.js available to it as an asset). And finally, I save the paths of the last 30 log files in logfiles, which I reuse for each subsequent query.
Ok, first things first: what's our most popular platform?
$ cat logfiles | mstats -m "node $PARSER nodePlatform" \
-r "sort | uniq -c | sort -n"
977 sunos
55138 darwin
102422 linux
225413 win32
718126 source
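In each of these queries the map phase emits one line per request and the reduce phase does the counting. As far as I understand Manta, the single reducer gets the concatenated output of all the map tasks on stdin, so sort | uniq -c | sort -n simply counts identical lines and orders them by count. Here is the same computation in plain JavaScript, purely as an illustration (uniqCountSorted is a made-up name):

```javascript
// Illustration of what `sort | uniq -c | sort -n` computes on map output.
function uniqCountSorted(lines) {
  const counts = new Map();
  for (const line of lines) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  // Ascending by count (like `sort -n`); ties broken by the line text.
  return [...counts.entries()].sort(
    (a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Made-up map output: one nodePlatform value per request.
const mapOutput = ["linux", "win32", "source", "source", "win32", "source"];
for (const [value, count] of uniqCountSorted(mapOutput)) {
  // uniq -c right-aligns the count in a 7-character column.
  console.log(String(count).padStart(7), value);
}
```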
That's impressive, but is win32 really our most popular binary download? Let's check which userAgents are downloading the win32 binaries.
$ cat logfiles | mstats -m "node $PARSER nodePlatform userAgent | grep win32" \
-r "sort | uniq -c | sort -n" | tail -5
3490 win32 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
4338 win32 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
10302 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.2.9200.0)
13128 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.1.7601 Service Pack 1)
153373 win32 Download Master
So that's roughly 7.8k from browsers, 23k from Azure, and 153k from "Download Master"?! Download Master turns out to be a client that opens multiple connections and makes range requests for each download, so a single download gets logged many times over. Let's filter it out.
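A quick way to double-check those figures is to bucket the five rows above by user agent. The bucketing rules (Mozilla/ means a browser, Platform-Installer/ is counted as Azure, as in the text) are assumptions for illustration, and only the top five rows are included, so the browser total ignores the long tail.

```javascript
// Rough bucketing of the top win32 user agents; rules are illustrative only.
function classify(userAgent) {
  if (userAgent.startsWith("Platform-Installer/")) return "installer";
  if (userAgent === "Download Master") return "downloadMaster";
  if (userAgent.startsWith("Mozilla/")) return "browser";
  return "other";
}

// The five rows from the query output: [count, userAgent].
const rows = [
  [3490, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"],
  [4338, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"],
  [10302, "Platform-Installer/4.0.0.0(Microsoft Windows NT 6.2.9200.0)"],
  [13128, "Platform-Installer/4.0.0.0(Microsoft Windows NT 6.1.7601 Service Pack 1)"],
  [153373, "Download Master"],
];

// Sum the counts per bucket.
function totalsByCategory(rows) {
  const totals = {};
  for (const [count, ua] of rows) {
    const key = classify(ua);
    totals[key] = (totals[key] || 0) + count;
  }
  return totals;
}

const totals = totalsByCategory(rows);
console.log(totals); // { browser: 7828, installer: 23430, downloadMaster: 153373 }
// Removing Download Master from the 225413 win32 total:
console.log(225413 - totals.downloadMaster); // 72040
```

The 72040 left over matches the win32 count once Download Master is filtered out.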
$ cat logfiles | mstats -m "node $PARSER nodePlatform userAgent |
grep -v 'Download Master' | cut -f1" -r "sort | uniq -c | sort -n"
977 sunos
55138 darwin
72040 win32
102421 linux
718119 source
Alright, that makes a bit more sense. Now what are our popular archs?
$ cat logfiles | mstats -m "node $PARSER nodePlatform nodeArch userAgent |
grep -v 'Download Master' | cut -f1,2 | grep -v source" \
-r "sort | uniq -c | sort -n"
31 win32 x64
433 sunos x86
544 sunos x64
1293 darwin x86
2686 linux arm-pi
11196 darwin x64
11669 linux x86
42649 darwin both
72009 win32 x86
88066 linux x64
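The filtered platform totals and this arch breakdown come from separate queries, so it's worth checking that they agree. This small check uses only the numbers printed above and sums the arch rows per platform; each platform comes out equal to its filtered total (darwin, for example, is 1293 + 11196 + 42649 = 55138). The helper name sumByPlatform is made up.

```javascript
// Cross-check: per-arch rows should add up to the filtered per-platform totals.
const archRows = [
  ["win32", "x64", 31], ["sunos", "x86", 433], ["sunos", "x64", 544],
  ["darwin", "x86", 1293], ["linux", "arm-pi", 2686], ["darwin", "x64", 11196],
  ["linux", "x86", 11669], ["darwin", "both", 42649], ["win32", "x86", 72009],
  ["linux", "x64", 88066],
];
const platformTotals = { sunos: 977, darwin: 55138, win32: 72040, linux: 102421 };

// Sum counts per platform, ignoring the arch column.
function sumByPlatform(rows) {
  const sums = {};
  for (const [platform, , count] of rows) {
    sums[platform] = (sums[platform] || 0) + count;
  }
  return sums;
}

const sums = sumByPlatform(archRows);
for (const [platform, expected] of Object.entries(platformTotals)) {
  const status = sums[platform] === expected ? "ok" : `expected ${expected}`;
  console.log(platform, sums[platform], status);
}
```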
How about popular versions? (Both binary and source downloads)
$ cat logfiles | mstats -m "node $PARSER nodeVersion userAgent |
grep -v 'Download Master' | cut -f1" -r "sort | uniq -c | sort -n" | \
tail -10
27159 v0.10.5
27190 v0.10.10
31059 v0.8.14
31070 v0.8.18
34327 v0.6.20
51092 v0.8.25
54024 v0.10.13
57871 v0.10.11
69107 v0.8.22
289688 v0.10.12
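Since the top ten mixes three release lines, it helps to roll the counts up by minor version. Summing only the ten rows shown gives 455932 for v0.10, 182328 for v0.8 and 34327 for v0.6, so these are lower bounds. The snippet just automates that sum; byReleaseLine is a hypothetical helper.

```javascript
// Roll the top-10 version counts up by release line (v0.6, v0.8, v0.10).
const topVersions = {
  "v0.10.5": 27159, "v0.10.10": 27190, "v0.8.14": 31059, "v0.8.18": 31070,
  "v0.6.20": 34327, "v0.8.25": 51092, "v0.10.13": 54024, "v0.10.11": 57871,
  "v0.8.22": 69107, "v0.10.12": 289688,
};

function byReleaseLine(counts) {
  const lines = {};
  for (const [version, count] of Object.entries(counts)) {
    // "v0.10.12" -> "v0.10"
    const line = version.split(".").slice(0, 2).join(".");
    lines[line] = (lines[line] || 0) + count;
  }
  return lines;
}

console.log(byReleaseLine(topVersions));
```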
That mostly makes sense: v0.10.12 saw a pretty good hit rate, and v0.10.13, the latest stable, has a nice position considering it's only 6 days old. But why do we still have 34k downloads of v0.6.20?
$ cat logfiles | mstats -m "node $PARSER nodeVersion nodePlatform userAgent |
grep -v 'Download Master' | grep v0.6.20" -r "sort | uniq -c | sort -n" | \
tail -5
356 v0.6.20 source curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5
976 v0.6.20 win32 -
9332 v0.6.20 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.2.9200.0)
10236 v0.6.20 source -
11717 v0.6.20 win32 Platform-Installer/4.0.0.0(Microsoft Windows NT 6.1.7601 Service Pack 1)
Ahh, some Azure configuration is still installing v0.6.20 by default. I spoke with Glenn Block about this at NodeConf, and that's supposed to change soon.
Ok, so how many source installs can actually be attributed to Homebrew?
$ cat logfiles | mstats -m "node $PARSER userAgent | grep 'Homebrew'" -r "wc -l"
34460
Ok, since Homebrew is an OS X package manager, we can adjust the popular platforms again by counting those downloads as darwin rather than source.
977 sunos
72040 win32
89598 darwin
102421 linux
683659 source
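The adjusted table is plain bookkeeping: the 34460 Homebrew downloads move from source to darwin, so 55138 + 34460 = 89598 and 718119 - 34460 = 683659. As a tiny sketch (reattribute is a made-up helper):

```javascript
// Move n downloads from one platform bucket to another, returning a new object.
function reattribute(counts, from, to, n) {
  if (n > counts[from]) throw new RangeError(`cannot move ${n} from ${from}`);
  return { ...counts, [from]: counts[from] - n, [to]: counts[to] + n };
}

// Platform totals after filtering out Download Master.
const filtered = { sunos: 977, darwin: 55138, win32: 72040, linux: 102421, source: 718119 };
const adjusted = reattribute(filtered, "source", "darwin", 34460);
console.log(adjusted.darwin, adjusted.source); // 89598 683659
```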
What are the popular versions with Homebrew?
$ cat logfiles | mstats -m "node $PARSER nodeVersion userAgent |
grep 'Homebrew' | sed -E 's#(Homebrew).*#\1#'" \
-r "sort | uniq -c | sort -n" | tail -10
176 v0.10.3 Homebrew
227 v0.10.4 Homebrew
303 v0.10.7 Homebrew
310 v0.10.8 Homebrew
315 v0.10.9 Homebrew
639 v0.10.5 Homebrew
1824 v0.10.10 Homebrew
5592 v0.10.13 Homebrew
5620 v0.10.11 Homebrew
17295 v0.10.12 Homebrew
That's encouraging: Homebrew users seem to be adopting our releases fairly quickly. Let's take a closer look at the adoption rate; note that v0.10.13 was released on the 9th of July.
$ cat logfiles | mstats -m "node $PARSER date nodeVersion |
egrep 'v0.10.1[123]'" -r "sort | uniq -c"
...
544 2013-07-08 v0.10.11
16854 2013-07-08 v0.10.12
596 2013-07-09 v0.10.11
16727 2013-07-09 v0.10.12
957 2013-07-09 v0.10.13
437 2013-07-10 v0.10.11
7677 2013-07-10 v0.10.12
13106 2013-07-10 v0.10.13
541 2013-07-11 v0.10.11
5455 2013-07-11 v0.10.12
14583 2013-07-11 v0.10.13
415 2013-07-12 v0.10.11
5045 2013-07-12 v0.10.12
13068 2013-07-12 v0.10.13
306 2013-07-13 v0.10.11
2371 2013-07-13 v0.10.12
11390 2013-07-13 v0.10.13
9 2013-07-14 v0.10.11
344 2013-07-14 v0.10.12
879 2013-07-14 v0.10.13
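To put a number on the adoption rate, here is the daily share of v0.10.13 among the three versions, computed only from the rows shown (the elided earlier days are left out). On 2013-07-10, for example, that's 13106 / 21220, roughly 62%. The names daily and shareOf are just illustrative.

```javascript
// Daily counts from the table above (v0.10.13 absent before its release).
const daily = {
  "2013-07-08": { "v0.10.11": 544, "v0.10.12": 16854 },
  "2013-07-09": { "v0.10.11": 596, "v0.10.12": 16727, "v0.10.13": 957 },
  "2013-07-10": { "v0.10.11": 437, "v0.10.12": 7677, "v0.10.13": 13106 },
  "2013-07-11": { "v0.10.11": 541, "v0.10.12": 5455, "v0.10.13": 14583 },
  "2013-07-12": { "v0.10.11": 415, "v0.10.12": 5045, "v0.10.13": 13068 },
  "2013-07-13": { "v0.10.11": 306, "v0.10.12": 2371, "v0.10.13": 11390 },
  "2013-07-14": { "v0.10.11": 9, "v0.10.12": 344, "v0.10.13": 879 },
};

// Fraction of a day's downloads (of these versions) that were `version`.
function shareOf(version, counts) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  return total === 0 ? 0 : (counts[version] || 0) / total;
}

for (const [date, counts] of Object.entries(daily)) {
  console.log(date, (100 * shareOf("v0.10.13", counts)).toFixed(1) + "%");
}
```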
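Excellent: on July 10, the day after its release, v0.10.13 was already downloaded more than v0.10.12 (13106 vs 7677), so a lot of our downloaders are indeed adopting our releases fairly quickly.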
How about we chart that? aggr.js is just a small script that reformats the output into something gnuplot can work with more easily.
Warning: I don't have much gnuplot-fu.
$ export AGGAST=/NodeCore/public/blog/2013-07-15/aggr.js
$ export AGGR=/assets/$AGGAST
$ export GPASSET=/NodeCore/public/blog/2013-07-15/version.plt
$ export VERSIONS=/assets/$GPASSET
$ mget $GPASSET
set terminal svg;
set timefmt "%Y-%m-%d";
set xdata time;
plot "f" u 1:2 t "v0.10.11" w l, \
"f" u 1:3 t "v0.10.12" w l, \
"f" u 1:4 t "v0.10.13" w l;
$ cat logfiles | mstats -m "node $PARSER date nodeVersion |
egrep 'v0.10.1[123]'" -s $AGGAST -s $GPASSET -r "sort | uniq -c |
node $AGGR > f; gnuplot -p $VERSIONS |
mpipe /NodeCore/public/blog/2013-07-15/versions.svg"
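aggr.js isn't shown, so here is a guess at what it has to do, based only on how version.plt reads its input: one row per date, with the date in column 1 and the counts for v0.10.11, v0.10.12 and v0.10.13 in columns 2 to 4. The function name toColumns and the zero-filling of versions missing on a given day are assumptions.

```javascript
// Hypothetical sketch of an aggr.js-style reformatter (not the real script).
// Input:  `uniq -c` lines such as "    544 2013-07-08 v0.10.11"
// Output: "date count(v1) count(v2) ..." rows, versions in ascending order.

// Compare "v0.10.9" and "v0.10.10" numerically rather than as strings.
function compareVersions(a, b) {
  const pa = a.slice(1).split(".").map(Number);
  const pb = b.slice(1).split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] || 0) - (pb[i] || 0);
    if (d !== 0) return d;
  }
  return 0;
}

function toColumns(lines) {
  const table = new Map(); // date -> Map(version -> count)
  const versions = new Set();
  for (const line of lines) {
    const parts = line.trim().split(/\s+/);
    if (parts.length !== 3) continue; // skip blank or malformed lines
    const [count, date, version] = parts;
    if (!table.has(date)) table.set(date, new Map());
    const row = table.get(date);
    row.set(version, (row.get(version) || 0) + Number(count));
    versions.add(version);
  }
  const cols = [...versions].sort(compareVersions);
  // Missing versions on a date are written as 0 so every row has all columns.
  return [...table.keys()].sort().map((date) =>
    [date, ...cols.map((v) => table.get(date).get(v) || 0)].join(" "));
}

// In the job this would read the reducer's stdin instead, e.g.
//   require("fs").readFileSync(0, "utf8").split("\n")
console.log(toColumns([
  "    544 2013-07-08 v0.10.11",
  "  16854 2013-07-08 v0.10.12",
  "    957 2013-07-09 v0.10.13",
]).join("\n"));
```

Sorting versions numerically only starts to matter once a query spans something like v0.10.9 and v0.10.10, where a plain string sort puts them in the wrong order.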
It's important to remember that this is only data for users who are downloading from us directly. There are plenty of users who are getting their version from their platform instead, via package managers like apt, yum, or pkgsrc. There are also users building node from the git repository, and they're not represented here either.
Of course, when it comes to which versions of node are still in use, this is only part of the story: these are the versions of node being actively [re]deployed. Keep in mind that there are also users content with whatever ancient version they are currently running.
Moving forward I plan on automatically generating some of these reports and making them available to the public. Hopefully it will enable module authors to make informed decisions on what they need to support.
Stay tuned!