@RubenKelevra
Last active August 27, 2015 07:14
Blog: RethinkDB for system-statistics of freifunk-nodes
This story is about how we fetch, process, and display nice statistics for
freifunk nodes running a free, open WiFi network. In addition, we want to
push notifications to the owner of each freifunk node when something goes
wrong or a node goes offline.
Status Quo:
Currently we use some hacky bash scripts[1] which generate a text file, curl it
over HTTPS to a server running PHP, and drop the latest values into a
MySQL database.
This means we do many updates on the database, but keep no historical
information. To work around this, we also write some RRDs and parse them with
Perl, generating images for the statistics page of each freifunk node[2].
Now, today we start something new:
We have set up a RethinkDB 2.1 cluster on 5 v-servers in 3 data centers, to
maximize reliability and availability. We plan to add two more nodes in two
additional data centers.
With its new Raft implementation, RethinkDB 2.1 "Forbidden Planet" has, we
think, become mature enough for our environment.
Today we are going to implement the backend for the (nearly finished) new
statistics scripts on the node, a bash implementation rewritten from
scratch[3]. The scripts are fully functional, written in a pseudo-object-oriented
style, and usable as a library for different use cases.
The backend is going to be written in Python. It accepts the binary bsdiff
transmissions, secured over a fastd 1:1 tunnel with the latest
salsa2012-poly1305-umac public/private-key crypto. The diff is decoded and
applied, and the result is added to the RethinkDB database.
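A minimal stdlib sketch of that ingest path. The names `apply_bsdiff`, `ingest`, and the table name `node_stats` are assumptions, and the bsdiff step is stubbed with a zlib stand-in; the real backend would use an actual bsdiff patcher and the RethinkDB Python driver instead of the in-memory store shown here:

```python
import json
import zlib

# In-memory stand-ins for the real components: the previous full JSON per
# node (the "old" file a bsdiff patch is applied against) and the table.
previous_json = {}   # node_id -> last full JSON bytes
node_stats = []      # stand-in for the RethinkDB table "node_stats"

def apply_bsdiff(old: bytes, patch: bytes) -> bytes:
    """Stand-in for a real bsdiff patcher. For this sketch the 'patch'
    is simply the zlib-compressed new document."""
    return zlib.decompress(patch)

def ingest(node_id: str, patch: bytes) -> dict:
    old = previous_json.get(node_id, b"")
    new = apply_bsdiff(old, patch)      # reconstruct the full JSON
    previous_json[node_id] = new        # keep it for the next diff
    doc = json.loads(new)
    doc["node_id"] = node_id
    node_stats.append(doc)              # real code: r.table("node_stats").insert(doc)
    return doc

# Example transmission: a node pushes its current statistics.
payload = zlib.compress(json.dumps({"uptime": 1234, "clients": 7}).encode())
doc = ingest("24a43c43a8eb", payload)
```

Keeping the previous full JSON per node is what makes the diff transmission possible at all: each patch is computed against the last document the backend already has.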
We hope to implement the remaining calculations nearly or entirely as
map/reduce jobs in the RethinkDB cluster, to spread the workload across
different servers and locations.
We currently see an average of 1.5 transmissions per second, with a diff size
of 300 KB and an uncompressed JSON file size of 4 KB. That means we get nearly
4 million transmissions per month, with an uncompressed total of nearly
15 GB. Since we are getting more and more freifunk nodes, we have to think
about an implementation that scales well, and we think we have found it in a
RethinkDB cluster.
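Those back-of-the-envelope numbers check out (assuming a 30-day month):

```python
# 1.5 transmissions per second over a 30-day month
transmissions_per_month = 1.5 * 30 * 24 * 3600

json_kb = 4  # uncompressed JSON size per transmission, in KB
total_gb = transmissions_per_month * json_kb / 1024 / 1024

print(round(transmissions_per_month))  # 3888000, i.e. nearly 4 million
print(round(total_gb, 1))              # 14.8, i.e. nearly 15 GB
```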
Since 15 GB per month is a huge amount of data for a database, and we do not
need a 5-minute resolution for data that is months old, we have to reduce the
data size every hour, day, week, month, and so on. We hope to implement these
rollups, too, entirely as map/reduce in the cluster, since that scales best
across more than one server.
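A plain-Python sketch of the hourly rollup we want to express as a group/reduce in the cluster. The field names (`node_id`, `ts`, `clients`) and the choice of averaging are assumptions for illustration:

```python
from collections import defaultdict

def rollup_hourly(samples):
    """Reduce 5-minute samples to one averaged document per node and hour."""
    groups = defaultdict(list)
    for s in samples:
        hour = s["ts"] - s["ts"] % 3600  # truncate the timestamp to the hour
        groups[(s["node_id"], hour)].append(s["clients"])
    return [
        {"node_id": node, "ts": hour, "clients": sum(vals) / len(vals)}
        for (node, hour), vals in sorted(groups.items())
    ]

samples = [
    {"node_id": "a", "ts": 3600, "clients": 4},
    {"node_id": "a", "ts": 3900, "clients": 6},
    {"node_id": "b", "ts": 3600, "clients": 1},
]
hourly = rollup_hourly(samples)  # one averaged row per node and hour
```

Because the grouping key is per node and per hour, the reduction is embarrassingly parallel, which is exactly why a map/reduce inside the cluster should spread the load well across servers.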
I'm going to report updates here, so feel free to follow.
[1] https://github.com/VfN-NRW/legacy-stats/blob/master/gluon-legacy-https-stats/files/usr/sbin/ff-stats
[2] e.g. http://oldmap.vfn-nrw.de/nodes/24a43c43a8eb
[3] https://github.com/VfN-NRW/fnetstat/blob/complete_refactor/files/usr/sbin/fnetstat_stat