gousiosg /
Last active Feb 4, 2020
Rebuild RAID array when one disk has been marked as faulty (but it is not really)
gousiosg / ml4se.bib
highlight -O rtf -s seashell -k Monaco -K 20 foo.rb |pbcopy
#!/usr/bin/env python
# (c) 2018 Georgios Gousios <>
# Barebones linear equation solving trainer
from __future__ import division
from random import randint
import codecs
import sys
gousiosg /
Last active Apr 30, 2018
Restoring the GHTorrent MongoDB database

This is a collection of scripts to restore a full GHTorrent MongoDB database from the dumps available at

To do the restore:

  1. Open a MongoDB terminal and run the createCollections.js script to create the necessary collections. You can set block_compressor to either snappy or zlib to compress your databases. I am using none here, as I am using compression at the filesystem level.

  2. Run to restore the cumulative dumps. Wait 3-4 days.
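
For step 1, the compression choice comes down to the options passed when each collection is created. The following is a minimal sketch in Python of the document for MongoDB's `create` command, assuming the WiredTiger storage engine. The helper name `create_command` and the collection name `commits` are illustrative and are not taken from createCollections.js.

```python
# Sketch only: builds the document for MongoDB's `create` command.
# Assumes the WiredTiger storage engine; names below are illustrative.
VALID_COMPRESSORS = {"none", "snappy", "zlib"}


def create_command(collection, block_compressor="none"):
    """Return a `create` command document with the given block compressor."""
    if block_compressor not in VALID_COMPRESSORS:
        raise ValueError("unsupported block_compressor: %r" % block_compressor)
    return {
        # The command name must be the first key of the document.
        "create": collection,
        "storageEngine": {
            "wiredTiger": {
                "configString": "block_compressor=" + block_compressor
            }
        },
    }


if __name__ == "__main__":
    print(create_command("commits", "zlib"))
```

With a driver such as pymongo, a document like this could be sent through `db.command(...)`. Using `none` matches the setup described above, where compression is left to the filesystem.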

digraph g {
graph [fontname = "helvetica"];
node [shape=record, fontname = "helvetica"];
edge [fontname = "helvetica"];
1 -> 95;
1 -> 10;
2 -> 78;
gousiosg /
Last active Nov 20, 2017
How compatible is your Unix with the original one?
#!/usr/bin/env bash
echo 0 0 > "$TEMPFILE"
curl ""|
grep "(I)"|
gousiosg /
Last active Sep 23, 2016
Experiments with various languages on low level file parsing

So today I was experimenting with various languages in order to make the GHTorrent MySQL "CSV" dumps behave like RFC-compliant CSV files. This involved parsing multi-GB, UTF-8 encoded files and running a small state machine at the character level. I started with Ruby, but it was slow:

$ time ruby csvify.rb projects.csv >/dev/null

real	0m36.714s
user	0m35.689s
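
To make the character-level state machine mentioned above concrete, here is a rough Python sketch. It is not the code of csvify.rb. It assumes the dumps escape double quotes inside quoted fields with a backslash (MySQL style) and rewrites them to the doubled quotes that RFC 4180 expects; the function name `csvify` and the exact escape rules are assumptions.

```python
def csvify(text):
    """Rewrite backslash-escaped quotes inside quoted fields as doubled quotes.

    Assumed input format: fields may be wrapped in double quotes, and a
    backslash inside a quoted field escapes the next character.
    """
    out = []
    in_quotes = False  # currently inside a double-quoted field
    escaped = False    # previous character was a backslash inside quotes
    for ch in text:
        if escaped:
            if ch == '"':
                out.append('""')       # RFC 4180 escapes a quote by doubling it
            elif ch == "\\":
                out.append("\\")       # escaped backslash becomes a plain one
            else:
                out.append("\\" + ch)  # leave other escapes untouched
            escaped = False
        elif in_quotes:
            if ch == "\\":
                escaped = True         # decide what to emit on the next char
            else:
                if ch == '"':
                    in_quotes = False  # closing quote of the field
                out.append(ch)
        else:
            if ch == '"':
                in_quotes = True       # opening quote of a field
            out.append(ch)
    if escaped:
        out.append("\\")               # dangling backslash at end of input
    return "".join(out)


if __name__ == "__main__":
    print(csvify('1,"a \\"b\\" c",2'))
```

For multi-GB files the same loop would have to run over chunks read from disk, carrying `in_quotes` and `escaped` across chunk boundaries rather than loading the whole file as one string.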
# start the replset nodes
$ mongod --dbpath mongodb/ --replSet ghtorrent
$ mongod --dbpath mongodb-repl1/ --port 27018 --replSet ghtorrent
$ mongod --dbpath mongodb-repl2/ --port 27019 --replSet ghtorrent
# connect to primary
$ mongo
# In mongo shell
ghtorrent:PRIMARY> rs.initiate()
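
Calling `rs.initiate()` with no arguments starts a set with only the current node as a member, so the other two nodes still have to be added. An alternative is to pass a full configuration document when initiating. The sketch below builds such a document in Python, assuming all three `mongod` processes run on `localhost` and the first one listens on the default port 27017; the helper name `replset_config` is hypothetical.

```python
def replset_config(name, ports, host="localhost"):
    """Build a replica set configuration document for the given ports."""
    return {
        "_id": name,  # must match the --replSet value passed to mongod
        "members": [
            {"_id": i, "host": "%s:%d" % (host, port)}
            for i, port in enumerate(ports)
        ],
    }


# Assumed layout: primary on the default port, two replicas as started above.
config = replset_config("ghtorrent", [27017, 27018, 27019])

if __name__ == "__main__":
    print(config)
```

A document of this shape is what the `replSetInitiate` admin command takes, for example through a driver's `admin.command("replSetInitiate", config)` call.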
