Georgios Gousios gousiosg

@gousiosg
gousiosg / json2bson.rb
Created October 9, 2015 08:08
Convert JSON to BSON for importing into MongoDB
#!/usr/bin/env ruby
require 'json'
require 'bson' # Requires bson version > 3
json = JSON.parse(File.read(ARGV[0])) # File.read closes the file handle after reading
w = File.open("#{ARGV[0]}.bson", 'wb') # binary mode, since BSON is a binary format
json.each do |j|
gousiosg / setup_bcache.sh
Created October 14, 2015 19:54
Setting up SSD as a cache for slow cloud volumes
#!/usr/bin/env bash
lsblk
apt-get install -y lvm2 mdadm bcache-tools
# Create a linux raid autodetect primary partition
# use the following keystrokes: np1tfdw
fdisk /dev/sdc
fdisk /dev/sdd
# setup raid
gousiosg / analysis.R
Last active December 2, 2015 11:41
Complete machine learning example
library(reshape)
library(ggplot2)
library(corrplot)
library(caret)
source('ml.R')
# Util stuff
load.filter <- function(path) {
setAs("character", "POSIXct",
gousiosg / RxPortScan.scala
Last active December 15, 2015 12:59
A reactive port scanner written using Rx.Java (0.13 onwards) and Scala 2.10 Futures
import java.net.Socket
import rx.subscriptions.Subscriptions
import rx.lang.scala.Observable
import scala.concurrent.{Future, future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
// A reactive parallel port scanner written with Rx.Java
// @author: @gousiosg, @headinthebox
gousiosg / VanityPullRequests.cs
Last active December 15, 2015 16:39
Find GitHub pull requests that are then tweeted. Implemented with Rx.Net in C#.
/*
* GitHub commits that are then tweeted.
* by @gousiosg, @headinthebox
*/
void Main()
{
var q = from ghinfo in PullRequestersInfo()
from u in ghinfo
from t in Tweets(u).TakeUntil(Observable.Timer(TimeSpan.FromMinutes(5)))
select new {User = u, Tweet = t};
select a.month,
a.total_commits - b.commits_from_pull_reqs as direct,
b.commits_from_pull_reqs as pullreq
from
(select strftime('%Y-%m-01', substr(c.created_at, 0, 20)) as month,
p.id as prid, count(c.id) as total_commits
from commits c, projects p, project_commits pc
where p.id = pc.project_id
and c.id = pc.commit_id
group by month, p.id) as a,
gousiosg / teapot.php
Created October 7, 2013 17:16
The teapot in PHP!
<?php
class Triangle
{
var $colors = array("yellowgreen", "tomato", "plum");
var $vertices;
function Triangle($vertices)
{
assert(sizeof($vertices) == 3);
# start the replset nodes
$ mongod --dbpath mongodb/ --replSet ghtorrent
$ mongod --dbpath mongodb-repl1/ --port 27018 --replSet ghtorrent
$ mongod --dbpath mongodb-repl2/ --port 27019 --replSet ghtorrent
# connect to primary
$ mongo
# In mongo shell
ghtorrent:PRIMARY> rs.initiate()
gousiosg / README.md
Last active September 23, 2016 14:55
Experiments with various languages on low level file parsing

So today I was experimenting with various languages in order to make the GHTorrent MySQL "CSV" dumps behave like RFC-compliant CSV files. This involved parsing multi-GB, UTF-8 encoded files and running a small state machine at the character level. I started with Ruby, but it was slow:

$ time ruby csvify.rb projects.csv >/dev/null

real	0m36.714s
user	0m35.689s
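
For illustration, a character-level state machine of the kind described above might look like the following Ruby sketch. It is not the original `csvify.rb`, which is not shown here. The name `csvify`, its two parameters, and the escaping rules are assumptions: the sketch supposes that fields are double-quoted, and that the dump escapes an embedded quote as `\"` and a backslash as `\\`, where RFC 4180 expects `""` and a bare backslash. The real dumps may need different rules.

```ruby
# Hypothetical sketch, not the original csvify.rb.
# Assumed input: double-quoted fields in which MySQL has escaped an embedded
# quote as \" and a backslash as \\. RFC 4180 expects "" and a bare backslash.
def csvify(input, output)
  state = :unquoted # one of :unquoted, :quoted, :escaped
  input.each_char do |ch|
    case state
    when :unquoted
      state = :quoted if ch == '"'
      output << ch
    when :quoted
      if ch == '\\'
        state = :escaped # hold the backslash until the next character is seen
      else
        state = :unquoted if ch == '"'
        output << ch
      end
    when :escaped
      output << case ch
                when '"'  then '""'  # \" becomes ""
                when '\\' then '\\'  # \\ becomes a single backslash
                else "\\#{ch}"       # other sequences pass through untouched
                end
      state = :quoted
    end
  end
  output << '\\' if state == :escaped # flush a trailing, unpaired backslash
  output
end
```

Because both `String` and `IO` respond to `each_char`, and both `String` and `$stdout` respond to `<<`, the same function can stream a large file without loading it into memory, for example `csvify(File.open('projects.csv', 'r:UTF-8'), $stdout)`. The state is kept across line boundaries, so quoted fields that span several lines are handled as one field.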
gousiosg / unix-compatible.sh
Last active November 20, 2017 10:17
How compatible is your Unix with the original one?
#!/usr/bin/env bash
TEMPFILE=/tmp/unixcount
exist=0
notexist=0
echo 0 0 > $TEMPFILE
curl "https://raw.githubusercontent.com/dspinellis/unix-v4man/master/man0/ptxx"|
grep "(I)"|