@mazieres
mazieres / unpack_stack.sh
Created December 9, 2013 22:57
One-liner to unpack the Stack Exchange data dump
for each in $(ls . | grep -v "stackoverflow.com.7z.0") ; do 7z e "$each" -o"$(echo "$each" | cut -d "." -f -2)" ; done && 7z e stackoverflow.com.7z.001 -ostackoverflow
@mazieres
mazieres / clean_github_archive.sh
Created December 18, 2013 09:59
Clean GitHub Archive files that don't have one entry per line.
#!/bin/bash
for f in `ls . | grep json.gz`
do
    if [ `zcat $f | wc -l` == 0 ] ; then
        gunzip $f
        for unf in `ls *.json`
        do
            echo $unf
            sed -i 's/}{/}\n{/g' $unf
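The `sed` substitution above splits on every literal `}{`, which would also match that sequence inside a JSON string value. As a hedged alternative, here is a minimal Python sketch that walks the concatenated objects with the standard library's `json.JSONDecoder.raw_decode`. The function names `split_concatenated_json` and `to_json_lines` are hypothetical and not part of the original gist.

```python
import json


def split_concatenated_json(text):
    """Yield each JSON value found back to back in `text` (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def to_json_lines(text):
    """Re-serialize concatenated JSON as one entry per line (hypothetical helper)."""
    return "\n".join(json.dumps(obj) for obj in split_concatenated_json(text))


if __name__ == "__main__":
    sample = '{"a": 1}{"b": "}{"}'
    print(to_json_lines(sample))
```

Unlike the one-line `sed` approach, this parses each object fully, so it is slower on large archives but does not break entries whose strings happen to contain `}{`.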
@mazieres
mazieres / funnyPlaces.md
Created February 4, 2014 14:06
Funny 2013 Github Profiles Locations

On a GitHub profile, users can state their location, such as "San Francisco, CA" or "Paris, France". Some profiles use this field to freely describe where they feel they are. Here is a list of funny locations found on GitHub user profiles, ranked by the number of users claiming each one.

Data was extracted from GitHub Archive, for 2013 only.

There are numerous false positives.

by @mazieres.

@mazieres
mazieres / myPCA.py
Last active November 15, 2021 03:04
# coding: utf-8
"""Generic usage of Principal Component Analysis"""
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import numpy as np
@mazieres
mazieres / osetg
Created October 3, 2014 08:55
Ordered set Python generator
def osetg(seq, idfun=None):
    # Ordered set generator
    # <http://www.peterbe.com/plog/uniqifiers-benchmark>
    if idfun is None:
        def idfun(x): return x
    seen = {}
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
@mazieres
mazieres / osetg.py
Created October 3, 2014 08:56
Ordered set Python generator
def osetg(seq, idfun=None):
    # Ordered set generator
    # <http://www.peterbe.com/plog/uniqifiers-benchmark>
    if idfun is None:
        def idfun(x): return x
    seen = {}
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
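The preview above is cut off before the generator yields anything. As a self-contained illustration of the same idea (emit each item the first time its marker is seen, preserving input order), here is a minimal sketch. The name `ordered_unique` and the `key` parameter are hypothetical, and a `set` is swapped in for the `seen` dict; this is not the gist's full code.

```python
def ordered_unique(seq, key=None):
    """Yield items of `seq` in their original order, skipping repeated keys.

    Hypothetical sketch: `key` plays the role of `idfun` in the gist preview.
    """
    if key is None:
        def key(x):
            return x
    seen = set()  # markers already emitted; markers must be hashable
    for item in seq:
        marker = key(item)
        if marker in seen:
            continue
        seen.add(marker)
        yield item


if __name__ == "__main__":
    print(list(ordered_unique([3, 1, 3, 2, 1])))
    print(list(ordered_unique(["a", "A", "b"], key=str.lower)))
```

Because it is a generator, it can deduplicate a stream lazily; memory still grows with the number of distinct markers.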
#!/usr/bin/env python
# by @mazieres for cortext.fr
import sqlite3
import sys
import os
from collections import defaultdict
# PATH to the DB downloaded from cortext
import pandas as pd
import numpy as np
def wannabe_projection(df):
    '''
    https://stats.stackexchange.com/questions/142132/is-this-a-valid-method-for-unipartite-projection-of-a-bipartite-graph
    '''
    n_samples = df.shape[0]
    res = np.zeros((n_samples, n_samples))
import unittest
class TestExtract(unittest.TestCase):
    def test_adjacency_matrix(self):
        X = np.array([
            [1, 8, 3],
            [5, 0, 0],
            [0, 4, 2]])
        tested = adjacency_matrix(X)