@shilad
shilad / create_grading_prs.py
Created September 14, 2019 15:45
Python script to create grading pull requests. Requires Python3 and PyGithub.
#!/usr/bin/env -S python3 -O
#
# This script prepares to grade students' homework submissions for a particular assignment.
# To do so, it creates a branch on each student's fork equivalent to the original assignment.
# It then creates a pull request against that branch.
#
# Author: Shilad Sen
#
import logging
@shilad
shilad / StringPermuter.java
Created February 10, 2018 20:08
An Iterator that generates all permutations of a string without keeping them in memory.
import java.util.*;
/**
* An Iterator that generates all permutations of a particular string, but
* does not need to store them in memory.
*
* Example usage:
*
* Iterator<String> iter = new StringPermuter("Shilad");
* while (iter.hasNext()) {
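
The preview above cuts off mid-loop. As a rough illustration of the idea in the description (producing permutations one at a time instead of building the full list), here is a minimal sketch in Python. It is not the gist's Java implementation: the function name `iter_permutations` is hypothetical, and the sketch assumes the classic "next lexicographic permutation" technique, which yields each distinct permutation once and keeps only the current arrangement in memory.

```python
def iter_permutations(s):
    """Yield each distinct permutation of s in lexicographic order, one at a time.

    Hypothetical helper for illustration; only the current arrangement is stored.
    """
    chars = sorted(s)
    n = len(chars)
    while True:
        yield "".join(chars)
        # Find the rightmost index i with chars[i] < chars[i + 1].
        i = n - 2
        while i >= 0 and chars[i] >= chars[i + 1]:
            i -= 1
        if i < 0:
            return  # Fully descending order: the last permutation was just yielded.
        # Find the rightmost element strictly larger than chars[i] and swap them.
        j = n - 1
        while chars[j] <= chars[i]:
            j -= 1
        chars[i], chars[j] = chars[j], chars[i]
        # Reverse the suffix so it becomes the smallest possible arrangement.
        chars[i + 1:] = reversed(chars[i + 1:])
```

Because the function is a generator, a caller can loop with `for p in iter_permutations("Shilad"):` and stop early without ever computing the remaining permutations.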
#!/usr/bin/env -S python2.7 -O
from gensim.models.doc2vec import LabeledSentence, Doc2Vec
import logging
import os
import random
import re
import sys
@shilad
shilad / eval_neighbors.py
Created December 2, 2017 03:12
Evaluate neighbor predictions for a word2vec embedding and testing dataset.
#!/usr/bin/env python3
#
# A script that evaluates the accuracy of a word2vec embedding at predicting pageviews within a session.
# The vector model is expected to be in non-binary word2vec format as output by Mikolov's word2vec.c.
#
# The script calculates "hit-rates" for session prediction. For session prediction,
#
import annoy
import sys
#!/usr/bin/env bash
START_DATE="2017-09-01"
END_DATE="2017-10-01"
beg=${START_DATE}
while [ "$beg" != "${END_DATE}" ]; do
end=$(date -I -d "$beg + 1 day")
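
The loop body is cut off in this preview, so what happens inside each iteration is unknown. The visible part steps through the date range one day at a time; a minimal Python sketch of that stepping pattern follows. The function name `daily_windows` is hypothetical, and the sketch assumes each iteration works on a `(beg, end)` pair and then advances `beg` to `end`. It compares with `<` rather than `!=`, so a start date later than the end date produces no iterations instead of looping forever.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (beg, end) ISO date strings for each day in [start, end).

    Hypothetical helper; mirrors the visible part of the shell loop only.
    """
    beg = date.fromisoformat(start)  # requires Python 3.7+
    stop = date.fromisoformat(end)
    while beg < stop:
        nxt = beg + timedelta(days=1)
        yield beg.isoformat(), nxt.isoformat()
        beg = nxt


for beg, end in daily_windows("2017-09-01", "2017-09-04"):
    print(beg, end)
```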
@shilad
shilad / load-sitelinks.sh
Last active November 26, 2017 20:14
Load sitelinks from a Wikidata JSON file into Hive.
@shilad
shilad / customized.conf
Last active June 1, 2016 19:16
Basic customization for small vector model.
// To use this file, run SRBuilder with the command-line options: -c customized.conf -m word2smallvec
baseDir : "."
dao.dataSource.default : h2
dao.dataSource.h2.url : "jdbc:h2:"${baseDir}"\/db\/h2;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0;MAX_OPERATION_MEMORY=100000000"
dao.dataSource.psql.url : "jdbc:postgresql:\/\/localhost\/wikibrain"
dao.dataSource.psql.username : ""
@shilad
shilad / RawAeronBenchmark.java
Last active December 9, 2015 18:16
Benchmark for Aeron messaging with variable numbers of channels
import uk.co.real_logic.aeron.Aeron;
import uk.co.real_logic.aeron.Publication;
import uk.co.real_logic.aeron.Subscription;
import uk.co.real_logic.aeron.driver.MediaDriver;
import uk.co.real_logic.aeron.driver.ThreadingMode;
import uk.co.real_logic.aeron.logbuffer.FragmentHandler;
import uk.co.real_logic.aeron.logbuffer.Header;
import uk.co.real_logic.agrona.DirectBuffer;
import uk.co.real_logic.agrona.concurrent.BusySpinIdleStrategy;
import uk.co.real_logic.agrona.concurrent.IdleStrategy;
@shilad
shilad / pom.xml
Last active August 29, 2015 14:02
Minimal pom.xml that uses WikiBrain
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>foo.bar</groupId>
<artifactId>minimal-pom</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
@shilad
shilad / CitationAnalyzer.java
Last active August 29, 2015 14:02
A program to extract geographic information and citations from Wikipedia
package org.wikibrain.cookbook.spatial;
import com.vividsolutions.jts.geom.Geometry;
import org.apache.commons.lang.StringUtils;
import org.wikibrain.conf.ConfigurationException;
import org.wikibrain.core.cmd.Env;
import org.wikibrain.core.cmd.EnvBuilder;
import org.wikibrain.core.dao.DaoException;
import org.wikibrain.core.dao.LocalPageDao;
import org.wikibrain.core.dao.RawPageDao;