Alexey Grigorev alexeygrigorev

:octocat:
Githubbing
View GitHub Profile
@alexeygrigorev
alexeygrigorev / brushes.js
Last active August 29, 2015 14:16
syntaxhighlighter for itshared.org
/**
* SyntaxHighlighter
* http://alexgorbatchev.com/SyntaxHighlighter
*
* SyntaxHighlighter is donationware. If you are using it, please donate.
* http://alexgorbatchev.com/SyntaxHighlighter/donate.html
*
* @version
* 3.0.83 (July 02 2010)
*
alexeygrigorev / logo_dima.pdf
Last active September 26, 2015 17:33
IT4BI Master Thesis title at DIMA TU Berlin
alexeygrigorev / RuleBasedPosTagger.java
Created July 14, 2015 10:17
Simple rule-based POS tagger for Russian (StanfordNLP & java)
package mlp.rus;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.Lists;
alexeygrigorev / tensorflow-w2v-gd.py
Created March 21, 2016 16:04
Word2Vec with Tensorflow on GPU
graph = tf.Graph()
with graph.as_default():
    with graph.device('/gpu:0'):
        # input data
        train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
        train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
        valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
        # variables
alexeygrigorev / bnp-variable-relations.dot
Last active March 27, 2016 11:02
Visualization of linear relationships between numeric variables for BNP Paribas kaggle competition
strict digraph G {
nodesep=1;
center=true; margin=1;
splines=true;
sep=1;
node [height="0.33", width="0.33", fixedsize=true];
edge [len=1.5];
v1 -> v130
v1 -> v131
alexeygrigorev / vimeo-download.py
Created September 17, 2016 09:09
Downloading segmented video from vimeo
import requests
import base64
from tqdm import tqdm
master_json_url = 'https://178skyfiregce-a.akamaihd.net/exp=1474107106~acl=%2F142089577%2F%2A~hmac=0d9becc441fc5385462d53bf59cf019c0184690862f49b414e9a2f1c5bafbe0d/142089577/video/426274424,426274425,426274423,426274422/master.json?base64_init=1'
base_url = master_json_url[:master_json_url.rfind('/', 0, -26) + 1]
resp = requests.get(master_json_url)
content = resp.json()
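The `base_url` slice above is easy to misread, so here is a small worked sketch of what it appears to compute. The URL below is a hypothetical stand-in with the same shape as the one in the gist, and the `rsplit` variant is an alternative suggested here, not something taken from the original script.

```python
# Hypothetical URL shaped like the gist's master.json URL (not a real endpoint).
url = 'https://example.com/142089577/video/1,2,3/master.json?base64_init=1'

# The trailing '/master.json?base64_init=1' is exactly 26 characters long,
# which is where the magic number -26 comes from.
suffix = '/master.json?base64_init=1'
assert len(suffix) == 26 and url.endswith(suffix)

# rfind searches only url[0:-26], so it skips the slash before 'master.json'
# and lands on the slash before the comma-separated ID segment.
base_url = url[:url.rfind('/', 0, -26) + 1]

# Assumed alternative without the magic number: drop the last two path parts.
base_alt = url.rsplit('/', 2)[0] + '/'

print(base_url)
print(base_alt)
```

Both expressions yield the directory two levels above `master.json`. The `rsplit` form keeps working if the query string changes length, as long as the query itself contains no `/`.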
alexeygrigorev / CountVectorizer.java
Last active February 12, 2017 21:50
Count Vectorizer
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multiset.Entry;
import com.google.common.collect.Multisets;
alexeygrigorev / bi-kmeans.py
Last active November 16, 2017 20:24
Bisecting K-Means
import heapq
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
def sklearn_bisecting_kmeans_lineage(X, k, verbose=0):
    N, _ = X.shape
    # np.int was removed in NumPy 1.24; the builtin int is the equivalent dtype
    labels = np.zeros(N, dtype=int)
    lineage = np.zeros((k, N), dtype=int)
alexeygrigorev / BeanToRecordConverter.java
Created January 17, 2018 13:45
Use reflection to write arbitrary java beans to parquet with Avro
package avro;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.reflect.ReflectData;
import java.util.ArrayList;
import java.util.List;
alexeygrigorev / mp_capture.py
Created July 16, 2018 08:35
Python stdout sharing between child & parent processes
import sys
import time
from io import StringIO
import subprocess
from multiprocessing import Process, Pipe
from threading import Thread