Gaarv (Oslo, Norway)
@Gaarv
Gaarv / spark_parallel_boost.py
Last active November 24, 2016 14:43 — forked from wpm/spark_parallel_boost.py
A simple example of how to integrate the Spark parallel computing framework and the scikit-learn machine learning toolkit. This script randomly generates test and train data sets, trains an ensemble of decision trees using boosting, and applies the ensemble to the test set. The ensemble training is done in parallel.
from pyspark import SparkContext
import numpy as np
from sklearn.model_selection import train_test_split, ShuffleSplit
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
def run(sc):
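
The preview stops at the imports and the run(sc) signature, so a compact sketch helps show the shape of the approach: the following is my own illustrative code, not the gist's, training independent trees on bootstrap resamples in parallel Spark tasks and combining them by vote. Note that this parallel pattern is bagging; boosting proper is sequential, so only the independent tree training parallelizes this way.

import numpy as np
from pyspark import SparkContext
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

sc = SparkContext(appName="parallel-trees")
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def train_tree(seed):
    # Each Spark task fits one tree on a bootstrap sample of the training set;
    # the training arrays are shipped to workers inside the closure.
    rng = np.random.RandomState(seed)
    idx = rng.randint(0, len(X_train), len(X_train))
    return DecisionTreeClassifier(random_state=seed).fit(X_train[idx], y_train[idx])

trees = sc.parallelize(range(10), numSlices=10).map(train_tree).collect()
votes = np.mean([t.predict(X_test) for t in trees], axis=0)
print("ensemble accuracy:", accuracy_score(y_test, votes > 0.5))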
@Gaarv
Gaarv / gist:cff71de21bf410d80aff1c032aa6caf9
Created April 5, 2017 09:09 — forked from piotrga/gist:1520363
Matrix multiplication with parallel collections
override def multiply(m1: Array[Array[Double]], m2: Array[Array[Double]]): Array[Array[Double]] = {
  val res = Array.ofDim[Double](m1.length, m2(0).length)
  val M1_COLS = m1(0).length
  val M1_ROWS = m1.length
  val M2_COLS = m2(0).length

  @inline def singleThreadedMultiplicationFAST(start_row: Int, end_row: Int): Unit = {
    var col, i = 0
    var sum = 0.0
    var row = start_row
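
The preview shows only the single-threaded inner kernel; the gist's parallel collections then split the output rows across threads. As a rough cross-language analogue (my own sketch, not the gist's code), the same row-partitioning strategy in Python looks like this:

import numpy as np
from concurrent.futures import ProcessPoolExecutor

def _multiply_block(args):
    # One worker multiplies a contiguous block of m1's rows against all of m2.
    m1_block, m2 = args
    return m1_block @ m2

def parallel_multiply(m1, m2, workers=4):
    # Split the rows into blocks, multiply the blocks in parallel processes,
    # and stack the partial results back into the full product.
    blocks = np.array_split(m1, workers)
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return np.vstack(list(ex.map(_multiply_block, [(b, m2) for b in blocks])))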
@Gaarv
Gaarv / (gist name and date lost in the page capture): CountVectorizer with a MARISA-trie vocabulary
import numpy as np
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.externals import six
class MarisaCountVectorizer(CountVectorizer):
    # ``CountVectorizer.fit`` calls ``fit_transform``, so a separate
    # ``fit`` override is not needed.
    def fit_transform(self, raw_documents, y=None):
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
# hack to store vocabulary in MARISA Trie
class _MarisaVocabularyMixin(object):
    def fit_transform(self, raw_documents, y=None):
        super(_MarisaVocabularyMixin, self).fit_transform(raw_documents)
        self._freeze_vocabulary()
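
Both previews cut off before the interesting step. A sketch of that step (assuming the attribute names used by recent scikit-learn, such as fixed_vocabulary_ and stop_words_): after fitting, swap the dict vocabulary for a marisa_trie.Trie, which assigns its own integer ids to the keys, then re-transform so column indices match the trie.

import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer

class MarisaCountVectorizer(CountVectorizer):
    def fit_transform(self, raw_documents, y=None):
        super().fit_transform(raw_documents)       # builds the dict vocabulary
        self._freeze_vocabulary()
        return super().transform(raw_documents)    # re-map to the trie's ids

    def _freeze_vocabulary(self):
        if not self.fixed_vocabulary_:
            # A trie stores the keys far more compactly than a dict and
            # acts as a mapping from token to integer id.
            self.vocabulary_ = marisa_trie.Trie(self.vocabulary_.keys())
            self.fixed_vocabulary_ = True
            del self.stop_words_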
@Gaarv
Gaarv / keras_gensim_embeddings.py
Created August 22, 2017 15:09 — forked from codekansas/keras_gensim_embeddings.py
Using Word2Vec embeddings in Keras models
from __future__ import print_function
import json
import os
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess
from keras.engine import Input
from keras.layers import Embedding, merge
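
The preview stops at the imports; the core move is copying the trained word vectors into a (typically frozen) Embedding layer. A sketch under current gensim 4 and Keras names (the 2017 gist used older APIs such as keras.engine.Input and merge):

import numpy as np
from gensim.models import Word2Vec
from keras.initializers import Constant
from keras.layers import Embedding

sentences = [["hello", "world"], ["hello", "keras"]]
w2v = Word2Vec(sentences=sentences, vector_size=50, min_count=1)
weights = w2v.wv.vectors                       # (vocab_size, vector_size)

embedding = Embedding(
    input_dim=weights.shape[0],
    output_dim=weights.shape[1],
    embeddings_initializer=Constant(weights),  # seed with the Word2Vec vectors
    trainable=False,                           # freeze the pretrained vectors
)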
@Gaarv
Gaarv / attention_lstm.py
Created September 8, 2017 15:38 — forked from mbollmann/attention_lstm.py
My attempt at creating an LSTM with attention in Keras
class AttentionLSTM(LSTM):
    """LSTM with attention mechanism

    This is an LSTM incorporating an attention mechanism into its hidden states.
    Currently, the context vector calculated from the attended vector is fed
    into the model's internal states, closely following the model by Xu et al.
    (2016, Sec. 3.1.2), using a soft attention model following
    Bahdanau et al. (2014).

    The layer expects two inputs instead of the usual one:
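
To make the docstring concrete, here is a toy NumPy sketch of soft attention (a dot-product scoring variant chosen for brevity; Bahdanau et al. score with a small additive network instead):

import numpy as np

def soft_attention(h, attended):
    # h: (dim,) current hidden state; attended: (steps, dim) vectors to attend over
    scores = attended @ h                  # one alignment score per time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over time steps
    return weights @ attended              # context vector fed back into the state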
@Gaarv
Gaarv / residual_network.py
Created September 18, 2017 08:56 — forked from justiceamoh/residual_network.py
Clean and simple Keras implementation of residual networks (ResNeXt and ResNet) accompanying Deep Residual Learning: https://blog.waya.ai/deep-residual-learning-9610bb62c355.
"""
Clean and simple Keras implementation of network architectures described in:
- (ResNet-50) [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf).
- (ResNeXt-50 32x4d) [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf).
Python 3.
"""
def residual_network(x):
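
The preview ends at the function signature; the building block it assembles is the residual unit from the ResNet paper. A minimal sketch (layer choices are illustrative, not the gist's exact configuration):

from keras import layers

def residual_block(x, filters):
    # Assumes x already has `filters` channels; otherwise project the
    # shortcut with a 1x1 convolution before the add.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.add([y, shortcut])  # identity shortcut: the block learns a residual
    return layers.ReLU()(y)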
@Gaarv
Gaarv / Graph
Created January 8, 2018 12:51 — forked from printminion/Graph
Status: not working
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
@Gaarv
Gaarv / (gist name and date lost in the page capture): patch to submit.py
# 1 - change in submit.py from:
def load_input_data(file_location):
    with open(file_location, 'r') as input_data_file:
        input_data = ''.join(input_data_file.readlines())
    return input_data
# to:
def load_input_data(file_location):
    return file_location
@Gaarv
Gaarv / serialization.sc
Created December 28, 2018 09:10 — forked from laughedelic/serialization.sc
Shows how to serialize-deserialize an object in Scala to a String
import java.io._
import java.util.Base64
import java.nio.charset.StandardCharsets.UTF_8
def serialise(value: Any): String = {
  val stream: ByteArrayOutputStream = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(stream)
  oos.writeObject(value)
  oos.close()
  // Base64-encode the serialized bytes so they round-trip safely as a String.
  new String(
    Base64.getEncoder.encode(stream.toByteArray),
    UTF_8
  )
}