Knut Hellan khellan

@khellan
khellan / leak.rb
Created June 29, 2011 18:09
Yajl parsing memory leak with Rubinius pre2.0.0
#!/usr/bin/env ruby
require "rubygems"
require "yajl"
input = ARGV.shift
count = 0
Yajl::Parser.parse(open(input)) {|line| count += 1}
puts count
khellan / doc.json
Created August 5, 2011 11:09
Test document for memory leak
{"id":"something","type":"test","category":"test","title":"a dummy document created for a test","description":"in certain settings, ruby with yajl leaks memory constantly. In order to reproduce this error a long input file with json needs to be used. multiples of this object should be able to do the job"}
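The description inside the test document says that a long input made of multiples of this object should reproduce the leak. A minimal sketch of how such an input could be built is shown here; the helper name `write_repeated_json`, the file names, and the copy count are hypothetical and not part of the original gists.

```ruby
# Hypothetical helper: writes `copies` copies of the JSON document found in
# source_path to target_path, one document per line, and returns the count.
def write_repeated_json(source_path, target_path, copies)
  document = File.read(source_path).strip
  File.open(target_path, "w") do |out|
    copies.times { out.puts(document) }
  end
  copies
end

# Example usage (file names and count are assumptions):
#   write_repeated_json("doc.json", "long_input.json", 1_000_000)
#   ruby leak.rb long_input.json
```

One document per line keeps the generated file easy to inspect; the number of copies needed to make the memory growth visible is not stated in the source and has to be found by experiment.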
khellan / test.erl
Created September 19, 2011 07:56
no_auto_import directive test
-module(test).
-compile({no_auto_import,[min/2]}).
-export([test/2]).
test(A,B) ->
min(A,B).
min(A,B) when A<B -> A;
khellan / gobbler.erl
Created May 15, 2012 06:40
Stepwise introduction to a distributed erlang message loop
-module(gobbler).
-behaviour(gen_server).
-export([code_change/3, handle_call/3, handle_cast/2, handle_info/2]).
-export([init/1, start_link/0, terminate/2]).
-export([count/0, increment/0, stop/0]).
count() -> gen_server:call(?MODULE, count).
khellan / JRuby 1.6.7 double resume
Created June 7, 2012 15:19
Double resume in JRuby. Note that the result varies between runs, so the failure appears to be timing-sensitive.
ruby -v
jruby 1.6.7 (ruby-1.9.2-p312) (2012-02-22 3e82bc8) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_01) [linux-amd64-java]
ruby test/double_resume.rb
Loaded suite test/double_resume
Started
E
Finished in 0.157000 seconds.
1) Error:
test_0001_should_raise_double_resume(ResumingFiberSpec):
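The test source, `test/double_resume.rb`, is not included in this preview. As a rough illustration of what a double resume is, the sketch below makes a fiber resume itself while it is already running, which Ruby rejects with a `FiberError`. The method name `double_resume_error` is hypothetical, and this is not a reconstruction of the original spec or of the JRuby timing issue.

```ruby
# Hypothetical illustration: returns the FiberError raised when a fiber is
# resumed while it is already running, or nil if no error is raised.
def double_resume_error
  fiber = nil
  fiber = Fiber.new do
    # The fiber is running at this point, so resuming it again is invalid.
    fiber.resume
  end
  fiber.resume
  nil
rescue FiberError => e
  # The error raised inside the fiber propagates to the outer resume call.
  e
end

# Example usage:
#   error = double_resume_error
#   puts error.class   # the message text differs between Ruby versions
```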
khellan / word2vec.py
Created November 30, 2015 08:04
TensorFlow word2vec with model loading
"""Multi-threaded word2vec mini-batched skip-gram model.
Trains the model described in:
(Mikolov, et al.) Efficient Estimation of Word Representations in Vector Space
ICLR 2013.
http://arxiv.org/abs/1301.3781
This model does traditional minibatching.
The key ops used are:
* placeholder for feeding in tensors for each example.
khellan / word2vec_optimized.py
Last active June 22, 2018 14:30
A version of the optimized word2vec that doesn't require access to the training data when restoring the saved model. Run python tensorflow/tensorflow/models/embedding/word2vec_optimized.py --save_path=/Users/knut/data/wiki/model --embedding_size=500 --use --interactive to test.
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
khellan / settings.py
Last active May 31, 2016 20:15
Frontera scrapy fetch error
2016-05-31 21:08:31 [scrapy] INFO: Scrapy 1.1.0 started (bot: cb_crawl)
2016-05-31 21:08:31 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'cb_crawl.spiders', 'DOWNLOAD_TIMEOUT': 60, 'ROBOTSTXT_OBEY': True, 'DEPTH_LIMIT': 10, 'CONCURRENT_REQUESTS_PER_DOMAIN': 1, 'CONCURRENT_REQUESTS': 256, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['cb_crawl.spiders'], 'AUTOTHROTTLE_START_DELAY': 0.25, 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'cb_crawl', 'AJAXCRAWL_ENABLED': True, 'COOKIES_ENABLED': False, 'USER_AGENT': 'cb crawl (+http://www.companybooknetworking.com)', 'SCHEDULER': 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler', 'REDIRECT_ENABLED': False, 'AUTOTHROTTLE_ENABLED': True, 'DOWNLOAD_DELAY': 0.25}
2016-05-31 21:08:31 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.throttle.AutoThrottle']
2016-05-31 21:08:31 [scrapy] INFO: Enabled downloader middlewares
package no.companybook.extraction.tables;
import org.junit.Test;
import java.util.HashSet;
import java.util.Set;
import static org.junit.Assert.*;
public class PersonTest {
Satya Nadella
B Turner
Lisa Brummel
Rupert Bader
Janet Kennedy
Jordan Levin
Horacio Rrez
Christophe Capossela
Angela Jones
David Aucsmith