Knut O. Hellan khellan

khellan /
Last active Mar 30, 2020
Sentencepiece 0.1.85 for Python 3.8 on OSX/Mac

Download the file and install it:

pipenv install <path to local wheel>

There you go.

khellan /
Created Sep 21, 2018
Batchwise deletion of malformed HBase row keys. The loop does not stop on its own when done, so it needs monitoring.
import happybase
connection = happybase.Connection(HBASE_MASTER_IP)
table = connection.table(TABLE_NAME)
while True:
    batch = table.batch()
    for key, _ in table.scan(columns=[COLUMN_NAMES], filter="RowFilter(=, 'regexstring:.*\x09.*')", limit=10000):
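A minimal sketch of how a batch loop like the one above could stop on its own once a scan finds no more matching rows. It assumes happybase's `Table.scan`, `Table.batch`, `Batch.delete`, and `Batch.send`. The name `delete_matching_rows` and the in-memory `FakeTable` and `FakeBatch` stand-ins are hypothetical, added only so the sketch runs without an HBase cluster. This is not the gist's code, which loops forever.

```python
def delete_matching_rows(table, row_filter, columns=None, batch_size=10000):
    """Delete rows matching row_filter in batches; return how many were deleted.

    Unlike a bare `while True` loop, this stops once a scan finds no more rows.
    `table` is assumed to behave like a happybase Table.
    """
    deleted = 0
    while True:
        batch = table.batch()
        count = 0
        for key, _ in table.scan(columns=columns, filter=row_filter, limit=batch_size):
            batch.delete(key)
            count += 1
        if count == 0:
            # Nothing left to delete, so stop instead of scanning forever.
            return deleted
        batch.send()  # apply the queued deletes
        deleted += count


class FakeBatch:
    """Hypothetical stand-in for a happybase Batch: queues deletes until send()."""

    def __init__(self, table):
        self.table = table
        self.pending = []

    def delete(self, row):
        self.pending.append(row)

    def send(self):
        for row in self.pending:
            self.table.rows.pop(row, None)
        self.pending = []


class FakeTable:
    """Hypothetical in-memory table. Its scan ignores the filter string and
    simply matches row keys that contain a tab character."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def batch(self):
        return FakeBatch(self)

    def scan(self, columns=None, filter=None, limit=None):
        # Build the result list up front so later deletes cannot disturb it.
        matched = [(k, v) for k, v in sorted(self.rows.items()) if b"\t" in k]
        return iter(matched[:limit])


if __name__ == "__main__":
    table = FakeTable({b"good": {}, b"bad\tkey": {}, b"also\tbad": {}})
    print(delete_matching_rows(table, "ignored by the fake", batch_size=2))
    print(sorted(table.rows))
```

Against a real cluster, the same function would be called with the `table` object and the `RowFilter` string from the snippet above. Counting rows per pass is a design choice for this sketch: an empty scan is the only signal used to decide that the cleanup is finished.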
View firstnames.txt
Abigail - Nabby, Abby, Gail
Abraham - Abe, Bram
Adelaida - Ida, Idly
Alan - Al
Alastair - Al, Alex
Albert - Al, Bert
Alexander - Alex, Lex, Xander, Sander, Sandy
Alexandra - Alex, Ali, Lexie, Sandy
Alfred - Al, Alf, Alfie, Fred, Fredo
Alonzo - Lonnie
View names.input
Satya Nadella
B Turner
Lisa Brummel
Rupert Bader
Janet Kennedy
Jordan Levin
Horacio Rrez
Christophe Capossela
Angela Jones
David Aucsmith
View gist:6d34eacb25cb3a30eb3e7568ff9d9e61
package no.companybook.extraction.tables;
import org.junit.Test;
import java.util.HashSet;
import java.util.Set;
import static org.junit.Assert.*;
public class PersonTest {
khellan /
Last active May 31, 2016
Frontera scrapy fetch error
2016-05-31 21:08:31 [scrapy] INFO: Scrapy 1.1.0 started (bot: cb_crawl)
2016-05-31 21:08:31 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'cb_crawl.spiders', 'DOWNLOAD_TIMEOUT': 60, 'ROBOTSTXT_OBEY': True, 'DEPTH_LIMIT': 10, 'CONCURRENT_REQUESTS_PER_DOMAIN': 1, 'CONCURRENT_REQUESTS': 256, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['cb_crawl.spiders'], 'AUTOTHROTTLE_START_DELAY': 0.25, 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'cb_crawl', 'AJAXCRAWL_ENABLED': True, 'COOKIES_ENABLED': False, 'USER_AGENT': 'cb crawl (+', 'SCHEDULER': '', 'REDIRECT_ENABLED': False, 'AUTOTHROTTLE_ENABLED': True, 'DOWNLOAD_DELAY': 0.25}
2016-05-31 21:08:31 [scrapy] INFO: Enabled extensions:
2016-05-31 21:08:31 [scrapy] INFO: Enabled downloader middlewares
khellan /
Last active Jun 22, 2018
A version of the optimized word2vec that doesn't require access to the training data when restoring the saved model. Run python tensorflow/tensorflow/models/embedding/ --save_path=/Users/knut/data/wiki/model --embedding_size=500 --use --interactive to test.
# Copyright 2015 Google Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
khellan /
Created Nov 30, 2015
TensorFlow word2vec with model loading
"""Multi-threaded word2vec mini-batched skip-gram model.
Trains the model described in:
(Mikolov, et al.) Efficient Estimation of Word Representations in Vector Space
ICLR 2013.
This model does traditional minibatching.
The key ops used are:
* placeholder for feeding in tensors for each example.
khellan / JRuby 1.6.7 double resume
Created Jun 7, 2012
Double resume in JRuby. Note that the result in JRuby varies, so it seems to be timing-sensitive.
ruby -v
jruby 1.6.7 (ruby-1.9.2-p312) (2012-02-22 3e82bc8) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_01) [linux-amd64-java]
ruby test/double_resume.rb
Loaded suite test/double_resume
Finished in 0.157000 seconds.
1) Error:
khellan / gobbler.erl
Created May 15, 2012
Stepwise introduction to a distributed Erlang message loop
-export([code_change/3, handle_call/3, handle_cast/2, handle_info/2]).
-export([init/1, start_link/0, terminate/2]).
-export([count/0, increment/0, stop/0]).
count() -> gen_server:call(?MODULE, count).