# Render matplotlib figures inline in the notebook
%matplotlib inline
# Extract plain text from a Wikipedia dump with WikiExtractor
python WikiExtractor.py wiki-20180120-pages-articles-multistream.xml.bz2 --discard_elements gallery,timeline,noinclude --processes $(nproc) --filter_disambig_pages -b 100M
# Dump listing: https://dumps.wikimedia.org/enwiki/20180720/
# Strip leftover HTML/XML tags
sed -e 's/<[^>]*>//g' file.html
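For use inside a Python pipeline, the tag-stripping step can be sketched with the same regular expression as the `sed` command above. This is only an illustration: `strip_tags` and the sample string are hypothetical names and data, not part of the original snippet.

```python
import re

# Same pattern as the sed expression: "<", any run of non-">" characters, ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text):
    """Remove anything that looks like an HTML/XML tag from text."""
    return TAG_RE.sub("", text)


# Hypothetical sample resembling a WikiExtractor <doc> wrapper.
sample = '<doc id="12" title="Anarchism">Anarchism is a <b>political</b> philosophy.</doc>'
print(strip_tags(sample))
```

One difference worth keeping in mind: `sed` works line by line, so it leaves a tag that spans two lines untouched, while this Python version also removes such tags because `[^>]` matches newlines.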
def profiling(function):
    import cProfile
    import pstats
    from io import StringIO
    pr = cProfile.Profile()
    pr.enable()
    function()
    pr.disable()
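The listing stops right after `pr.disable()`, so how the collected statistics are reported is not shown. The imports of `pstats` and `StringIO` suggest the usual recipe of formatting the stats into a string buffer; the sketch below assumes that continuation. The `top` parameter, the return value, and the `work` function are hypothetical additions for illustration.

```python
import cProfile
import pstats
from io import StringIO


def profiling(function, top=10):
    """Run function() under cProfile and return a text report.

    `top` (hypothetical) limits the report to the first `top` entries.
    """
    pr = cProfile.Profile()
    pr.enable()
    function()
    pr.disable()
    # Assumed continuation: write the stats, sorted by cumulative time,
    # into an in-memory buffer instead of stdout.
    buffer = StringIO()
    stats = pstats.Stats(pr, stream=buffer).sort_stats("cumulative")
    stats.print_stats(top)
    return buffer.getvalue()


def work():
    # Hypothetical workload to profile.
    return sum(i * i for i in range(100_000))


report = profiling(work)
print(report)
```

Returning the report as a string rather than printing inside the helper makes it easy to log or save the result, which may or may not match what the original function does.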
from multiprocessing import Pool
num_cores = 8
def foo(ego):
    en = {ego: 200}
    return en
with Pool(num_cores) as pool:
    for res in pool.imap_unordered(foo, range(10)):
        print(res)  # placeholder body (assumption); the loop body is cut off in this listing
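Since each call to `foo` returns a small dict, one plausible use of the loop is merging the partial results as they arrive. The sketch below assumes that goal; `collect` and `n_items` are hypothetical names. Results from `imap_unordered` arrive in completion order, not input order, so the merge must not depend on ordering.

```python
from multiprocessing import Pool

num_cores = 8


def foo(ego):
    en = {ego: 200}
    return en


def collect(n_items=10):
    """Merge the per-item dicts returned by the workers (hypothetical helper)."""
    merged = {}
    with Pool(num_cores) as pool:
        # Results are yielded as soon as any worker finishes one item.
        for res in pool.imap_unordered(foo, range(n_items)):
            merged.update(res)
    return merged


# The guard keeps worker processes from re-running this code on platforms
# that start workers by importing the main module.
if __name__ == "__main__":
    result = collect()
    print(sorted(result))
```

Defining `foo` at module level matters here: worker processes receive the function by reference, so it has to be importable from the main module.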
from gensim.models.keyedvectors import KeyedVectors
import faiss
import numpy as np
from time import time
import codecs
def build_vector_index(w2v_fpath):
    w2v = KeyedVectors.load_word2vec_format(w2v_fpath, binary=False, unicode_errors='ignore')
    # L2-normalize the vectors in place so that inner product equals cosine similarity
    # (init_sims is deprecated in gensim 4.x)
    w2v.init_sims(replace=True)
    # Exact (brute-force) inner-product index over vectors of this dimensionality
    index = faiss.IndexFlatIP(w2v.vector_size)
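The combination above, unit-length vectors plus a flat inner-product index, amounts to exact cosine-similarity search. The NumPy sketch below illustrates that computation only; it is a stand-in for what the index computes, not the faiss API, and `normalize_rows`, `search_inner_product`, and the random data are hypothetical.

```python
import numpy as np


def normalize_rows(vectors):
    """Scale each row to unit L2 norm (guarding against zero rows)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


def search_inner_product(index_vectors, queries, k):
    """Brute-force top-k search by inner product.

    Returns (scores, ids), each of shape (n_queries, k), best match first.
    """
    scores = queries @ index_vectors.T
    top = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top, axis=1)
    return top_scores, top


# Hypothetical data: 100 random 16-dimensional "word vectors".
rng = np.random.default_rng(0)
vectors = normalize_rows(rng.normal(size=(100, 16)).astype("float32"))

# Query with the first three vectors; each should retrieve itself first.
scores, ids = search_inner_product(vectors, vectors[:3], k=5)
print(ids[:, 0])
print(scores[:, 0])
```

The return order mirrors faiss, where `index.search(queries, k)` yields the score matrix first and the id matrix second. A dedicated index becomes worthwhile once the vocabulary is too large for a full score matrix per query batch.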
# Reload imported modules automatically before each cell runs
%load_ext autoreload
%autoreload 2
import codecs
import json
import requests
ES_ENDPOINT = "http://localhost:9200"
class IndexBuilder(object):
    def __init__(self):
        self._index = "wsd"
        self._dtype = "sentence"
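The class is cut off before any indexing method appears, so how documents reach Elasticsearch is unknown. One common approach is the bulk API, which takes newline-delimited JSON: an action line followed by the document source. The sketch below only builds such a payload and sends nothing; `build_bulk_body`, the `text` field, and the sample sentences are assumptions, not the author's code.

```python
import json

ES_ENDPOINT = "http://localhost:9200"


def build_bulk_body(sentences, index="wsd", dtype="sentence"):
    """Build a newline-delimited bulk payload (hypothetical helper).

    Each document contributes two lines: an "index" action, then its source.
    """
    lines = []
    for doc_id, text in enumerate(sentences):
        action = {"index": {"_index": index, "_type": dtype, "_id": doc_id}}
        lines.append(json.dumps(action))
        # "text" is an assumed field name; the real mapping is not shown.
        lines.append(json.dumps({"text": text}))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    "A bank is a financial institution.",
    "They sat on the river bank.",
])
print(body)

# Sending it would look roughly like this (not executed here):
# requests.post(ES_ENDPOINT + "/_bulk", data=body.encode("utf-8"),
#               headers={"Content-Type": "application/x-ndjson"})
```

The `_type` entry matches the `_dtype` attribute in the class but only applies to older clusters, since mapping types were removed in later Elasticsearch versions.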
# Forward web UI ports from the Hadoop master node to localhost over SSH
master=ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com
ssh -i ~/.ssh/ireland.pem -N -L 8088:$master:8088 hadoop@$master &
ssh -i ~/.ssh/ireland.pem -N -L 20888:$master:20888 hadoop@$master &
ssh -i ~/.ssh/ireland.pem -N -L 19888:$master:19888 hadoop@$master &
# Format the attached volume as ext4 (erases existing data) and mount it at /mnt2
sudo mkfs -t ext4 /dev/xvdcb
sudo mkdir /mnt2
sudo mount -t ext4 /dev/xvdcb /mnt2
import os
import glob
import urllib3
import argparse
import requests
import mimetypes
from PIL import Image
from io import BytesIO
import re
from os.path import splitext, join