Anja Pilz (aplz)

aplz / test_parameterized_exception.py
Last active August 5, 2021 09:48
assert that some input results in an exception being raised
import pytest
from typing import List


def method_under_test(values: List[int]) -> int:
    """dummy method"""
    if 0 in values:
        raise ValueError("Zeros are not supported")
    return sum(values)
@pytest.mark.parametrize(
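The decorator's arguments are cut off in the preview above. As a sketch of the complete pattern, the parameter lists below are hypothetical example inputs: one parametrized test checks that every input containing a zero raises `ValueError` (with `pytest.raises(..., match=...)` also checking the message), and a second covers the non-failing path.

```python
from typing import List

import pytest


def method_under_test(values: List[int]) -> int:
    """Dummy method: sums the values but rejects any zero."""
    if 0 in values:
        raise ValueError("Zeros are not supported")
    return sum(values)


# Hypothetical inputs: each list contains at least one zero.
@pytest.mark.parametrize("values", [[0], [1, 0, 2], [0, 0]])
def test_zero_raises(values: List[int]) -> None:
    # The test fails unless ValueError is raised inside the block;
    # match= is a regex searched against the exception message.
    with pytest.raises(ValueError, match="Zeros are not supported"):
        method_under_test(values)


# Hypothetical inputs paired with their expected sums.
@pytest.mark.parametrize("values, expected", [([1, 2, 3], 6), ([5], 5)])
def test_sum(values: List[int], expected: int) -> None:
    assert method_under_test(values) == expected
```

Running `pytest test_parameterized_exception.py` reports each parameter set as a separate test case, so a failing input is named in the output.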
aplz / numpy_vector_to_dlib_sparse_vector.py
Last active November 26, 2020 08:21
Create a sparse vector for dlib from a numpy array
"""How to create a sparse vector for dlib from a numpy array.
dlib: http://dlib.net/
"""
import dlib
import numpy
vector = [0, 2, 0]
sv = dlib.sparse_vector()
for i in numpy.nonzero(vector)[0]:
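The loop body is cut off in the preview above. The dense-to-sparse step itself can be sketched and tested without dlib: a sparse vector only keeps `(index, value)` pairs for the non-zero entries. `to_sparse_pairs` is a hypothetical helper, and the `dlib.pair` call in the comment follows dlib's Python examples; treat it as an assumption to check against your dlib version.

```python
import numpy


def to_sparse_pairs(vector):
    """Return (index, value) pairs for the non-zero entries of a 1-D vector."""
    arr = numpy.asarray(vector)
    # numpy.nonzero returns one index array per dimension; [0] selects the
    # indices along the single axis, in ascending order.
    return [(int(i), float(arr[i])) for i in numpy.nonzero(arr)[0]]


# With dlib installed, each pair would be appended to the sparse vector:
#   sv = dlib.sparse_vector()
#   for i, v in to_sparse_pairs([0, 2, 0]):
#       sv.append(dlib.pair(i, v))
print(to_sparse_pairs([0, 2, 0]))  # → [(1, 2.0)]
```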
aplz / analyze_ngram.js
Created July 10, 2020 12:04
elasticsearch analyze API with customized ngram analyzer
GET _analyze
{
  "text": "Quick fox",
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 4,
    "token_chars": [
      "letter",
      "digit"
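To see what this request should return, here is a rough Python model of the ngram tokenizer (a sketch, not Elasticsearch's implementation): characters outside `token_chars` (here, anything that is not a letter or digit) split the text, and each remaining run yields every gram from `min_gram` to `max_gram` characters, ordered by start position with shorter grams first. `ngram_tokens` is a hypothetical name, and `str.isalpha`/`str.isdigit` only approximate Elasticsearch's character classes.

```python
from typing import List


def ngram_tokens(text: str, min_gram: int, max_gram: int) -> List[str]:
    """Approximate the ngram tokenizer with token_chars ["letter", "digit"]."""
    # Split into runs of letters/digits; any other character is a boundary.
    words, current = [], ""
    for ch in text:
        if ch.isalpha() or ch.isdigit():
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    # For each start position, emit grams of increasing length that fit.
    tokens: List[str] = []
    for word in words:
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


print(ngram_tokens("Quick fox", 3, 4))
# → ['Qui', 'Quic', 'uic', 'uick', 'ick', 'fox']
```

Runs shorter than `min_gram` produce no tokens at all, and since the request defines no token filter, nothing is lowercased.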
tui,label
T001,Organism
T002,Plant
T004,Fungus
T005,Virus
T007,Bacterium
T008,Animal
T010,Vertebrate
T011,Amphibian
T012,Bird
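The rows above pair a `tui` identifier (these appear to be UMLS semantic type identifiers) with a human-readable `label`. A minimal sketch of loading such a file into a lookup table with Python's standard `csv` module; `load_labels` is a hypothetical helper, and the sample string just repeats a few of the rows above.

```python
import csv
import io
from typing import Dict, TextIO


def load_labels(handle: TextIO) -> Dict[str, str]:
    """Map each `tui` to its `label`, using the CSV header row for field names."""
    return {row["tui"]: row["label"] for row in csv.DictReader(handle)}


# In-memory excerpt; for a real file use open(path, newline="", encoding="utf-8").
sample = io.StringIO("tui,label\nT001,Organism\nT005,Virus\nT007,Bacterium\n")
labels = load_labels(sample)
print(labels["T005"])  # → Virus
```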
aplz / scroll.py
Created October 16, 2019 09:16 — forked from hmldd/scroll.py
Example of Elasticsearch scrolling using Python client
# coding:utf-8
from elasticsearch import Elasticsearch
import json
# Define config
host = "127.0.0.1"
port = 9200
timeout = 1000
index = "index"
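The scrolling loop itself is not part of the preview. A sketch of it, written against the 7.x client's keyword arguments (`search(..., scroll=, size=)`, `scroll(scroll_id=, scroll=)`, `clear_scroll(scroll_id=)`); `scroll_hits` is a hypothetical name and `es` is assumed to be an `Elasticsearch` client built from the config above. The client library also ships `elasticsearch.helpers.scan`, which wraps the same loop.

```python
from typing import Any, Dict, Iterator


def scroll_hits(es: Any, index: str, query: Dict[str, Any],
                scroll: str = "2m", size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query`, one scroll page at a time."""
    # The first search opens a scroll context that stays alive for `scroll`.
    page = es.search(index=index, body=query, scroll=scroll, size=size)
    scroll_id = page["_scroll_id"]
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Each scroll call fetches the next page and renews the context.
            page = es.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = page["_scroll_id"]
    finally:
        # Free the server-side context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=scroll_id)
```

Usage would look like `for hit in scroll_hits(es, index, {"query": {"match_all": {}}}): print(json.dumps(hit["_source"]))`. Keeping `size` moderate bounds memory per page, while `scroll` only needs to cover the time spent processing one page.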
aplz / disable_spell_checker.txt
Last active April 17, 2019 08:45
locally disable spell checking in PyCharm and IntelliJ IDEA
Place the line below directly above the offending code:
# noinspection SpellCheckingInspection
For Java code in IntelliJ IDEA, the equivalent is the annotation
@SuppressWarnings("SpellCheckingInspection")
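A small Python sketch of where the comment goes. PyCharm's quick-fix menu offers suppression for a statement, a function, or a class, and the comment's scope follows whatever it sits directly above; the names and German word lists here are hypothetical.

```python
# Above a single statement: only this assignment is exempt.
# noinspection SpellCheckingInspection
STOPWORDS_DE = ["aber", "oder", "sondern"]


# Above a def: the whole function body is exempt.
# noinspection SpellCheckingInspection
def conjunctions_de() -> list:
    return ["und", "weil", "obwohl"]
```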
aplz / es_german_analyzer.py
Last active October 16, 2019 09:17
elasticsearch-py german analyzer / tokenizer
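from elasticsearch import Elasticsearch

es = Elasticsearch()  # no hosts given: localhost:9200 in elasticsearch-py 7.x
# Sample sentence: "The young computer scientist Katie Bouman made the
# historic image of a black hole possible."
tokens = es.indices.analyze(
    body={"analyzer": "german",
          "text": "Die junge Informatikerin Katie Bouman machte die "
                  "historische Aufnahme eines schwarzen Lochs "
                  "möglich."})['tokens']
for token in tokens:
    print(token)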
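Because the German analyzer typically lowercases, drops stopwords and stems, the printed tokens no longer match the input words one-to-one. Each token object from the analyze API carries `start_offset` and `end_offset` into the input text, so a sketch like the following (hypothetical helper `surface_forms`) pairs every token with the original word it came from.

```python
from typing import Any, Dict, List, Tuple


def surface_forms(text: str, tokens: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Pair each analyzed token with the span of `text` it was produced from."""
    # Slicing by the token's character offsets recovers the unanalyzed word.
    return [(t["token"], text[t["start_offset"]:t["end_offset"]]) for t in tokens]
```

Called with the same string sent in `body["text"]` and the `tokens` list from the response, it returns `(token, original_word)` pairs for inspecting what the analyzer changed.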
aplz / bulk_upsert_generator.py
Created March 25, 2019 15:12
elasticsearch - generator for bulk ingestion with upserts
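def bulk_upsert_actions(collection, index, doc_type):
    """Yield one upsert action per document for elasticsearch.helpers.bulk.

    Assumes `collection` yields (doc_id, doc) pairs (hypothetical layout).
    """
    for doc_id, doc in collection:
        yield {
            # mapping types are deprecated in Elasticsearch 7 and removed in 8;
            # omit '_type' there.
            '_id': doc_id, '_type': doc_type,
            '_index': index,
            # with "doc_as_upsert": True, the document is either updated
            # or created if it does not already exist.
            '_op_type': 'update', 'doc': doc, "doc_as_upsert": True
        }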
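Roughly what these actions become on the wire: a sketch that builds the newline-delimited `_bulk` request body by hand, assuming Elasticsearch 7 or later (so no `_type`). `upsert_bulk_body` is a hypothetical helper; in practice, passing the generator to `elasticsearch.helpers.bulk(es, actions)` performs this serialization for you.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def upsert_bulk_body(docs: Iterable[Tuple[str, Dict[str, Any]]], index: str) -> str:
    """Build a _bulk request body that upserts each (doc_id, doc) pair."""
    lines = []
    for doc_id, doc in docs:
        # Action/metadata line: an "update" of one document in one index.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Payload line: the partial document plus doc_as_upsert, so a missing
        # document is created from `doc` instead of the update failing.
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    # The bulk API expects newline-delimited JSON terminated by a newline.
    return "\n".join(lines) + "\n"
```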
aplz / copy_directory_filtered_by_timestamp_range.sh
Last active October 24, 2018 12:15
filter files in a directory by modification date, copy them to a new directory preserving the original sub-directory structure
find . -type f -newermt '2018/10/14' -not -newermt '2018/10/21'| cpio -pvdmB <your-directory>
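`-newermt` compares a file's modification time, so the two tests select files with `2018/10/14 00:00 < mtime <= 2018/10/21 00:00` in local time, i.e. files last modified from October 14 through October 20. A portable Python sketch of the same filter-and-copy, with a hypothetical function name; `shutil.copy2` keeps the modification time much like `cpio -m`.

```python
import os
import shutil
from datetime import datetime
from typing import List


def copy_modified_between(src: str, dst: str, after: datetime, until: datetime) -> List[str]:
    """Copy files with after < mtime <= until from src to dst, keeping subdirectories."""
    # Naive datetimes are interpreted as local time, as find does.
    lo, hi = after.timestamp(), until.timestamp()
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # find -type f does not match symlinks
            # Same window as `-newermt AFTER -not -newermt UNTIL`.
            if lo < os.path.getmtime(path) <= hi:
                rel = os.path.relpath(path, src)
                target = os.path.join(dst, rel)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(path, target)  # also copies the mtime
                copied.append(rel)
    return copied
```

As with the shell version, `dst` should lie outside `src`, otherwise the walk can pick up files it has just copied.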
aplz / copy_files_by_timestamp_range.sh
Last active October 24, 2018 12:02
copy files by modification timestamp range in shell
find . -type f -newermt '2018/10/14' -not -newermt '2018/10/21' -exec cp -p {} <your_folder>/ \;