mrm1001 / es_features.py
Created August 3, 2016 09:00 — forked from konradkonrad/es_features.py
tfidf from elasticsearch
import elasticsearch
from math import log


def tfidf_matrix(es, index, doc_type, fields, size=10, bulk=500, query=dict(match_all=[])):
    """Generate tfidf for `size` documents of `index`/`doc_type`.

    All `fields` need to have the mapping "term_vector": "yes".
    This is the consuming version (i.e. get everything at once).

    :param es: elasticsearch client