Click Here for Example Jupyter Notebooks

Short Link to this gist - ela.st/operationalize-nlp

NER - Named Entity Recognition

NER models can be used in two ways in Elasticsearch:

  1. On ingest: new documents are run through an ingest pipeline, which sends one of the fields through the model; the resulting entity fields are indexed along with the original document. This is useful later when you want to quickly filter on an entity, run aggregations, etc.
  2. Ad hoc: strings can be submitted directly to a model using the _infer endpoint. The string is processed by the model and the response includes any identified entities. Same output as #1, but the results are not stored.

To store the model in Elasticsearch:

  1. Select a model from Hugging Face
  2. Set up Eland
  3. Use Eland to load the model from Hugging Face into Elasticsearch (see the sketch below)
  4. Start (deploy) the model, under Machine Learning -> Model Management in Kibana
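As a minimal sketch of step 3 using Eland's PyTorch helpers (the model ID, connection details, and local path are placeholders, and exact signatures vary a little between Eland versions; the `eland_import_hub_model` CLI that ships with Eland wraps these same steps):

```python
from pathlib import Path

from elasticsearch import Elasticsearch
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# Pull the model from Hugging Face and trace it into TorchScript
tm = TransformerModel(model_id="dslim/bert-base-NER", task_type="ner")
tmp_dir = Path("models")
tmp_dir.mkdir(parents=True, exist_ok=True)
model_path, config, vocab_path = tm.save(str(tmp_dir))

# Upload the traced model into Elasticsearch, then deploy (start) it
ptm = PyTorchModel(es, tm.elasticsearch_model_id())
ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
ptm.start()
```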

If you are not storing the entities with the original docs, all you need to do is call the _infer endpoint with the model_id and the string to process. This is useful when you simply need to retrieve entities as part of a new query.
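With the Python client, that ad hoc call could look like the following (the model ID and input text are placeholders; `ml.infer_trained_model` maps to `POST _ml/trained_models/<model_id>/_infer`):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

response = es.ml.infer_trained_model(
    model_id="dslim__bert-base-ner",  # placeholder: the model_id assigned at load time
    docs=[{"text_field": "Elastic was founded in Amsterdam by Shay Banon."}],
)

# Each result lists the entities the model recognized in the input string
for result in response["inference_results"]:
    for entity in result.get("entities", []):
        print(entity["entity"], entity["class_name"], entity["class_probability"])
```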

If you wish to store the NER output with the original document, configuring an ingest pipeline to do this on ingest is the easiest route.

Generating NER Entities for documents on ingest

  1. Configure an ingest pipeline with an inference processor, and point that processor at the NER model
  2. Ingest new documents, specifying the pipeline as part of the index request or by setting index.default_pipeline (see the sketch below)
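A minimal sketch of both steps (pipeline, index, and field names are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# Pipeline that runs each document's "message" field through the NER model;
# identified entities land under ml.ner in the stored document
es.ingest.put_pipeline(
    id="ner-pipeline",
    processors=[
        {
            "inference": {
                "model_id": "dslim__bert-base-ner",  # placeholder model_id
                "target_field": "ml.ner",
                "field_map": {"message": "text_field"},
            }
        }
    ],
)

# Option 1: name the pipeline on the index request
es.index(
    index="my-index",
    pipeline="ner-pipeline",
    document={"message": "Shay Banon founded Elastic."},
)

# Option 2: make it the index default
es.indices.put_settings(index="my-index", settings={"index.default_pipeline": "ner-pipeline"})
```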

Vector Search with approximate kNN

At a high level, the steps to generate vectors are similar to NER above, but there are special tuning and memory requirements to keep in mind (see below):

  1. Select a model from Hugging Face. We recommend either testing a couple of models to see which gives the best recall and precision for your particular data, or retraining a model on your specific data
  2. Set up Eland
  3. Use Eland to load the model from Hugging Face into Elasticsearch
  4. Start (deploy) the model

Generating and Storing Vectors

Vectors need to be stored in Elasticsearch. Most often the vector is stored in the same source document it was created from, so the "human" text can easily be returned as part of the vector query response (you specify which fields to return, just like any other _search). You can do this with an ingest pipeline as you index new documents.

  1. Configure a dense_vector field in your mapping to store the vectors
  2. Configure an ingest pipeline with an inference processor, and point that processor at the embedding model
  3. Ingest new documents, specifying the pipeline as part of the index request or by setting index.default_pipeline (see the sketch below)
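A minimal sketch of all three steps, assuming a 768-dimensional embedding model and placeholder index, field, and model names. Note that the inference processor writes the embedding under `<target_field>.predicted_value`, so that is the path the dense_vector mapping (and later the kNN search) must use:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# 1. dense_vector field sized to the model's output; index + similarity enable kNN
es.indices.create(
    index="products",
    mappings={
        "properties": {
            "description": {"type": "text"},
            "description_embedding.predicted_value": {
                "type": "dense_vector",
                "dims": 768,  # must match the embedding model's output dimensions
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

# 2. Pipeline that embeds the "description" field on ingest
es.ingest.put_pipeline(
    id="embedding-pipeline",
    processors=[
        {
            "inference": {
                "model_id": "my-embedding-model",  # placeholder model_id
                "target_field": "description_embedding",
                "field_map": {"description": "text_field"},
            }
        }
    ],
)

# 3. Index documents through the pipeline
es.index(
    index="products",
    pipeline="embedding-pipeline",
    document={"description": "waterproof hiking boots"},
)
```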

Searching with Vectors

Vector search can be done as exact search with a script_score query, or as approximate k-nearest neighbor (kNN) search. To search with kNN, you need to generate a vector for the search request string, then use that vector as the input to the knn search option.

  1. Call the _infer endpoint, specifying the embedding model used to generate the vectors for the stored documents (products). The response will include a new dense vector
  2. Use the vector from #1 as the query_vector value in the knn section of the _search request (see the sketch below). See the docs for the other required settings (k, num_candidates, field)
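Put together, the query side might look like this (index, field, and model names carry over from the placeholder sketch above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# 1. Embed the query string with the same model used at ingest time
infer = es.ml.infer_trained_model(
    model_id="my-embedding-model",  # placeholder model_id
    docs=[{"text_field": "boots for wet weather"}],
)
query_vector = infer["inference_results"][0]["predicted_value"]

# 2. Approximate kNN search against the stored vectors
results = es.search(
    index="products",
    knn={
        "field": "description_embedding.predicted_value",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
    fields=["description"],
    source=False,
)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["fields"]["description"])
```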

There are several considerations when tuning kNN/Vector search.

One very important consideration, which differs from other types of search in Elasticsearch, is the specific requirement around page cache / off-heap memory: for approximate kNN search to be efficient, all vector data must fit in the node's page cache. When the vectors do not fit, kNN search slows down because the HNSW graphs (the dense vector data structure) have to be read from disk.

We recommend using a performance tool like Elastic's Rally to test scaling up kNN search. There is a loose formula in the docs that will help you ballpark the amount of off-heap memory required. Starting in 8.6, 8-bit (byte) vectors are also supported (blog), which cuts vector memory roughly 4x.
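To illustrate that ballpark (the sizing below follows the rough formula from the kNN tuning docs of about num_vectors * 4 * (num_dims + 12) bytes for float vectors; treat the constant as an approximation and the counts as placeholders):

```python
# Rough page-cache (off-heap) estimate per the kNN tuning docs' loose formula
num_vectors = 10_000_000  # placeholder corpus size
num_dims = 768            # placeholder model dimensions

float_bytes = num_vectors * 4 * (num_dims + 12)  # element_type: float (default)
byte_bytes = num_vectors * (num_dims + 12)       # element_type: byte (8.6+) drops the 4x factor

print(f"float vectors: ~{float_bytes / 1024**3:.1f} GiB of page cache")
print(f"byte vectors:  ~{byte_bytes / 1024**3:.1f} GiB of page cache")
```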

There are several ways to troubleshoot slow kNN performance. One is to run the hot threads API during a slow query to see whether Elasticsearch is reading from disk: you will see a high CPU %, but with most of that percentage in other. Looking for high major page fault counts is another.
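With the Python client that check is a one-liner (run it while the slow query is in flight):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# Equivalent to GET /_nodes/hot_threads; high cpu% dominated by "other"
# suggests threads blocked on disk reads rather than doing CPU work
print(es.nodes.hot_threads())
```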

There's a lot more to explore and discuss once you get hands-on, so feel free to reach out if you have questions: [elastic.co](https://www.elastic.co/contact#questions) · [Discuss Forums](https://discuss.elastic.co/) · [Community Slack](https://ela.st/slack)

Last updated during Elastic Stack version 8.6.
