Click Here for Example Jupyter Notebooks

Short Link to this gist - ela.st/operationalize-nlp

NER - Named Entity Recognition

NER models can be used in two ways in Elasticsearch:

  1. On ingest: new documents are run through an ingest pipeline, which sends one of the fields through the model; the resulting entity fields are indexed along with the original document. This is useful later when you want to quickly filter on an entity, run aggregations, etc.
  2. Ad hoc: strings can be submitted directly to a model using the _infer endpoint. The string is processed by the model and the response includes any identified entities. Same output as #1, but the results are not stored.

To store the model in Elasticsearch:

  1. Select a model from Hugging Face
  2. Set up Eland
  3. Use Eland to load the model from Hugging Face into Elasticsearch (see the sketch below)
  4. Start (deploy) the model, under Machine Learning -> Model Management in Kibana
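As a minimal sketch of step 3 using Eland's PyTorch helpers (the model ID, connection details, and local path are placeholders, and exact signatures vary a little between Eland versions; the `eland_import_hub_model` CLI that ships with Eland wraps these same steps):

```python
from pathlib import Path

from elasticsearch import Elasticsearch
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# Pull the model from Hugging Face and trace it into TorchScript
tm = TransformerModel(model_id="dslim/bert-base-NER", task_type="ner")
tmp_dir = Path("models")
tmp_dir.mkdir(parents=True, exist_ok=True)
model_path, config, vocab_path = tm.save(str(tmp_dir))

# Upload the traced model into Elasticsearch, then deploy (start) it
ptm = PyTorchModel(es, tm.elasticsearch_model_id())
ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
ptm.start()
```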

If you are not storing the entities with the original docs, all you need to do is call the _infer endpoint with the model_id and the string to process. This is useful when you simply need to retrieve entities as part of a new query.
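With the Python client, that ad hoc call could look like the following (the model ID and input text are placeholders; `ml.infer_trained_model` maps to `POST _ml/trained_models/<model_id>/_infer`):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

response = es.ml.infer_trained_model(
    model_id="dslim__bert-base-ner",  # placeholder: the model_id assigned at load time
    docs=[{"text_field": "Elastic was founded in Amsterdam by Shay Banon."}],
)

# Each result lists the entities the model recognized in the input string
for result in response["inference_results"]:
    for entity in result.get("entities", []):
        print(entity["entity"], entity["class_name"], entity["class_probability"])
```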

If you wish to store the NER output with the original document, configuring an ingest pipeline to do this on ingest is the easiest route.

Generating NER Entities for documents on ingest

  1. Configure an ingest pipeline with an inference processor, and point that processor at the NER model
  2. Ingest new documents, specifying the pipeline as part of the index request or by setting index.default_pipeline (see the sketch below)
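A minimal sketch of both steps (pipeline, index, and field names are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# Pipeline that runs each document's "message" field through the NER model;
# identified entities land under ml.ner in the stored document
es.ingest.put_pipeline(
    id="ner-pipeline",
    processors=[
        {
            "inference": {
                "model_id": "dslim__bert-base-ner",  # placeholder model_id
                "target_field": "ml.ner",
                "field_map": {"message": "text_field"},
            }
        }
    ],
)

# Option 1: name the pipeline on the index request
es.index(
    index="my-index",
    pipeline="ner-pipeline",
    document={"message": "Shay Banon founded Elastic."},
)

# Option 2: make it the index default
es.indices.put_settings(index="my-index", settings={"index.default_pipeline": "ner-pipeline"})
```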

Vector Search with approximate kNN

At a high level, the steps to generate vectors are similar to NER above, but there are special tuning and memory requirements to keep in mind (see below):

  1. Select a model from Hugging Face. We recommend either testing a couple of models to see which gives the best recall and precision for your particular data, or retraining a model on your specific data
  2. Set up Eland
  3. Use Eland to load the model from Hugging Face into Elasticsearch
  4. Start (deploy) the model

Generating and Storing Vectors

Vectors need to be stored in Elasticsearch. Most often the vector is stored in the same source document it was created from, so the "human" text can easily be returned as part of the vector query response (you specify which fields to return, just like any other _search). You can do this with an ingest pipeline as you index new documents.

  1. Configure a dense_vector field in your mapping to store the vectors
  2. Configure an ingest pipeline with an inference processor, and point that processor at the embedding model
  3. Ingest new documents, specifying the pipeline as part of the index request or by setting index.default_pipeline (see the sketch below)
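A minimal sketch of all three steps, assuming a 768-dimensional embedding model and placeholder index, field, and model names. Note that the inference processor writes the embedding under `<target_field>.predicted_value`, so that is the path the dense_vector mapping (and later the kNN search) must use:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# 1. dense_vector field sized to the model's output; index + similarity enable kNN
es.indices.create(
    index="products",
    mappings={
        "properties": {
            "description": {"type": "text"},
            "description_embedding.predicted_value": {
                "type": "dense_vector",
                "dims": 768,  # must match the embedding model's output dimensions
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

# 2. Pipeline that embeds the "description" field on ingest
es.ingest.put_pipeline(
    id="embedding-pipeline",
    processors=[
        {
            "inference": {
                "model_id": "my-embedding-model",  # placeholder model_id
                "target_field": "description_embedding",
                "field_map": {"description": "text_field"},
            }
        }
    ],
)

# 3. Index documents through the pipeline
es.index(
    index="products",
    pipeline="embedding-pipeline",
    document={"description": "waterproof hiking boots"},
)
```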

Searching with Vectors

Vector search can be done as exact search with a script_score query, or as approximate k-nearest neighbor (kNN) search. To search with kNN, you need to generate a vector for the search request string, then use that vector as the input to the knn search option.

  1. Call the _infer endpoint, specifying the embedding model used to generate the vectors for the stored documents (products). The response will include a new dense vector
  2. Use the vector from #1 as the query_vector value in the knn section of the _search request (see the sketch below). See the docs for the other required settings (k, num_candidates, field)
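Put together, the query side might look like this (index, field, and model names carry over from the placeholder sketch above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# 1. Embed the query string with the same model used at ingest time
infer = es.ml.infer_trained_model(
    model_id="my-embedding-model",  # placeholder model_id
    docs=[{"text_field": "boots for wet weather"}],
)
query_vector = infer["inference_results"][0]["predicted_value"]

# 2. Approximate kNN search against the stored vectors
results = es.search(
    index="products",
    knn={
        "field": "description_embedding.predicted_value",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
    fields=["description"],
    source=False,
)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["fields"]["description"])
```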

There are several considerations when tuning kNN/Vector search.

One very important consideration, which differs from other types of search in Elasticsearch, is the specific requirement around page cache / off-heap memory: for approximate kNN search to be efficient, all vector data must fit in the node's page cache. When the vectors do not fit, kNN search slows down because the HNSW graphs (the dense vector data structure) have to be read from disk.

We recommend using a performance tool like Elastic's Rally to test scaling up kNN search. There is a loose formula in the docs that will help you ballpark the amount of off-heap memory required. Starting in 8.6, 8-bit (byte) vectors are also supported (blog), which cuts vector memory roughly 4x.
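To illustrate that ballpark (the sizing below follows the rough formula from the kNN tuning docs of about num_vectors * 4 * (num_dims + 12) bytes for float vectors; treat the constant as an approximation and the counts as placeholders):

```python
# Rough page-cache (off-heap) estimate per the kNN tuning docs' loose formula
num_vectors = 10_000_000  # placeholder corpus size
num_dims = 768            # placeholder model dimensions

float_bytes = num_vectors * 4 * (num_dims + 12)  # element_type: float (default)
byte_bytes = num_vectors * (num_dims + 12)       # element_type: byte (8.6+) drops the 4x factor

print(f"float vectors: ~{float_bytes / 1024**3:.1f} GiB of page cache")
print(f"byte vectors:  ~{byte_bytes / 1024**3:.1f} GiB of page cache")
```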

There are several ways to troubleshoot slow kNN performance. One is to run the hot threads API during a slow query to see whether Elasticsearch is reading from disk: you will see a high CPU %, but with most of that percentage in other. Looking for high major page fault counts is another.
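With the Python client that check is a one-liner (run it while the slow query is in flight):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholder connection

# Equivalent to GET /_nodes/hot_threads; high cpu% dominated by "other"
# suggests threads blocked on disk reads rather than doing CPU work
print(es.nodes.hot_threads())
```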

There's a lot more to explore and discuss once you get hands-on, so feel free to reach out if you have questions: [elastic.co](https://www.elastic.co/contact#questions) · [Discuss Forums](https://discuss.elastic.co/) · [Community Slack](https://ela.st/slack)

Last updated during Elastic Stack version 8.6.
