How to Implement Semantic Search Using OpenSearch
In this article we will implement semantic search in OpenSearch, leveraging one of the pretrained sentence transformer models that OpenSearch provides.
Test Environment
- Fedora Server 41
- Docker
- Docker Compose
- OpenSearch
What is Vector Search
Vector search is an advanced search technique that finds relevant results by comparing the vector representations (numerical embeddings) of data rather than relying on exact keyword matches. In vector search, both queries and data (such as text, images, code, etc.) are converted into high-dimensional vectors using machine learning models (often neural networks).
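At query time, relevance is scored with a similarity or distance function computed between the query vector and each document vector. As an illustration, one common choice is cosine similarity, which measures the angle between a query embedding q and a document embedding d:

$$ \cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} $$

The closer the score is to 1, the more semantically similar the two texts are; the nearest vectors in this space become the top search results.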
Examples:
- Document Retrieval: Finding articles or answers that are conceptually similar to a query, even with different wording.
- Image Search: Finding visually similar images.
What is Semantic Search
Semantic search is an advanced search technique that focuses on understanding the meaning, context, and intent behind a user’s query, rather than simply matching keywords. Unlike traditional keyword-based (lexical) search, which looks for exact word matches, semantic search analyzes the relationships between words and concepts to provide more relevant and accurate results.
Examples:
- Searching for “How does authentication work?” might return relevant code or documentation about login, OAuth, or user verification, even if the word “authentication” is not explicitly present.
Powering Semantic or Similarity Search using Vector Search
Vector search and semantic search are related techniques used to improve search relevance, but they sit at different levels: semantic search is the goal of understanding the meaning and intent behind a user’s query, while vector search is the mechanism that makes it practical at scale. By embedding queries and documents with the same model, finding semantically relevant results reduces to a nearest-neighbor lookup in vector space, and that is exactly what we will build in the steps below.
Procedure
Step 1: Update ML-related cluster settings
Here in this demo we will be using an OpenSearch-provided machine learning (ML) model on a cluster that has no dedicated ML nodes. For this we need to update the ML-related cluster settings as shown below: allow ML tasks to run on non-ML nodes, enable model access control, and raise the native memory circuit-breaker threshold so that model deployment is not rejected on memory grounds.
PUT _cluster/settings
{
"persistent": {
"plugins.ml_commons.only_run_on_ml_node": "false",
"plugins.ml_commons.model_access_control_enabled": "true",
"plugins.ml_commons.native_memory_threshold": "99"
}
}
{"acknowledged":true,"persistent":{"plugins":{"ml_commons":{"only_run_on_ml_node":"false","model_access_control_enabled":"true","native_memory_threshold":"99"}}},"transient":{}}
Step2: Register a Model
We will use the DistilBERT model from Hugging Face. DistilBERT is a smaller, faster, cheaper, and lighter version of the BERT model developed by Hugging Face. It is a transformer-based machine learning model designed for natural language processing (NLP) tasks such as text classification, question answering, and named entity recognition.
Here is the model that we will use for this demo. For more details on the model, please check the OpenSearch-provided pretrained models documentation.
- Model name: huggingface/sentence-transformers/msmarco-distilbert-base-tas-b
- Model version: 1.0.3
- Dimensions: 768
Now let’s register the model in OpenSearch as shown below.
POST /_plugins/_ml/models/_register
{
"name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b",
"version": "1.0.3",
"model_format": "TORCH_SCRIPT"
}
{"task_id":"ZN4YiJcBjAUluTLw7C2p","status":"CREATED"}
Registering a model is an asynchronous task, so OpenSearch sends back a task ID for it. We can check the status of the task by using the Tasks API, and we need to wait until the task state changes to “COMPLETED”.
OpenSearch saves the registered model in the model index.
GET /_plugins/_ml/tasks/ZN4YiJcBjAUluTLw7C2p
{
"model_id": "aN4ZiJcBjAUluTLwAC35",
"task_type": "REGISTER_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "COMPLETED",
"worker_node": [
"gU2U6uZtS9OBzMNf0Nd3AA"
],
"create_time": 1750335023915,
"last_update_time": 1750335161066,
"is_async": true
}
Search for the newly created model by providing its ID in the request.
GET /_plugins/_ml/models/aN4ZiJcBjAUluTLwAC35
{
"name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b",
"model_group_id": "Y94YiJcBjAUluTLw6i38",
"algorithm": "TEXT_EMBEDDING",
"model_version": "1",
"model_format": "TORCH_SCRIPT",
"model_state": "REGISTERED",
"model_content_size_in_bytes": 266357253,
"model_content_hash_value": "2fcc51bd52df9bd55f0d46007b80663dc6014687a321d23c00508a08d9c86d86",
"model_config": {
"model_type": "distilbert",
"embedding_dimension": 768,
"framework_type": "SENTENCE_TRANSFORMERS",
"all_config": """{"_name_or_path": "sentence-transformers/msmarco-distilbert-base-tas-b", "activation": "gelu", "architectures": ["DistilBertModel"], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "initializer_range": 0.02, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tie_weights_": true, "torch_dtype": "float32", "transformers_version": "4.49.0", "vocab_size": 30522}""",
"pooling_mode": "CLS"
},
"created_time": 1750335029271,
"last_updated_time": 1750335161045,
"last_registered_time": 1750335161043,
"total_chunks": 27,
"is_hidden": false
}
Step 3: Deploy the model
Now we will deploy the model as shown below. Deploying a model creates a model instance and caches the model in memory.
POST _plugins/_ml/models/aN4ZiJcBjAUluTLwAC35/_deploy
{
"task_id": "h94ciJcBjAUluTLwei1r",
"task_type": "DEPLOY_MODEL",
"status": "CREATED"
}
Let’s check the status of our model deployment as shown below. Once the deployment has completed successfully, the Tasks API response provides us with the model_id.
GET /_plugins/_ml/tasks/h94ciJcBjAUluTLwei1r
{
"model_id": "aN4ZiJcBjAUluTLwAC35",
"task_type": "DEPLOY_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "COMPLETED",
"worker_node": [
"3C8onhegTyG-jHvXPZIIyw",
"gU2U6uZtS9OBzMNf0Nd3AA"
],
"create_time": 1750335257191,
"last_update_time": 1750335369172,
"is_async": true
}
Once the model has been deployed, we can retrieve the model details and the model profile statistics.
GET /_plugins/_ml/models/aN4ZiJcBjAUluTLwAC35
{
"name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b",
"model_group_id": "Y94YiJcBjAUluTLw6i38",
"algorithm": "TEXT_EMBEDDING",
"model_version": "1",
"model_format": "TORCH_SCRIPT",
"model_state": "DEPLOYED",
"model_content_size_in_bytes": 266357253,
"model_content_hash_value": "2fcc51bd52df9bd55f0d46007b80663dc6014687a321d23c00508a08d9c86d86",
"model_config": {
"model_type": "distilbert",
"embedding_dimension": 768,
"framework_type": "SENTENCE_TRANSFORMERS",
"all_config": """{"_name_or_path": "sentence-transformers/msmarco-distilbert-base-tas-b", "activation": "gelu", "architectures": ["DistilBertModel"], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "initializer_range": 0.02, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tie_weights_": true, "torch_dtype": "float32", "transformers_version": "4.49.0", "vocab_size": 30522}""",
"pooling_mode": "CLS"
},
"created_time": 1750335029271,
"last_updated_time": 1750335369173,
"last_registered_time": 1750335161043,
"last_deployed_time": 1750335369171,
"auto_redeploy_retry_times": 0,
"total_chunks": 27,
"planning_worker_node_count": 2,
"current_worker_node_count": 2,
"planning_worker_nodes": [
"gU2U6uZtS9OBzMNf0Nd3AA",
"3C8onhegTyG-jHvXPZIIyw"
],
"deploy_to_all_nodes": true,
"is_hidden": false
}
GET /_plugins/_ml/profile/models
{
"nodes": {
"3C8onhegTyG-jHvXPZIIyw": {
"models": {
"aN4ZiJcBjAUluTLwAC35": {
"model_state": "DEPLOYED",
"predictor": "org.opensearch.ml.engine.algorithms.text_embedding.TextEmbeddingDenseModel@3f87f344",
"target_worker_nodes": [
"3C8onhegTyG-jHvXPZIIyw",
"gU2U6uZtS9OBzMNf0Nd3AA"
],
"worker_nodes": [
"3C8onhegTyG-jHvXPZIIyw",
"gU2U6uZtS9OBzMNf0Nd3AA"
],
"memory_size_estimation_cpu": 319628703,
"memory_size_estimation_gpu": 319628703
}
}
},
"gU2U6uZtS9OBzMNf0Nd3AA": {
"models": {
"aN4ZiJcBjAUluTLwAC35": {
"model_state": "DEPLOYED",
"predictor": "org.opensearch.ml.engine.algorithms.text_embedding.TextEmbeddingDenseModel@61d70acc",
"target_worker_nodes": [
"3C8onhegTyG-jHvXPZIIyw",
"gU2U6uZtS9OBzMNf0Nd3AA"
],
"worker_nodes": [
"3C8onhegTyG-jHvXPZIIyw",
"gU2U6uZtS9OBzMNf0Nd3AA"
],
"memory_size_estimation_cpu": 319628703,
"memory_size_estimation_gpu": 319628703
}
}
}
}
}
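With the model deployed, we can optionally sanity-check it by generating an embedding directly through the ML Commons Predict API, using the model_id from our deployment:

POST /_plugins/_ml/_predict/text_embedding/aN4ZiJcBjAUluTLwAC35
{
  "text_docs": ["what is semantic search"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}

The response should contain a sentence_embedding tensor of 768 numbers, matching the model’s embedding dimension.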
Step 4: Create an Ingest Pipeline
Here we are going to create an ingest pipeline that transforms document fields before the documents are ingested into an index. The processor that we are using here is the “text_embedding” processor, which creates vector embeddings from text.
We will need the model_id of the model that we deployed in the previous step and a field_map, which specifies the name of the field from which to take the text (text) and the name of the field in which to record the embeddings (passage_embedding):
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
"description": "An NLP ingest pipeline",
"processors": [
{
"text_embedding": {
"model_id": "aN4ZiJcBjAUluTLwAC35",
"field_map": {
"text": "passage_embedding"
}
}
}
]
}
We can get the ingest pipeline details as shown below.
GET /_ingest/pipeline
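We can also dry-run the pipeline against a sample document using the Simulate Pipeline API; in the response, the simulated document should come back with a populated passage_embedding field:

POST /_ingest/pipeline/nlp-ingest-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "text": "A cowboy rides a bucking horse at a rodeo ."
      }
    }
  ]
}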
Step 5: Create a Vector Index
Now we will create a vector index with a field named text, which contains an image description, and a knn_vector field named passage_embedding, which contains the vector embedding of the text. Additionally, we set the default ingest pipeline to the nlp-ingest-pipeline that was created in our previous step.
So when a document is inserted, it goes through the ingest pipeline, which generates vector embeddings from the text field and places them under the “passage_embedding” field.
PUT /my-nlp-index
{
"settings": {
"index.knn": true,
"default_pipeline": "nlp-ingest-pipeline"
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"passage_embedding": {
"type": "knn_vector",
"dimension": 768,
"space_type": "l2"
},
"text": {
"type": "text"
}
}
}
}
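Note that the space_type of l2 means that nearest neighbors are ranked by Euclidean distance between the 768-dimensional embeddings, where for vectors p and q:

$$ d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{768} (p_i - q_i)^2} $$

Smaller distances indicate more similar passages. OpenSearch also supports other space types, such as cosinesimil, if cosine similarity is preferred.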
We can retrieve the settings and mappings of the NLP index as shown below.
GET /my-nlp-index/_settings
GET /my-nlp-index/_mappings
Step 6: Ingest and Retrieve Documents
Now that we have our ingest pipeline and vector index created, we are ready to ingest documents.
PUT /my-nlp-index/_doc/1
{
"text": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .",
"id": "4319130149.jpg"
}
PUT /my-nlp-index/_doc/2
{
"text": "A wild animal races across an uncut field with a minimal amount of trees .",
"id": "1775029934.jpg"
}
PUT /my-nlp-index/_doc/3
{
"text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco .",
"id": "2664027527.jpg"
}
PUT /my-nlp-index/_doc/4
{
"text": "A man who is riding a wild horse in the rodeo is very near to falling off .",
"id": "4427058951.jpg"
}
PUT /my-nlp-index/_doc/5
{
"text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse .",
"id": "2691147709.jpg"
}
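As a side note, the same documents can be ingested in a single round trip with the Bulk API. Because the index declares nlp-ingest-pipeline as its default_pipeline, embeddings are generated for bulk-indexed documents as well; a minimal sketch with the first two documents:

POST /_bulk
{ "index": { "_index": "my-nlp-index", "_id": "1" } }
{ "text": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .", "id": "4319130149.jpg" }
{ "index": { "_index": "my-nlp-index", "_id": "2" } }
{ "text": "A wild animal races across an uncut field with a minimal amount of trees .", "id": "1775029934.jpg" }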
When the documents are ingested into the index, the text_embedding processor creates an additional field that contains the vector embeddings and adds that field to the document. To see an example of an indexed document, retrieve document 1.
The response includes the document _source containing the original text and id fields and the added passage_embedding field.
GET /my-nlp-index/_doc/1
{
"_index": "my-nlp-index",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"passage_embedding": [
0.044916227,
-0.3410559,
0.036822222,
-0.1413904,
-0.17355084,
-0.05015643,
-0.120562606,
0.005346317,
...
Step 7: Search using Semantic Search
To search using semantic search, use a neural query and provide the model ID of the model we set up earlier, so that the vector embedding for the query text is generated with the same model that was used at ingestion time.
GET /my-nlp-index/_search
{
"_source": {
"excludes": [
"passage_embedding"
]
},
"query": {
"neural": {
"passage_embedding": {
"query_text": "wild west",
"model_id": "aN4ZiJcBjAUluTLwAC35",
"k": 5
}
}
}
}
The response contains all five documents, and the ranking improves because semantic search matches on meaning: the rodeo and wild-horse passages score highly for “wild west” even though none of them contain that exact phrase.
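To see the difference, we can compare this against a plain lexical match query for the same text. Only documents that literally contain the terms “wild” or “west” can score here, so the ranking is driven by term overlap rather than meaning:

GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "match": {
      "text": {
        "query": "wild west"
      }
    }
  }
}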
Hope you enjoyed reading this article. Thank you.