How to create and manage index alias using OpenSearch – CVE Data

In this article we will see how to create an index alias and add multiple indices containing CVE data to it. We will then use this alias to search the indexed data across the backing indices.

Test Environment

Fedora 35 workstation
Docker version 20.10.12
docker-compose version 1.29.2

OpenSearch is a search and analytics engine. It provides search, analytics and visualisation capabilities along with advanced security, alerting, SQL support, automated index management, deep performance analysis and more.

What is Index Alias

An alias is a virtual index name that points to one or more indices, which are usually related. For example, let’s say you have JSON data for every year in a separate file. We can create an index for each of these JSON files and add them all to one alias. We can then query and analyse the indexed data through the alias, which is backed by these multiple indices.

Please note, this is a continuation of my previous article on How to extract and index CVE data using OpenSearch.

If you prefer to watch a video, the same step-by-step procedure outlined below is also available on YouTube.

Procedure

Step1: Instantiate Opensearch using a docker-compose file

Here we use a docker-compose.yml file to set up OpenSearch as our search and analytics tool. The cluster consists of a single node, and alongside it we start the opensearch-dashboards service, which we can use as a visualization tool for the data in OpenSearch. Please note that the host directory ‘opensearch-data1’ is bind-mounted into the container for OpenSearch data persistence.

[admin@fedser32 opensearch-docker]$ cat docker-compose.yml
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:1.2.4
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1
      - cluster.initial_master_nodes=opensearch-node1
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - $PWD/opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:1.2.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200"]' # must be a string with no spaces when specified as an environment variable; lists only the single node defined above
    networks:
      - opensearch-net

volumes:
  opensearch-data1:

networks:
  opensearch-net:

Start the OpenSearch and OpenSearch Dashboards services.

[admin@fedser32 opensearch-docker]$ docker-compose up -d
Starting opensearch-dashboards ... done
Starting opensearch-node1      ... done

OpenSearch Dashboards can now be accessed at the URL below.

URL – http://fedser32.stack.com:5601/ (Please change the FQDN to your server name or localhost)
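Right after `docker-compose up -d` the node may still be starting, so requests to port 9200 can fail for a short while. Below is a minimal sketch of a readiness poll, assuming the `requests` library is installed and that the demo credentials used later in this article apply. The helper names `wait_until` and `opensearch_is_up` and the default `https://localhost:9200` URL are hypothetical and not part of the original setup.

```python
import time


def wait_until(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        # No point sleeping after the final failed attempt
        if attempt < attempts - 1:
            sleep(delay)
    return False


def opensearch_is_up(baseurl="https://localhost:9200", auth=("admin", "admin")):
    """Return True once the OpenSearch root endpoint answers with HTTP 200."""
    import requests  # imported here so wait_until stays usable without requests

    try:
        # verify=False mirrors the article's scripts (certificate checks disabled)
        resp = requests.get(baseurl, auth=auth, verify=False, timeout=5)
        return resp.status_code == 200
    except requests.exceptions.RequestException:
        return False
```

Calling `wait_until(opensearch_is_up)` before running the indexing scripts would block until the node responds or the attempts are exhausted.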

Step2: Download, Extract and Index multiple JSON feed data

Let’s first create a tab-separated text file that lists each index name and its JSON feed URL, as shown below.

[admin@fedser32 opensearch-docker]$ cat cvejsonfeed.txt
Name	Feed
cve-2021	https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2021.json.gz
cve-2022	https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2022.json.gz
cve-recent	https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-recent.json.gz
cve-modified	https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-modified.json.gz

Now let’s create a wrapper script, cveDataWrapper.sh, that iterates through these JSON feeds one by one and calls the extractCVEData.py Python script, passing it each index name and JSON feed link.

[admin@fedser32 opensearch-docker]$ cat cveDataWrapper.sh 
#!/bin/bash

# Skip the header row, then read the tab-separated index name and feed URL from each line
tail -n +2 cvejsonfeed.txt | while IFS=$'\t' read -r indexname jsonfeed || [ -n "$indexname" ]; do
	# Ignore blank lines
	[ -z "$indexname" ] && continue
	echo "python extractCVEData.py $indexname $jsonfeed"
	python extractCVEData.py "$indexname" "$jsonfeed"
done
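The same feed list can also be read directly from Python, which avoids the shell's word-splitting and quoting pitfalls. This is a minimal sketch, assuming the file keeps the `Name` and `Feed` header row shown above; the function name `read_feeds` is hypothetical.

```python
import csv


def read_feeds(lines):
    """Yield (indexname, feed_url) pairs from tab-separated lines with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        yield row["Name"], row["Feed"]
```

With the file from this step, `with open("cvejsonfeed.txt", newline="") as fh:` followed by `for indexname, feed in read_feeds(fh):` would produce the same pairs the bash wrapper passes to extractCVEData.py.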

Here is the updated version of the python script which takes indexname and json feed link as input.

[admin@fedser32 opensearch-docker]$ cat extractCVEData.py
import json
import gzip
import requests
import sys

# Index variables
indexname = sys.argv[1]

# CVE feed
cvefeed = sys.argv[2]

# Opensearch environment variables
baseurl = "https://fedser32.stack.com:9200"
username = "admin"
password = "admin"
auth = (username, password)
sslcheck = False

def extractCVEData(cvefeed):
    gzfile = requests.get(cvefeed).content
    jsondata = gzip.decompress(gzfile)
    data = json.loads(jsondata)
    return data

def countCVE(cvedata):
    # cvedata is the dict already parsed by extractCVEData, so no json.loads is needed here
    count = len(cvedata['CVE_Items'])
    return count

def createIndex(baseurl, auth, sslcheck, indexname):
    print("Create Index....................................")
    createIndexRes = requests.put(baseurl+"/"+ indexname, auth=auth, verify=sslcheck)
    print(createIndexRes.status_code)

def addData(baseurl, auth, sslcheck, indexname, document, docID):
    print("Add Data........................................")
    headers = {'Content-type': 'application/json'}
    data = document
    addDataRes = requests.put(baseurl+"/"+ indexname + "/_doc/"+ str(docID), json=data, auth=auth, verify=sslcheck, headers=headers)
    print(addDataRes.status_code)

print(indexname)
print(cvefeed)
cvedata = extractCVEData(cvefeed)
cvecount = len(cvedata['CVE_Items'])
print(cvecount)
createIndex(baseurl, auth, sslcheck, indexname)
# Use each item's position in the feed as its document ID
for docID, cveitem in enumerate(cvedata['CVE_Items']):
    addData(baseurl, auth, sslcheck, indexname, cveitem, docID)
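The script above sends one HTTP request per CVE item, which can be slow for a yearly feed with many thousands of entries. As a possible alternative (not what the original script does), the `_bulk` API accepts many documents in one newline-delimited JSON request. The sketch below assumes the `requests` library; `build_bulk_body` and `send_bulk` are hypothetical helper names.

```python
import json


def build_bulk_body(indexname, documents):
    """Return an NDJSON payload for the _bulk API, using list positions as document IDs."""
    lines = []
    for doc_id, document in enumerate(documents):
        # Action line first, then the document source on the next line
        lines.append(json.dumps({"index": {"_index": indexname, "_id": str(doc_id)}}))
        lines.append(json.dumps(document))
    # The bulk API requires the payload to end with a newline
    return "\n".join(lines) + "\n"


def send_bulk(baseurl, auth, indexname, documents, verify=False):
    """POST the documents to _bulk and return True if any item reported an error."""
    import requests  # imported here so build_bulk_body works without requests

    resp = requests.post(
        baseurl + "/_bulk",
        data=build_bulk_body(indexname, documents).encode("utf-8"),
        auth=auth,
        verify=verify,
        headers={"Content-Type": "application/x-ndjson"},
    )
    resp.raise_for_status()
    return resp.json()["errors"]
```

For a full yearly feed it would probably be safer to call `send_bulk` on slices of a few hundred items at a time rather than on the whole list, to keep each request body small.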

Now we can run the bash wrapper script to create an index for each of these JSON data feeds.

[admin@fedser32 opensearch-docker]$ ./cveDataWrapper.sh

Now we can create index patterns for these indices in OpenSearch Dashboards and browse the indexed data.

Step3: Create an index alias

Here in this step we are going to create an index alias for the following indexes (cve-2021, cve-2022, cve-recent, cve-modified) from OpenSearch Dashboard – Dev Tools as shown below.

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "cve-2021",
        "alias": "cvejsondata"
      }
    },
    {
      "add": {
        "index": "cve-2022",
        "alias": "cvejsondata"
      }
    },
    {
      "add": {
        "index": "cve-recent",
        "alias": "cvejsondata"
      }
    },
    {
      "add": {
        "index": "cve-modified",
        "alias": "cvejsondata"
      }
    }
  ]
}

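Alternatively, use a wildcard pattern to add every index whose name starts with cve- in a single action.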

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "cve-*",
        "alias": "cvejsondata"
      }
    }
  ]
}
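The same alias request can be issued from a script instead of Dev Tools, which is handy when the list of indices comes from the feed file. This is a minimal sketch assuming the `requests` library and the connection settings from extractCVEData.py; the function names `build_alias_actions` and `create_alias` are hypothetical.

```python
def build_alias_actions(alias, indices):
    """Build the _aliases request body that adds every index to the alias."""
    return {
        "actions": [
            {"add": {"index": name, "alias": alias}} for name in indices
        ]
    }


def create_alias(baseurl, auth, alias, indices, verify=False):
    """POST the alias actions to OpenSearch and return the parsed JSON response."""
    import requests  # imported here so build_alias_actions works without requests

    resp = requests.post(
        baseurl + "/_aliases",
        json=build_alias_actions(alias, indices),
        auth=auth,
        verify=verify,
    )
    resp.raise_for_status()
    return resp.json()
```

For example, `create_alias(baseurl, auth, "cvejsondata", ["cve-2021", "cve-2022", "cve-recent", "cve-modified"])` would send the first request body shown in this step.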

Step4: Validate the alias

We can validate the alias and the backing indices it maps to using the API query below.

GET _alias/cvejsondata
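The response to this query is a JSON object keyed by index name, where each entry lists the aliases attached to that index. A small sketch of how a script might pull out the backing index names, assuming that response shape; `backing_indices` is a hypothetical helper and the sample response in the usage note is illustrative, not captured output.

```python
def backing_indices(alias_response, alias):
    """Return the sorted names of all indices that carry the given alias."""
    return sorted(
        name
        for name, body in alias_response.items()
        if alias in body.get("aliases", {})
    )
```

Given a response such as `{"cve-2021": {"aliases": {"cvejsondata": {}}}, "cve-2022": {"aliases": {"cvejsondata": {}}}}`, calling `backing_indices(response, "cvejsondata")` would list both index names.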

Step5: Search Alias for indexed data

We can use the below queries from the OpenSearch Dashboard – Dev Tools to search for the indexed data from backing indices.

Let’s now search for a CVE item from year 2022.

POST cvejsondata/_search
{
  "query": {
    "match_phrase": {
      "cve.CVE_data_meta.ID": "CVE-2022-0122"
    }
  }
}

Now let’s search for a CVE item from year 2021.

POST cvejsondata/_search
{
  "query": {
    "match_phrase": {
      "cve.CVE_data_meta.ID": "CVE-2021-0084"
    }
  }
}

We can combine both of these queries into one and run it from Dev Tools as shown below.

POST cvejsondata/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "cve.CVE_data_meta.ID": "CVE-2021-0084"
          }
        },
        {
          "match_phrase": {
            "cve.CVE_data_meta.ID": "CVE-2022-0122"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
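The combined query generalises to any number of CVE IDs, and each hit in the search response carries an `_index` field naming the backing index it came from. The sketch below builds the same bool/should query programmatically and groups the returned IDs by index; `build_cve_id_query` and `hits_by_index` are hypothetical helper names, and the field path is the one used in the queries above.

```python
def build_cve_id_query(cve_ids):
    """Build a bool/should query that matches any of the given CVE IDs."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match_phrase": {"cve.CVE_data_meta.ID": cve_id}}
                    for cve_id in cve_ids
                ],
                "minimum_should_match": 1,
            }
        }
    }


def hits_by_index(search_response):
    """Map each backing index name to the CVE IDs it returned."""
    result = {}
    for hit in search_response["hits"]["hits"]:
        cve_id = hit["_source"]["cve"]["CVE_data_meta"]["ID"]
        result.setdefault(hit["_index"], []).append(cve_id)
    return result
```

Because the recent and modified feeds may repeat entries from the yearly feeds, the same CVE ID can appear under more than one index when searching through the alias.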

Hope you enjoyed reading this article. Thank you.