How to set up log analytics using OpenSearch, Data Prepper and Fluent Bit


In this article, we will see how to set up log analytics using OpenSearch, Data Prepper, and Fluent Bit. We will also demonstrate indexing nginx logs into the OpenSearch cluster using Fluent Bit and Data Prepper.

Test Environment

Fedora 36 server
Docker Engine

What is Log Ingestion

Log ingestion is the process of transforming unstructured log data into structured log data and ingesting it into OpenSearch. Structured log data allows for improved querying and filtering based on the data format when searching logs for an event.
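For example, a raw nginx access log line in the common log format is a single unstructured string; after grok parsing (using the standard COMMONAPACHELOG pattern that we configure later in this guide), the same event becomes a structured document with individually queryable fields. The sample below is illustrative and abridged:

A raw, unstructured line from the nginx access log:

127.0.0.1 - - [27/Oct/2022:05:37:42 +0000] "GET / HTTP/1.1" 200 615

The same event after parsing, as indexed into OpenSearch (field names come from the COMMONAPACHELOG grok pattern):

{
  "log": "127.0.0.1 - - [27/Oct/2022:05:37:42 +0000] \"GET / HTTP/1.1\" 200 615",
  "clientip": "127.0.0.1",
  "timestamp": "27/Oct/2022:05:37:42 +0000",
  "verb": "GET",
  "request": "/",
  "httpversion": "1.1",
  "response": "200",
  "bytes": "615"
}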

Here is the list of components that we are going to use for log analytics with OpenSearch.

Component               Description
Fluent Bit              Log collector that gathers log data from the application and sends it to Data Prepper
Data Prepper            Transforms the unstructured log data into a structured format and sends it to the OpenSearch cluster
OpenSearch              Where the structured log data is indexed for further search and analytics activities
OpenSearch Dashboards   Enables search and analytics on the indexed data, along with data visualization
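Putting these together, the log data flows through the stack like this:

nginx access.log -> Fluent Bit (tail input) -> HTTP -> Data Prepper (grok processor) -> OpenSearch (nginx_logs index) -> OpenSearch Dashboards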


Procedure

Step 1: Install and Set Up OpenSearch

In this first step, we are going to set up a two-node OpenSearch cluster using the docker-compose.yml file shown below. We are using the stable version '2.3.0' for the opensearch and opensearch-dashboards services. We are also persisting the OpenSearch data using the volume mounts opensearch-data1 and opensearch-data2.

File: docker-compose.yml

version: '3'
services:
  opensearch-node1: # This is also the hostname of the container within the Docker network (i.e. https://opensearch-node1/)
    image: opensearchproject/opensearch:2.3.0 # Pin a specific version - keep it in sync with the other OpenSearch services
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster # Name the cluster
      - node.name=opensearch-node1 # Name the node that will run in this container
      - discovery.seed_hosts=opensearch-node1,opensearch-node2 # Nodes to look for when discovering the cluster
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2 # Nodes eligible to serve as cluster manager
      - bootstrap.memory_lock=true # Disable JVM heap memory swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM
    ulimits:
      memlock:
        soft: -1 # Set memlock to unlimited (no soft or hard limit)
        hard: -1
      nofile:
        soft: 65536 # Maximum number of open files for the opensearch user - set to at least 65536
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data # Creates volume called opensearch-data1 and mounts it to the container
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer
    networks:
      - opensearch-net # All of the containers will join the same Docker bridge network
  opensearch-node2:
    image: opensearchproject/opensearch:2.3.0 # This should be the same image used for opensearch-node1 to avoid issues
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.3.0 # Make sure the version of opensearch-dashboards matches the version of opensearch installed on other nodes
    container_name: opensearch-dashboards
    ports:
      - 5601:5601 # Map host port 5601 to container port 5601
    expose:
      - "5601" # Expose port 5601 for web access to OpenSearch Dashboards
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]' # Define the OpenSearch nodes that OpenSearch Dashboards will query
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:

Once we are ready with our docker compose file, we can launch the opensearch and opensearch-dashboards services in detached mode as shown below.

docker-compose up -d
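Optionally, confirm that the opensearch-node1, opensearch-node2, and opensearch-dashboards containers are all up before testing connectivity:

docker-compose ps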

Ensure that you are able to access your OpenSearch cluster (and the OpenSearch Dashboards UI on port 5601) as shown below.

curl -k https://opensearch.stack.com:9200 -u admin:admin
{
  "name" : "opensearch-node1",
  "cluster_name" : "opensearch-cluster",
  "cluster_uuid" : "uJhp7xQfRLSVdoBp114VJQ",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.3.0",
    "build_type" : "tar",
    "build_hash" : "6f6e84ebc54af31a976f53af36a5c69d474a5140",
    "build_date" : "2022-09-09T00:07:12.137133581Z",
    "build_snapshot" : false,
    "lucene_version" : "9.3.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
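To further verify that both nodes have joined the cluster, you can also query the _cat/nodes and _cluster/health APIs (assuming the same hostname and the default admin credentials used above):

curl -k -u admin:admin "https://opensearch.stack.com:9200/_cat/nodes?v"
curl -k -u admin:admin "https://opensearch.stack.com:9200/_cluster/health?pretty"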

Step 2: Install and Configure Data Prepper

Data Prepper is an open source data collector for observability data (traces, logs, metrics) that can filter, enrich, transform, normalize, and aggregate data for downstream analysis and visualization. In this step we are going to set up Data Prepper using the docker-compose.yml file shown below. As you can see, we are exposing port 2021 of data-prepper, which is used as an HTTP source for collecting the data.

We have also mounted the log_pipeline.yaml file, which defines the source from where the data comes in and the sink to which we send it out. We use a processor to do the data transformation. It is similar to a Logstash pipeline, where we define the input, filter, and output plugin information.

File: docker-compose.yml

version: '3'
services:
  data-prepper:
    container_name: data-prepper
    image: opensearchproject/data-prepper:2.0.1
    volumes:
      - ./log_pipeline.yaml:/usr/share/data-prepper/pipelines/log_pipeline.yaml
    ports:
      - 2021:2021
    networks:
      - opensearch-net
networks:
  opensearch-net:

File: log_pipeline.yaml

log-pipeline:
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: [ "https://opensearch.stack.com:9200" ]
        insecure: true
        username: admin
        password: admin
        index: nginx_logs

Once we are ready with our docker compose file, we can launch our data-prepper service as shown below.

docker-compose up -d

Ensure that the data-prepper service is up and running as shown below. A '405 Method Not Allowed' response is expected here: the /log/ingest endpoint only accepts POST requests, so receiving it on a GET confirms that the HTTP source is listening.

curl -k http://opensearch.stack.com:2021/log/ingest

Output:

Status: 405
Description: Method Not Allowed
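You can also push a test event end to end by POSTing a JSON array to the endpoint, which is the request format the Data Prepper http source accepts (the log line below is an illustrative sample):

curl -X POST "http://opensearch.stack.com:2021/log/ingest" \
  -H "Content-Type: application/json" \
  -d '[{"log": "127.0.0.1 - - [27/Oct/2022:05:37:42 +0000] \"GET / HTTP/1.1\" 200 615"}]'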

Step 3: Install and Configure Fluent Bit

Fluent Bit is a fast, lightweight, and highly scalable logging and metrics processor and forwarder, and a popular choice for cloud and containerized environments. In this step we are going to set up our log shipping tool Fluent Bit using the docker-compose.yml file shown below.

As you can see, we have mounted fluent-bit.conf, which defines the input from which Fluent Bit collects the logs and the output pointing to the data-prepper instance to which we want to send the raw unstructured data for processing.

File: docker-compose.yml

version: '3'
services:
  fluent-bit:
    container_name: fluent-bit
    image: fluent/fluent-bit
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - /var/log/nginx/access.log:/var/log/access.log
    networks:
      - opensearch-net
networks:
  opensearch-net:

File: fluent-bit.conf

# Tail the nginx access log that is volume-mounted into the container
[INPUT]
  Name                  tail
  Refresh_Interval      5
  Path                  /var/log/access.log
  Read_From_Head        true

# Forward every record to the Data Prepper http source
[OUTPUT]
  Name                  http
  Match                 *
  Host                  opensearch.stack.com
  Port                  2021
  URI                   /log/ingest
  Format                json

Once we are ready with the docker compose file for fluent-bit, we can launch the service in detached mode as shown below.

docker-compose up -d

Ensure that fluent-bit has started and is able to collect the logs per the configuration, as shown below.

docker-compose logs -f

Output:

Attaching to fluent-bit
fluent-bit    | Fluent Bit v2.0.0
fluent-bit    | * Copyright (C) 2015-2022 The Fluent Bit Authors
fluent-bit    | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
fluent-bit    | * https://fluentbit.io
fluent-bit    | 
fluent-bit    | [2022/10/27 05:37:42] [ info] [fluent bit] version=2.0.0, commit=08a37a74b2, pid=1
fluent-bit    | [2022/10/27 05:37:42] [ info] [storage] ver=1.3.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
fluent-bit    | [2022/10/27 05:37:42] [ info] [cmetrics] version=0.5.3
fluent-bit    | [2022/10/27 05:37:42] [ info] [ctraces ] version=0.2.5
fluent-bit    | [2022/10/27 05:37:42] [ info] [input:tail:tail.0] initializing
fluent-bit    | [2022/10/27 05:37:42] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
fluent-bit    | [2022/10/27 05:37:42] [ info] [output:http:http.0] worker #0 started
fluent-bit    | [2022/10/27 05:37:42] [ info] [sp] stream processor started
fluent-bit    | [2022/10/27 05:37:42] [ info] [output:http:http.0] worker #1 started
fluent-bit    | [2022/10/27 05:37:42] [ info] [input:tail:tail.0] inotify_fs_add(): inode=3328796 watch_fd=1 name=/var/log/access.log

Step 4: Run Nginx

Now that we are ready with the OpenSearch cluster and the data-prepper backend service, we need to generate some logs and let the fluent-bit service ship them. We will install nginx as a service on the host from which we want to collect the nginx logs. In my case, I am using the same host machine where my backend services are set up.

Also, note that you need to set up the fluent-bit service on the respective host where the nginx service is running in order to collect the nginx logs. In my case, it is the same host.

Ensure that your nginx service is up and running.

sudo dnf install nginx
sudo systemctl start nginx.service
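You can confirm that the service is active and answering locally with standard checks like the following:

sudo systemctl status nginx.service --no-pager
curl -sI http://localhost/ | head -n 1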

Ensure that nginx logs are accessible for the user with which you are logged in.

sudo chmod -R 755 /var/log/nginx/

Step 5: Access Nginx Application

We will use the default index.html page to access our nginx instance. Here is the URL I am using to access the default nginx page of the running instance.

URL - http://opensearch.stack.com/
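Instead of refreshing the page in a browser, you can also generate a burst of requests from the shell with a simple loop (assuming the same hostname):

for i in $(seq 1 20); do curl -s -o /dev/null http://opensearch.stack.com/; done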

Hit this page multiple times (or run the loop above) to log more data into the nginx access.log file, and tail the logs of the fluent-bit service as shown below. You should now see that fluent-bit is sending the nginx log data to data-prepper with an HTTP status=200.

docker-compose logs -f

Output:

Attaching to fluent-bit
fluent-bit    | Fluent Bit v2.0.0
fluent-bit    | * Copyright (C) 2015-2022 The Fluent Bit Authors
fluent-bit    | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
fluent-bit    | * https://fluentbit.io
fluent-bit    | 
fluent-bit    | [2022/10/27 05:37:42] [ info] [fluent bit] version=2.0.0, commit=08a37a74b2, pid=1
fluent-bit    | [2022/10/27 05:37:42] [ info] [storage] ver=1.3.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
fluent-bit    | [2022/10/27 05:37:42] [ info] [cmetrics] version=0.5.3
fluent-bit    | [2022/10/27 05:37:42] [ info] [ctraces ] version=0.2.5
fluent-bit    | [2022/10/27 05:37:42] [ info] [input:tail:tail.0] initializing
fluent-bit    | [2022/10/27 05:37:42] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
fluent-bit    | [2022/10/27 05:37:42] [ info] [output:http:http.0] worker #0 started
fluent-bit    | [2022/10/27 05:37:42] [ info] [sp] stream processor started
fluent-bit    | [2022/10/27 05:37:42] [ info] [output:http:http.0] worker #1 started
fluent-bit    | [2022/10/27 05:37:42] [ info] [input:tail:tail.0] inotify_fs_add(): inode=3328796 watch_fd=1 name=/var/log/access.log
fluent-bit    | [2022/10/27 05:37:42] [ info] [output:http:http.0] opensearch.stack.com:2021, HTTP status=200
fluent-bit    | 200 OK
fluent-bit    | [2022/10/27 05:37:52] [ info] [output:http:http.0] opensearch.stack.com:2021, HTTP status=200
fluent-bit    | 200 OK
fluent-bit    | [2022/10/27 05:37:56] [ info] [output:http:http.0] opensearch.stack.com:2021, HTTP status=200
fluent-bit    | 200 OK
fluent-bit    | [2022/10/27 05:38:19] [ info] [output:http:http.0] opensearch.stack.com:2021, HTTP status=200
fluent-bit    | 200 OK

You can also tail the logs from the data-prepper service to verify that it is sending data to the OpenSearch cluster for indexing.

docker-compose logs -f
data-prepper    | 2022-10-27T05:38:22,615 [log-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Sending data to OpenSearch
data-prepper    | 2022-10-27T05:57:47,665 [log-pipeline-processor-worker-1-thread-1] INFO  org.opensearch.dataprepper.pipeline.ProcessWorker -  log-pipeline Worker: Processing 1 records from buffer
data-prepper    | 2022-10-27T05:57:47,671 [log-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Sending data to OpenSearch
data-prepper    | 2022-10-27T08:50:08,641 [log-pipeline-processor-worker-1-thread-1] INFO  org.opensearch.dataprepper.pipeline.ProcessWorker -  log-pipeline Worker: Processing 1 records from buffer
data-prepper    | 2022-10-27T08:50:08,678 [log-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Sending data to OpenSearch
data-prepper    | 2022-10-27T08:50:12,609 [log-pipeline-processor-worker-1-thread-1] INFO  org.opensearch.dataprepper.pipeline.ProcessWorker -  log-pipeline Worker: Processing 1 records from buffer
data-prepper    | 2022-10-27T08:50:12,615 [log-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Sending data to OpenSearch
data-prepper    | 2022-10-27T08:50:16,608 [log-pipeline-processor-worker-1-thread-1] INFO  org.opensearch.dataprepper.pipeline.ProcessWorker -  log-pipeline Worker: Processing 1 records from buffer
data-prepper    | 2022-10-27T08:50:16,612 [log-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Sending data to OpenSearch
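Before heading to the dashboard, you can optionally confirm that documents have landed in the nginx_logs index via the search API (same hostname and credentials as before):

curl -k -u admin:admin "https://opensearch.stack.com:9200/nginx_logs/_search?pretty&size=1"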

Finally, head into OpenSearch Dashboards (http://opensearch.stack.com:5601) to view your processed logs. You will need to create an index pattern for the index defined in your log_pipeline.yaml in order to see them. You can do this by going to Stack Management -> Index Patterns. Start typing the name of the index you sent logs to (in this guide it was nginx_logs), and you should see that the index pattern matches 1 source. Click Create Index Pattern, and you should then be able to go back to the Discover tab to see your processed logs.

Hope you enjoyed reading this article. Thank you.