How to collect and index nginx log using filebeat and elasticsearch


Test Environment

Fedora 32 installed

What is Elasticsearch

Elasticsearch helps in indexing the data read from sources such as Logstash or Beats. It is a full-text search engine based on the Apache Lucene library, and it provides APIs to query, access and aggregate the indexed data.

What is Kibana

Kibana is used to read and query data from elasticsearch indices using its APIs. We also use kibana to visualise the indexed data and generate graphs and charts from it.

What are Beats

These are lightweight data shippers that are installed as agents. They read data, parse it and ship it to either elasticsearch or logstash. Metricbeat, Filebeat and Packetbeat are some of the beats available. ‘libbeat’ is the library which can be used to write a custom beat.

Here in this article we will try to capture the access logs from the nginx service using the filebeat service and send them to the elasticsearch service for indexing. We will be carrying out this activity using the filebeat inputs approach available in the tool.

If you are interested in watching the video, here is the YouTube video covering the same step-by-step procedure shown below.


Step1: Install and Configure Nginx to serve static content

Here in this step we will be installing the nginx service from the fedora repository.

[admin@fedser32 ~]$ sudo dnf install nginx-1:1.20.0-2.fc32.x86_64

The standard installation of nginx comes with a default ‘/etc/nginx/nginx.conf’ which serves static content from ‘/usr/share/nginx/html’ and writes the access and error logs at the following location ‘/var/log/nginx/access.log’ and ‘/var/log/nginx/error.log’.

Once the package is installed, you can enable and start the nginx service and validate it by requesting the default nginx static content page by hitting the URL as below.

[admin@fedser32 ~]$ sudo systemctl enable nginx.service
[admin@fedser32 ~]$ sudo systemctl start nginx.service

URL – http://localhost/
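You can also verify from the command line with curl; this is a quick sketch and assumes nginx is listening on its default port 80:

```shell
# Request the default page and print only the HTTP status code.
# A 200 response confirms nginx is serving the static content.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost/
```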

Step2: Install filebeat

Now that we have our source of data, i.e. the nginx service, up and running, in this step we will set up the filebeat service by installing it from the fedora repositories. Here I am installing the latest available version of filebeat, i.e. v7.14.1.

[admin@fedser32 ~]$ sudo dnf install filebeat-7.14.1-1.x86_64

Once the filebeat service is installed, you can enable and start the service as shown below.

[admin@fedser32 ~]$ sudo systemctl enable filebeat.service
[admin@fedser32 ~]$ sudo systemctl start filebeat.service

The standard installation of filebeat has its default configuration file at location ‘/etc/filebeat/filebeat.yml’ and the logging location for this service is available at ‘/var/log/filebeat/filebeat’.

Step3: Install elasticsearch

Here we will now install elasticsearch service from the fedora repositories. We will be using elasticsearch to index the nginx log data.

[admin@fedser32 ~]$ sudo dnf install elasticsearch-7.14.1-1.x86_64

Once the elasticsearch service is installed, you can enable and start the service as shown below.

[admin@fedser32 ~]$ sudo systemctl enable elasticsearch.service
[admin@fedser32 ~]$ sudo systemctl start elasticsearch.service
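Elasticsearch can take a while to start. As a rough readiness check (assuming the default HTTP port 9200), you can poll the cluster health endpoint until it responds:

```shell
# Poll the cluster health endpoint for up to ~30 seconds.
# A "green" or "yellow" status means the node is ready to index data.
for i in $(seq 1 30); do
  if curl -s http://localhost:9200/_cluster/health | grep -q '"status"'; then
    curl -s 'http://localhost:9200/_cluster/health?pretty'
    break
  fi
  sleep 1
done
```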

Step4: Install Kibana

Here we will now install kibana service from the fedora repositories. We will be using kibana to visualize the indexed nginx log data.

[admin@fedser32 ~]$ sudo dnf install kibana-7.14.1-1.x86_64

Once the kibana service is installed, you can enable and start the service as shown below.

[admin@fedser32 ~]$ sudo systemctl enable kibana.service
[admin@fedser32 ~]$ sudo systemctl start kibana.service
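Kibana also takes a little while to come up. A quick sketch of a readiness check against its default port 5601 (the status API returns JSON describing the state of the server and its plugins):

```shell
# Print the HTTP status code of kibana's status API.
# A 200 response means the web UI should be reachable in the browser.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5601/api/status
```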

Step5: Configure filebeat to capture logs from nginx service

Take a backup of the existing filebeat.yml file before making any modifications to the configuration.

[admin@fedser32 ~]$ cp /etc/filebeat/filebeat.yml /etc/filebeat/filebeat.yml_original

Once the backup is completed, let's update filebeat.yml with the below content. Here are the details about the configuration file which I am using.

Filebeat inputs section

We are enabling the filebeat inputs section. Filebeat starts a harvester for each file that it finds under the specified paths. ‘harvester_buffer_size’ is the size in bytes of the buffer that each harvester uses when fetching a file. The default is 16384; you can adjust this value according to your log size. Also, you can see that I am setting a field named log_type which, with ‘fields_under_root’ enabled, is added as a root-level field to the captured log data. ‘exclude_files’ will ignore all the files ending with .gz.

General section

Here we are setting two new fields named bu and env which will be added as root-level fields globally to all the data captured.

Output section

Here in this section we are setting the elasticsearch host and port to which we will be sending the log data captured by filebeat for indexing. Please note, if security is enabled on elasticsearch you need to enable authentication and provide the credentials, and if HTTPS is enabled you need to enable the protocol setting as well.

[root@fedser32 ~]# cat /etc/filebeat/filebeat.yml
# ============================== Filebeat inputs ===============================

filebeat.inputs:

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/nginx/access.log
  exclude_files: ['.gz$']
  harvester_buffer_size: 131072

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    log_type: access_test
  fields_under_root: true

# ================================== General ===================================

fields:
  env: testing
  bu: stack
fields_under_root: true

# ================================== Outputs ===================================

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  #ssl.verification_mode: none

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "admin"
  #password: "admin@1234"
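Before restarting the service, it is a good idea to let filebeat validate the configuration and the connection to elasticsearch using its built-in `test` subcommands. The last line is just a quick offline sanity check that the ‘.gz$’ pattern in ‘exclude_files’ really matches a rotated, compressed log name:

```shell
# Check the YAML syntax of the configuration file.
sudo filebeat test config -c /etc/filebeat/filebeat.yml

# Check that filebeat can reach the configured elasticsearch output.
sudo filebeat test output -c /etc/filebeat/filebeat.yml

# Sanity-check the exclude_files pattern against a rotated log name.
echo "access.log.1.gz" | grep -q '.gz$' && echo "pattern matches rotated logs"
```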

Step6: Restart filebeat service

Once the necessary config updates are done we can restart the filebeat service and make sure there are no errors while starting up the service.

[root@fedser32 filebeat]# systemctl restart filebeat.service
[root@fedser32 filebeat]# systemctl status filebeat.service 
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
     Loaded: loaded (/usr/lib/systemd/system/filebeat.service; disabled; vendor preset: disabled)
     Active: active (running) since Tue 2021-09-14 22:58:43 IST; 5s ago
   Main PID: 5663 (filebeat)
      Tasks: 12 (limit: 18885)
     Memory: 135.5M
        CPU: 383ms
     CGroup: /system.slice/filebeat.service
             └─5663 /usr/share/filebeat/bin/filebeat --environment systemd -c /etc/filebeat/filebeat.yml --path.home /usr/share/filebeat --path.c>
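If the service starts but no data shows up later, the filebeat log mentioned earlier is the first place to look. A small sketch to scan it for recent errors (the path comes from the default logging configuration):

```shell
# Show the last few error lines, if any, from the filebeat log.
sudo grep -i 'error' /var/log/filebeat/filebeat | tail -n 5
```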

Step7: Restart the elasticsearch service

Let's start the elasticsearch service so that filebeat is able to send the captured data to elasticsearch for indexing.

[root@fedser32 filebeat]# systemctl start elasticsearch.service 
[root@fedser32 filebeat]# systemctl status elasticsearch.service 
● elasticsearch.service - Elasticsearch
     Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
     Active: active (running) since Tue 2021-09-14 23:02:26 IST; 4s ago
   Main PID: 5792 (java)
      Tasks: 106 (limit: 18885)
     Memory: 8.5G
        CPU: 1min 53.176s
     CGroup: /system.slice/elasticsearch.service
             ├─5792 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10>
             └─5991 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
Sep 14 23:02:02 systemd[1]: Starting Elasticsearch...
Sep 14 23:02:26 systemd[1]: Started Elasticsearch.

Step8: Validate the indexed data in kibana

Once the elasticsearch and filebeat services are up and running, you can hit nginx for static content to get some log data generated in the access logs.

URL – http://localhost
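A browser visit works, but you can also generate a batch of access-log entries from the shell; a small sketch, assuming nginx is still on its default port:

```shell
# Fire ten requests at nginx so filebeat has fresh access-log lines to ship.
for i in $(seq 1 10); do
  curl -s -o /dev/null http://localhost/
done
echo "generated 10 requests"
```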

Now we can go to the Kibana portal available at the below url and navigate to Analytics – Discover section in the left navigation menu. You should be able to see the nginx log data indexed under the filebeat-* index pattern as shown below.

URL – http://localhost:5601
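You can also confirm the indexed data directly against elasticsearch before opening kibana. This sketch assumes the default ‘filebeat-*’ index pattern and the default elasticsearch port:

```shell
# List the filebeat indices and their document counts.
curl -s 'http://localhost:9200/_cat/indices/filebeat-*?v'

# Fetch a single indexed document to inspect the shipped fields
# (message, log_type, env, bu and so on).
curl -s 'http://localhost:9200/filebeat-*/_search?size=1&pretty'
```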

Hope you enjoyed reading this article. Thank you.