
How To Install Logstash on Ubuntu 18.04 and Debian 9

by schkn

This tutorial covers all the steps necessary to install Logstash on Ubuntu 18.04 and Debian 9.

Logstash is a data processing pipeline that simplifies log ingestion, parsing, filtering and forwarding.

Why do we use Logstash?

We use Logstash because it provides a set of plugins that can easily be bound to various sources and targets in order to gather and ship logs. Moreover, Logstash provides an expressive configuration language that makes it very easy for developers to manipulate, truncate or transform data streams.

Logstash is part of the ELK stack (Elasticsearch, Logstash, Kibana), but each tool can also be used independently.

With the recent release of the ELK stack v7.x, installation guides need to be updated for recent distributions such as Ubuntu 18.04 and Debian 9.

If you are reading this tutorial, it is probably because you chose to bring Logstash into your infrastructure. Logstash is a powerful tool, but you have to install and configure it properly.

Here are the steps to install Logstash on Ubuntu and Debian:

1 – Install the latest version of Java

Logstash, like every tool of the ELK stack, needs Java to run properly.

In order to check whether you have Java or not, run the following command:

$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)

If you don’t have Java on your computer, the command will fail with a "command not found" message instead.
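On Ubuntu 18.04, the message typically looks like the following (the exact wording varies between releases):

Command 'java' not found, but can be installed with:

sudo apt install default-jre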

You can install it by running this command.

$ sudo apt-get install default-jre

Make sure that Java is now installed by running the first command again.

2 – Add the GPG key to install signed packages

In order to make sure that you are getting official versions of Logstash, you have to download and install the Elastic public signing key.

To do so, run the following commands.

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

On Debian, install the apt-transport-https package.

$ sudo apt-get install apt-transport-https

Finally, add the Elastic package repository to your repository list.

$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

3 – Install Logstash with apt

Now that the Elastic repositories are added to your repository list, it is time to install the latest version of Logstash on your system.

$ sudo apt-get update
$ sudo apt-get install logstash

The installation will:

  • create a logstash user
  • create a logstash group
  • create a dedicated service file for Logstash

From there, the Logstash installation should have created a service on your instance.

To check Logstash service health, run the following command.


On Ubuntu and Debian, which are both equipped with systemd:

$ sudo systemctl status logstash

Enable your new service on boot up and start it.

$ sudo systemctl enable logstash
$ sudo systemctl start logstash

Having your service running is just fine, but you can double-check it by verifying that Logstash is actually listening for connections on the ports you expect (9600 is the default port of its monitoring API).

Run a simple lsof command; you should have a similar output.

$ sudo lsof -i -P -n | grep logstash
java      28872        logstash   56u  IPv6 1160098302      0t0  TCP 127.0.0.1:47796->127.0.0.1:9200 (ESTABLISHED)
java      28872        logstash   61u  IPv4 1160098304      0t0  UDP 127.0.0.1:10514
java      28872        logstash   79u  IPv6 1160098941      0t0  TCP 127.0.0.1:9600 (LISTEN)

As you can tell, Logstash is actively listening for connections on port 10514 over UDP and on port 9600 over TCP. This is important to note if you plan to forward your logs to Logstash (from rsyslog for example, either over UDP or TCP).

On Debian and Ubuntu, here’s the content of the service file.

[Unit]
Description=logstash

[Service]
Type=simple
User=logstash
Group=logstash
# Load env vars from /etc/default/ and /etc/sysconfig/ if they exist.
# Prefixing the path with '-' makes it try to load, but if the file doesn't
# exist, it continues onward.
EnvironmentFile=-/etc/default/logstash
EnvironmentFile=-/etc/sysconfig/logstash
ExecStart=/usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash"
Restart=always
WorkingDirectory=/
Nice=19
LimitNOFILE=16384

[Install]
WantedBy=multi-user.target

The environment file (located at /etc/default/logstash) contains many of the variables necessary for Logstash to run.

If you wanted to tweak your Logstash installation, for example to change your configuration path, this is the file that you would change.
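For illustration, here is the kind of content you might find in it. This is only a sketch: the exact variables (and whether the file is populated at all) vary between Logstash versions, so check your own file before editing.

LS_HOME="/usr/share/logstash"
LS_SETTINGS_DIR="/etc/logstash"
LS_JAVA_OPTS=""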

4 – Personalize Logstash with configuration files

a – Understanding Logstash configuration files

Before personalizing your configuration files, there is a distinction that you need to understand: Logstash relies on two kinds of configuration files.

Pipelines configuration files

In Logstash, you define what are called pipelines. A pipeline is composed of:

  • An input: where you take your data from; it can be syslog, Apache or NGINX for example.
  • A filter: a transformation applied to your data; sometimes you may want to mutate your data, or remove some fields from the final output.
  • An output: where you send your data; most of the time Elasticsearch, but it can be a wide variety of other destinations.

Those pipelines are defined in configuration files.
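Concretely, every pipeline file follows the same three-block skeleton (a minimal sketch; each block takes one or more plugins):

input {
  # where the events come from
}

filter {
  # optional transformations applied to each event
}

output {
  # where the events are sent
}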

In order to define those pipelines, you are going to create "pipeline files" in the /etc/logstash/conf.d directory.

Logstash general configuration file

But Logstash also has a general configuration file that configures Logstash itself.

This file is located at /etc/logstash/logstash.yml. The general configuration file defines many variables, but most importantly you want to define your log path and data path variables.
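For reference, here is how those paths are typically set in the logstash.yml shipped with the Debian and Ubuntu packages (check your own file, as the values may differ):

path.data: /var/lib/logstash
path.logs: /var/log/logstash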

b – Writing your own pipeline configuration file

For this part, we are going to keep it very simple.

We are going to build a very basic logging pipeline between rsyslog and stdout.

Every single log processed by rsyslog will be printed to the shell running Logstash.

As the Elastic documentation highlights, it can be quite useful to test pipeline configuration files this way and immediately see what output they produce.

If you are looking for a complete rsyslog to Logstash to Elasticsearch tutorial, here’s a link for it.

To do so, head over to the /etc/logstash/conf.d directory and create a new file named "syslog.conf".

$ cd /etc/logstash/conf.d/
$ sudo vi syslog.conf

Paste the following content inside.

input {
  udp {
    host => "127.0.0.1"
    port => 10514
    codec => "json"
    type => "rsyslog"
  }
}

filter { }


output {
  stdout { }
}

As you probably guessed, Logstash is going to listen for incoming syslog messages on port 10514 and print them directly to the terminal.

To forward rsyslog messages to port 10514, head over to your /etc/rsyslog.conf file, and add this line at the top of the file.

*.*         @127.0.0.1:10514
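Then, restart rsyslog so that the change is taken into account:

$ sudo systemctl restart rsyslog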

Now in order to debug your configuration, you have to locate the logstash binary on your instance.

To do so, run a simple whereis command.

$ whereis -b logstash
logstash: /usr/share/logstash

Now that you have located the logstash binary, shut down your service and run Logstash locally, with the configuration file that you are trying to verify.

$ sudo systemctl stop logstash
$ cd /usr/share/logstash/bin
$ ./logstash -f /etc/logstash/conf.d/syslog.conf

Within a couple of seconds, Logstash should start, and incoming syslog messages should be printed to your terminal as structured events.

Note: if you have any syntax errors in your pipeline configuration files, you will also be notified.

As a quick example, if you remove one bracket from the configuration file, Logstash will refuse to start and print a configuration error pointing at the offending part of the file.
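If you only want to validate the syntax of a pipeline file without actually running it, Logstash also provides a dedicated flag for that:

$ ./logstash -f /etc/logstash/conf.d/syslog.conf --config.test_and_exit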

5 – Monitoring Logstash using the Monitoring API

There are multiple ways to monitor a Logstash instance:

  • Using the Monitoring API provided by Logstash itself
  • By configuring the X-Pack tool and sending retrieved data to an Elasticsearch cluster
  • By visualizing data into dedicated panels of Kibana (such as the pipeline viewer for example)

In this chapter, we are going to focus on the Monitoring API, as the other methods require the entire ELK stack installed on your computer to work properly.

a – Gathering general information about Logstash

First, we are going to run a very basic command to get general information about our Logstash instance.

Run the following command on your instance:

$ curl -XGET 'localhost:9600/?pretty'
{
  "host" : "devconnected-ubuntu",
  "version" : "7.2.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "05cfb06f-a652-402c-8da1-f7275fb06312",
  "name" : "devconnected-ubuntu",
  "ephemeral_id" : "871ccf4a-5233-4265-807b-8a305d349745",
  "status" : "green",
  "snapshot" : false,
  "build_date" : "2019-06-20T17:29:17+00:00",
  "build_sha" : "a2b1dbb747289ac122b146f971193cfc9f7a2f97",
  "build_snapshot" : false
}

If you are not running Logstash on the conventional 9600 port, make sure to adjust the previous command.
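For reference, this port is controlled by the http.port setting of logstash.yml, which defaults to the following on 7.x:

http.port: 9600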

From this command, you get the hostname, the version running, as well as the HTTP address used by Logstash.

You also get a status property (green, yellow or red) that has already been explained in the tutorial about setting up an Elasticsearch cluster.

b – Retrieving Node Information

If you are managing several Logstash instances, there is a high chance that you may want to get detailed information about every single node.

For this API, you have three choices:

  • pipelines: to get detailed information about pipeline statistics;
  • jvm: to see current JVM statistics for this specific node;
  • os: to get information about the OS running your current node.

To retrieve node information on your cluster, issue the following command:

$ curl -XGET 'localhost:9600/_node/pipelines'
{
  "host": "schkn-ubuntu",
  "version": "7.2.0",
  "http_address": "127.0.0.1:9600",
  "id": "05cfb06f-a652-402c-8da1-f7275fb06312",
  "name": "schkn-ubuntu",
  "ephemeral_id": "871ccf4a-5233-4265-807b-8a305d349745",
  "status": "green",
  "snapshot": false,
  "pipelines": {
    "main": {
      "ephemeral_id": "808952db-5d23-4f63-82f8-9a24502e6103",
      "hash": "2f55ef476c3d425f4bd887011f38bbb241991f166c153b283d94483a06f7c550",
      "workers": 2,
      "batch_size": 125,
      "batch_delay": 50,
      "config_reload_automatic": false,
      "config_reload_interval": 3000000000,
      "dead_letter_queue_enabled": false,
      "cluster_uuids": []
    }
  }
}

Here is an example for the OS request:

$ curl -XGET 'localhost:9600/_node/os'
{
  "host": "schkn-ubuntu",
  "version": "7.2.0",
  "http_address": "127.0.0.1:9600",
  "id": "05cfb06f-a652-402c-8da1-f7275fb06312",
  "name": "schkn-ubuntu",
  "ephemeral_id": "871ccf4a-5233-4265-807b-8a305d349745",
  "status": "green",
  "snapshot": false,
  "os": {
    "name": "Linux",
    "arch": "amd64",
    "version": "4.15.0-42-generic",
    "available_processors": 2
  }
}
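The jvm option works the same way; among other things, it returns memory and garbage-collector information about the node:

$ curl -XGET 'localhost:9600/_node/jvm?pretty'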

c – Retrieving Logstash Hot Threads

Hot threads are threads that use a large amount of CPU power or that have an execution time greater than normal.

To retrieve hot threads, run the following command:

$ curl -XGET 'localhost:9600/_node/hot_threads?pretty'
{
  "host" : "schkn-ubuntu",
  "version" : "7.2.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "05cfb06f-a652-402c-8da1-f7275fb06312",
  "name" : "schkn-ubuntu",
  "ephemeral_id" : "871ccf4a-5233-4265-807b-8a305d349745",
  "status" : "green",
  "snapshot" : false,
  "hot_threads" : {
    "time" : "2019-07-22T18:52:45+00:00",
    "busiest_threads" : 10,
    "threads" : [ {
      "name" : "[main]>worker1",
      "thread_id" : 22,
      "percent_of_cpu_time" : 0.13,
      "state" : "timed_waiting",
      "traces" : [ "[email protected]/jdk.internal.misc.Unsafe.park(Native Method)"...]
    } ]
  }
}

6 – Going Further

Now that you have all the basics about Logstash, it is time for you to build your own pipeline configuration files and to start stashing logs.

I highly recommend that you check out Filebeat, which provides a lightweight shipper for logs and can easily be customized in order to build a centralized logging system for your infrastructure.

One of the key features of Filebeat is that it implements a back-pressure-sensitive protocol, which essentially means that it regulates the amount of data it sends when the receiving end is busy.

This is a key point, as you would otherwise take the risk of overloading your centralized server by pushing too much data to it.

