In this post we will set up a 3-node Elasticsearch cluster on Ubuntu 18.04.
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene.
Elasticsearch has become one of the most popular search engines and is commonly used for log analytics, full-text search, security intelligence and business analytics.

In this tutorial, we will install a 3-node cluster and go through some API examples: creating indexes, ingesting documents, running searches and so on.
All 3 nodes will be master-eligible data nodes, so one of them will be elected master. With 3 master-eligible nodes we can require a quorum of 2, which protects us against a split brain.
In a future post, I will go through how to set up an Elasticsearch cluster with dedicated masters.
But before we get to that, let’s cover some basics:
Elasticsearch Basic Concepts
- An Elasticsearch cluster is made up of a number of nodes;
- Each node contains indexes, where an index is a collection of documents;
- Master nodes are responsible for cluster-wide tasks: creating and deleting indexes, tracking nodes and allocating shards to nodes;
- Data nodes host the shards that hold the indexed data and handle data-related operations like CRUD, search and aggregations;
- An index is split into multiple shards;
- Shards are either primary shards or replica shards;
- A replica shard is a copy of a primary shard and is used for HA/redundancy;
- Shards are spread across the nodes of the cluster;
- A replica shard will NEVER be placed on the same node as its primary shard.

Note on Master Elections
The minimum number of master-eligible nodes that must join a newly elected master for an election to complete is configured via the setting discovery.zen.minimum_master_nodes. This configuration is extremely important, as it makes each master-eligible node aware of the minimum number of master-eligible nodes that must be visible in order to form a cluster.
If this setting is missing or configured incorrectly, the cluster may end up in a split brain: when something goes wrong and nodes lose sight of each other, they can form 2 separate clusters, each with its own master, which we want to avoid at all costs.
According to the Elasticsearch documentation, to avoid a split brain this setting should be set to a quorum of master-eligible nodes, using the following formula:
(master_eligible_nodes / 2) + 1
# in our case (the division is rounded down):
(3 / 2) + 1 = 2
It is recommended to avoid having only two master-eligible nodes, since a quorum of two is two, so losing either node leaves the cluster unable to elect a master. To read more on the Elasticsearch master election process, have a look at their documentation.
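The quorum formula above relies on integer division, which is easy to get wrong by hand for larger clusters. A minimal sketch of the arithmetic in shell; the function name quorum is a hypothetical helper for illustration and not part of Elasticsearch:

```shell
# quorum N: print the minimum number of master-eligible nodes that form
# a majority of N, using integer division: (N / 2) + 1.
# "quorum" is an illustrative helper name, not an Elasticsearch command.
quorum() {
  local nodes="$1"
  echo $(( nodes / 2 + 1 ))
}

# Usage: quorum 3
```

Calling quorum 2 also prints 2, which is the reason a two-node setup cannot tolerate the loss of either node.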
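We need to add the internal IP addresses of our nodes to either our hosts file or our DNS server. To keep it simple, I will add them to my hosts file. This needs to be applied to all three nodes (replace the placeholders with the internal IP address of each node):
$ sudo su -
$ cat > /etc/hosts << EOF
127.0.0.1 localhost
<es-node-1-ip> es-node-1
<es-node-2-ip> es-node-2
<es-node-3-ip> es-node-3
EOF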
Now that our host entries are set, we can start with the fun stuff.
Install Elasticsearch
The instructions below need to be applied to all three nodes.
Get the Elasticsearch repositories and update your system so that your servers are aware of the newly added Elasticsearch repository:
$ apt update && apt upgrade -y
$ apt install software-properties-common apt-transport-https -y
$ wget -qO - | sudo apt-key add -
$ echo "deb stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
$ apt update
Elasticsearch relies on Java, so install the Java Development Kit:
$ apt install default-jdk -y
Verify that Java is installed:
$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)
Install Elasticsearch:
$ apt install elasticsearch -y
Once Elasticsearch is installed, repeat these steps on the remaining nodes. Once that is done, move on to the configuration section.
Configure Elasticsearch
For nodes to join the same cluster, they should all share the same cluster name.
We also need to specify the discovery hosts so that the nodes can find each other. Since we are installing a 3-node cluster, every node will act as both a master-eligible node and a data node.
Feel free to inspect the default Elasticsearch configuration, but I will be overwriting it with the config that I need.
Make sure to apply the configuration on all three nodes:
$ cat > /etc/elasticsearch/elasticsearch.yml << EOF
cluster.name: es-cluster
node.name: \${HOSTNAME}
node.master: true
node.data: true
path.logs: /var/log/elasticsearch
path.data: /usr/share/elasticsearch/data
bootstrap.memory_lock: true
network.host: 0.0.0.0
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2"]
EOF
The important settings for your Elasticsearch cluster are described in their docs:
- Disable swapping
- Increase file descriptors
- Ensure sufficient virtual memory
- Ensure sufficient threads
- JVM DNS cache settings
- Temporary directory not mounted with noexec
Increase the file descriptors on the nodes, as instructed by the documentation:
$ cat > /etc/default/elasticsearch << EOF
Ensure that pages are not swapped out to disk by requesting the JVM to lock the heap in memory, which requires setting LimitMEMLOCK=infinity.
Set the maximum number of file descriptors for this process with LimitNOFILE and increase the number of threads with LimitNPROC:
$ vim /usr/lib/systemd/system/elasticsearch.service
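Editing the packaged unit file works, but the file belongs to the package and can be replaced on upgrade. A sketch of the same limits written as a systemd drop-in instead. The values 65536 for LimitNOFILE and 4096 for LimitNPROC are assumptions (this post does not state the exact numbers), and the function name write_es_limits is hypothetical:

```shell
# write_es_limits [DIR]: write a systemd drop-in that raises the resource
# limits of the elasticsearch service. DIR defaults to the standard drop-in
# directory for elasticsearch.service.
# Assumed values: LimitNOFILE=65536 and LimitNPROC=4096; adjust as needed.
write_es_limits() {
  local dir="${1:-/etc/systemd/system/elasticsearch.service.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/override.conf" << 'EOF'
[Service]
LimitMEMLOCK=infinity
LimitNOFILE=65536
LimitNPROC=4096
EOF
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   write_es_limits && systemctl daemon-reload
```

The drop-in only overrides the three limits and leaves the rest of the packaged unit untouched.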
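Increase the limit on the number of open file descriptors for the elasticsearch user to 65536 or higher, and allow that user to lock an unlimited amount of memory:
$ cat > /etc/security/limits.conf << EOF
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
EOF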
Increase the value of the mmap counts as elasticsearch uses mmapfs directory to store its indices:
$ sysctl -w vm.max_map_count=262144
For a permanent setting, update vm.max_map_count in /etc/sysctl.conf and run:
$ sysctl -p /etc/sysctl.conf
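One way to make the setting permanent without hand-editing the file is a small idempotent helper that replaces any existing assignment and appends the new one. This is a sketch only; the function name set_sysctl_value is hypothetical:

```shell
# set_sysctl_value FILE KEY VALUE: make sure FILE contains exactly one
# active "KEY=VALUE" line, replacing any earlier assignment of KEY.
# Commented-out lines are left alone.
set_sysctl_value() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file" || return 1
  tmp=$(mktemp) || return 1
  # Keep every line that is not an assignment of KEY
  # (grep exits non-zero when nothing is kept, which is fine here).
  grep -v -E "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original file so its permissions are preserved.
  cat "$tmp" > "$file" || { rm -f "$tmp"; return 1; }
  rm -f "$tmp"
}

# Usage (as root):
#   set_sysctl_value /etc/sysctl.conf vm.max_map_count 262144
```

Running it twice leaves a single vm.max_map_count line in the file, so it is safe to include in a provisioning script.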
Change the ownership of the Elasticsearch data path, so that the elasticsearch user and group have permission to read from and write to the configured path:
$ chown -R elasticsearch:elasticsearch /usr/share/elasticsearch
Make sure that you have applied these steps to all the nodes before continuing.
Start Elasticsearch
Enable Elasticsearch at boot time and start the Elasticsearch service:
$ systemctl enable elasticsearch
$ systemctl start elasticsearch
Verify that Elasticsearch is running:
$ netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 :::9200 :::* LISTEN 278/java
tcp6 0 0 :::9300 :::* LISTEN 278/java
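Elasticsearch can take a little while to open port 9200 after systemctl start returns, so scripted setups often poll until the node answers. A minimal sketch, assuming curl is installed; the function name wait_for_es and its default values are illustrative choices:

```shell
# wait_for_es [URL] [TRIES] [DELAY]: poll URL until it answers with a
# successful HTTP status, or give up after TRIES attempts.
# Defaults (30 tries, 2 seconds apart) are arbitrary illustrative values.
wait_for_es() {
  local url="${1:-http://localhost:9200}" tries="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: treat HTTP errors as failure, --max-time: bound each attempt
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_es http://localhost:9200 && echo "node is up"
```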
Using Elasticsearch Restful API
In this section we will get comfortable with using the Elasticsearch API, by covering the following examples:
- Cluster Overview;
- How to view Node, Index and Shard information;
- How to Ingest data into Elasticsearch;
- How to Search data in Elasticsearch;
- How to delete your Index
View Cluster Health
From any node, use an HTTP client such as curl to investigate the current health of the cluster by looking at the cluster API:
$ curl -XGET http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
As you can see, the cluster status is green, which means everything works as expected.
In Elasticsearch you get green, yellow and red statuses. Yellow means that all primary shards are assigned but one or more replica shards are in an unassigned state. Red means that some or all primary shards are unassigned, so the data in those shards is unavailable.
From this output we can also see the number of data nodes, primary shards, unassigned shards etc.
This is a good place to get an overall view of your Elasticsearch cluster’s health.
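For scripts and monitoring checks it is handy to pull just the status out of the health response. A sketch using only sed, so no JSON tooling is assumed; the function names cluster_status and describe_status are hypothetical:

```shell
# cluster_status: read a _cluster/health JSON document on stdin and print
# the value of its "status" field (green, yellow or red).
cluster_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# describe_status STATUS: print a one-line meaning for a health status.
describe_status() {
  case "$1" in
    green)  echo "all primary and replica shards are assigned" ;;
    yellow) echo "all primaries are assigned, one or more replicas are not" ;;
    red)    echo "one or more primary shards are unassigned" ;;
    *)      echo "unknown status: $1"; return 1 ;;
  esac
}

# Usage:
#   status=$(curl -s http://localhost:9200/_cluster/health | cluster_status)
#   describe_status "$status"
```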
View the Number of Nodes in your Cluster
By looking at the /_cat/nodes API we can get information about the nodes that are part of our cluster:
$ curl -XGET http://localhost:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10 95 0 0.00 0.00 0.00 mdi - es-node-1
11 94 0 0.00 0.00 0.00 mdi - es-node-2
25 96 0 0.07 0.02 0.00 mdi * es-node-3
As you can see, the output shows information about our nodes such as the JVM heap, CPU, load averages, the role of each node and which node is master.
As we are not running dedicated masters, we can see that node es-node-3 got elected as master.
Create your first Elasticsearch Index
Note that when you create an index, the default number of primary shards is 5 and the default replica count is 1 (these are the defaults in Elasticsearch 6.x). You can change the replica count after an index has been created, but not the primary shard count, which has to be set when the index is created.
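Since the primary shard count is fixed at creation time, it can be passed explicitly in the body of the create request instead of relying on the defaults. A sketch that builds such a body; the function name index_settings and the index name mysecondindex in the usage comment are hypothetical:

```shell
# index_settings SHARDS REPLICAS: print a JSON body for index creation
# with an explicit primary shard count and replica count.
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Usage (mysecondindex is a made-up index name):
#   curl -XPUT -H 'Content-Type: application/json' \
#     http://localhost:9200/mysecondindex -d "$(index_settings 3 1)"
```

The rest of this post sticks with the defaults.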
Let’s create an Elasticsearch index named myfirstindex:
$ curl -XPUT http://localhost:9200/myfirstindex
Now that your index has been created, let’s have a look at the /_cat/indices API to get information about our indices:
$ curl -XGET http://localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open myfirstindex xSX9nOQJQ2qNIq4A6_0bTw 5 1 0 0 1.2kb 650b
From the output you will find that we have 5 primary shards and 1 replica of each, with 0 documents in our index, and that the index is in a green state, meaning that all primary and replica shards have been assigned to nodes in our cluster.
Note that, for HA and redundancy, a replica shard will NEVER reside on the same node as its primary shard.
Let’s go a bit deeper and see how our shards are distributed through our cluster, using the /_cat/shards API:
$ curl -XGET http://localhost:9200/_cat/shards?v
index shard prirep state docs store ip node
myfirstindex 1 r STARTED 0 230b es-node-2
myfirstindex 1 p STARTED 0 230b es-node-3
myfirstindex 4 p STARTED 0 230b es-node-3
myfirstindex 4 r STARTED 0 230b es-node-1
myfirstindex 2 r STARTED 0 230b es-node-2
myfirstindex 2 p STARTED 0 230b es-node-1
myfirstindex 3 p STARTED 0 230b es-node-2
myfirstindex 3 r STARTED 0 230b es-node-3
myfirstindex 0 p STARTED 0 230b es-node-2
myfirstindex 0 r STARTED 0 230b es-node-1
As you can see, each replica shard is placed on a different node from its primary.
Replicating a Yellow Cluster Status
A cluster goes yellow when one or more replica shards are in an unassigned state.
So let’s replicate that behavior by scaling our replica count to 3. Each of the 5 shards then needs 4 copies (1 primary and 3 replicas) on 4 different nodes, but we only have 3 nodes, so 1 replica per shard, 5 in total, will stay unassigned:
$ curl -XPUT -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_settings -d \
'{"number_of_replicas": 3}'
Now we have scaled the replica count to 3, but since we only have 3 nodes, we will have a yellow state cluster:
$ curl -XGET http://localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open myfirstindex xSX9nOQJQ2qNIq4A6_0bTw 5 3 0 0 3.3kb 1.1kb
The cluster health status should show the number of unassigned shards, and while they are unassigned we can verify that by looking at the shards API again:
$ curl -XGET http://localhost:9200/_cat/shards?v
index shard prirep state docs store ip node
myfirstindex 1 r STARTED 0 230b es-node-2
myfirstindex 1 p STARTED 0 230b es-node-3
myfirstindex 1 r STARTED 0 230b es-node-1
myfirstindex 1 r UNASSIGNED
myfirstindex 4 r STARTED 0 230b es-node-2
myfirstindex 4 p STARTED 0 230b es-node-3
myfirstindex 4 r STARTED 0 230b es-node-1
myfirstindex 4 r UNASSIGNED
myfirstindex 2 r STARTED 0 230b es-node-2
myfirstindex 2 r STARTED 0 230b es-node-3
myfirstindex 2 p STARTED 0 230b es-node-1
myfirstindex 2 r UNASSIGNED
myfirstindex 3 p STARTED 0 230b es-node-2
myfirstindex 3 r STARTED 0 230b es-node-3
myfirstindex 3 r STARTED 0 230b es-node-1
myfirstindex 3 r UNASSIGNED
myfirstindex 0 p STARTED 0 230b es-node-2
myfirstindex 0 r STARTED 0 230b es-node-3
myfirstindex 0 r STARTED 0 230b es-node-1
myfirstindex 0 r UNASSIGNED
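The five UNASSIGNED rows above follow from simple arithmetic: every shard needs one node per copy, and copies of the same shard never share a node. A sketch of that calculation, assuming every node is a healthy data node; the function name unassigned_replicas is hypothetical:

```shell
# unassigned_replicas PRIMARIES REPLICAS NODES: print how many replica
# shards cannot be placed. Each shard needs (1 + REPLICAS) copies on
# distinct nodes, so any copies beyond NODES stay unassigned.
# Assumes all NODES are available data nodes.
unassigned_replicas() {
  local primaries="$1" replicas="$2" nodes="$3" missing
  missing=$(( replicas + 1 - nodes ))
  if [ "$missing" -lt 0 ]; then
    missing=0
  fi
  echo $(( primaries * missing ))
}

# Usage: unassigned_replicas 5 3 3
```

With 5 primaries, 3 replicas and 3 nodes this prints 5; with a fourth node, or with the replica count back at 1, it prints 0.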
At this point in time, we could either add another node to the cluster or scale the replication factor back to 1 to get the cluster health to green again.
I will scale it back down to a replication factor of 1:
$ curl -XPUT -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_settings -d \
'{"number_of_replicas": 1}'
Ingest Data into Elasticsearch
We will ingest 3 documents into our index. Each one is a simple document consisting of a name, country and gender, for example:
{
  "name": "james",
  "country": "south africa",
  "gender": "male"
}
First we will ingest a document using the PUT HTTP method. When using PUT, we need to specify the document ID.
PUT is used to create or update a document. For creating:
$ curl -XPUT -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_doc/1 -d '
{"name":"james", "country":"south africa", "gender": "male"}'
Now you will find that we have one document in our index:
$ curl -XGET http://localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open myfirstindex xSX9nOQJQ2qNIq4A6_0bTw 5 1 1 0 11.3kb 5.6kb
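Since we know that the document ID is “1”, we can do a GET on the document ID to read the document from the index. The type in the URL has to match the type the document was indexed under, which is _doc:
$ curl -XGET http://localhost:9200/myfirstindex/_doc/1?pretty
{
  "_index" : "myfirstindex",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "james",
    "country" : "south africa",
    "gender" : "male"
  }
}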
If we ingest documents with a POST request, Elasticsearch generates the document ID for us automatically. Let’s create 2 documents:
$ curl -XPOST -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_doc/ -d '
{"name": "kevin", "country": "new zealand", "gender": "male"}'
$ curl -XPOST -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_doc/ -d '
{"name": "sarah", "country": "ireland", "gender": "female"}'
When we have a look again at our index, we can see that we now have 3 documents in our index:
$ curl -XGET http://localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open myfirstindex xSX9nOQJQ2qNIq4A6_0bTw 5 1 3 0 29kb 14.5kb
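Sending one request per document is fine for three documents, but for larger batches Elasticsearch offers the _bulk API, which takes newline-delimited JSON: an action line followed by the document itself. A sketch that builds such a payload from one JSON document per line; the function name bulk_index_lines and the sample document in the usage comment are made up for illustration:

```shell
# bulk_index_lines INDEX: read one JSON document per line on stdin and
# print the action/source line pairs expected by the _bulk API.
# The _doc type matches the type used elsewhere in this post.
bulk_index_lines() {
  local index="$1" doc
  while IFS= read -r doc; do
    # Skip blank lines so they do not produce empty documents.
    [ -n "$doc" ] || continue
    printf '{"index":{"_index":"%s","_type":"_doc"}}\n' "$index"
    printf '%s\n' "$doc"
  done
}

# Usage (sample data is invented):
#   printf '%s\n' '{"name":"anna","country":"finland","gender":"female"}' \
#     | bulk_index_lines myfirstindex \
#     | curl -XPOST -H 'Content-Type: application/x-ndjson' \
#         http://localhost:9200/_bulk --data-binary @-
```

Note the use of --data-binary rather than -d, since -d strips the newlines that the bulk format depends on.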
Search Queries
Now that we have 3 documents in our Elasticsearch index, let’s explore the search APIs to get data from our index. First, let’s search for the keyword “sarah” using the q query parameter:
$ curl -XGET 'http://localhost:9200/myfirstindex/_search?q=sarah&pretty'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "myfirstindex",
        "_type" : "_doc",
        "_id" : "cvU96GsBP0-G8XdN24s4",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "sarah",
          "country" : "ireland",
          "gender" : "female"
        }
      }
    ]
  }
}
We can also narrow our search query down to a specific field, for example, show me all the documents with the name kevin:
$ curl -XGET 'http://localhost:9200/myfirstindex/_search?q=name:kevin&pretty'
{
  ...
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "myfirstindex",
        "_type" : "_doc",
        "_id" : "gPU96GsBP0-G8XdNHoru",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "kevin",
          "country" : "new zealand",
          "gender" : "male"
        }
      }
    ]
  }
}
With Elasticsearch we can also put the query in the request body. A query similar to the one above would look like this:
$ curl -XPOST -H 'Content-Type: application/json' \
'http://localhost:9200/myfirstindex/_search?pretty' -d '
{
  "query": {
    "match": {
      "name": "kevin"
    }
  }
}'
...
"_index" : "myfirstindex",
...
"_source" : {
  "name" : "kevin",
  "country" : "new zealand",
  "gender" : "male"
}
...
We can use wildcard queries:
$ curl -XPOST -H 'Content-Type: application/json' \
'' -d '
{
  "query": {
    "wildcard": {
      "country": "*land"
    }
  }
}'
...
"hits" : [
  {
    "_index" : "myfirstindex",
    "_type" : "_doc",
    "_id" : "cvU96GsBP0-G8XdN24s4",
    "_score" : 1.0,
    "_source" : {
      "name" : "sarah",
      "country" : "ireland",
      "gender" : "female"
    }
  },
  {
    "_index" : "myfirstindex",
    "_type" : "_doc",
    "_id" : "gPU96GsBP0-G8XdNHoru",
    "_score" : 1.0,
    "_source" : {
      "name" : "kevin",
      "country" : "new zealand",
      "gender" : "male"
    }
  }
]
...
Have a look at their documentation for more information on the Search API.
Delete your Index
To wrap this up, we will go ahead and delete our index:
$ curl -XDELETE http://localhost:9200/myfirstindex
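To confirm that the index is really gone, an HTTP HEAD request against the index URL can be used: Elasticsearch answers 200 when the index exists and 404 when it does not. A sketch assuming curl is installed; the function name index_exists is hypothetical:

```shell
# index_exists URL: succeed if a HEAD request to the index URL returns
# HTTP 200, fail otherwise (404 for a missing index, 000 if unreachable).
index_exists() {
  local code
  # -I sends a HEAD request; -w prints only the HTTP status code.
  code=$(curl -s -o /dev/null -I --max-time 5 -w '%{http_code}' "$1")
  [ "$code" = "200" ]
}

# Usage:
#   if index_exists http://localhost:9200/myfirstindex; then
#     echo "index still exists"
#   else
#     echo "index is gone"
#   fi
```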
Going Further
If this got you curious, then definitely have a look at this Elasticsearch Cheatsheet that I’ve put together, and if you want to generate lots of data to ingest into your Elasticsearch cluster, have a look at this python script.