Understanding primary & replica shards in Elasticsearch

Shishir Khandelwal
8 min read · Nov 26, 2022


In this article, we will walk through the process of creating an Elasticsearch cluster comprising 3 nodes. The objective is to:

  • Learn the steps to create a highly available Elasticsearch cluster.
  • See how new nodes can be added to a cluster.
  • Understand primary & replica shards.
  • Get familiar with ways to debug an Elasticsearch cluster.
  • Test the cluster’s fault tolerance.

Notes:

  • I’ll be using AWS to provision my virtual machines. The virtual machines run the RHEL 8 operating system, and none of the steps are AWS-specific, so people more comfortable with GCP or Azure can follow along as well.
  • The instance type I am using is a t2.large, which has 8GB of RAM and 10GB of attached storage.

Let’s begin!

First, let’s create a single ES node.

Step 1: Set up a virtual machine and SSH into it.

  • An Elasticsearch node uses two ports, 9200 & 9300, for its functioning.
  • Port 9200 serves the REST API, e.g. search and aggregation requests. Any client trying to talk to an Elasticsearch node or cluster uses this port.
  • Port 9300 is used by the Elasticsearch nodes of a cluster for inter-node communication, such as cluster upgrades, master elections, and joining or leaving a cluster.
  • Since these ports are required, they must be reachable on the virtual machine, both in the AWS security group and in the OS firewall (see the sketch below).
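
On AWS, this means allowing inbound TCP on 9200 & 9300 in the instance’s security group. If the OS firewall (firewalld on RHEL 8) is enabled, the ports need to be opened there as well; a minimal sketch:

sudo firewall-cmd --permanent --add-port=9200/tcp
sudo firewall-cmd --permanent --add-port=9300/tcp
sudo firewall-cmd --reload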

Step 2: Install & configure Elasticsearch on the virtual machine.

  • Update the OS
sudo yum update -y
  • Install ES
sudo rpm -i https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm
  • Configure daemon
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
  • Configure ES

Open /etc/elasticsearch/jvm.options to configure the JVM heap sizes.

According to the ES documentation, you should always set the minimum and maximum JVM heap sizes to the same value.

Note:
- Xms represents the initial size of total heap space.
- Xmx represents the maximum size of total heap space.

The heap size should be based on the available RAM. Set Xms and Xmx to no more than 50% of your total memory. Since the virtual machine being used here has 8GB of RAM, Xms & Xmx should be set to 4GB.

-Xms4g
-Xmx4g
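
A quick sanity check that the heap settings are in place (only the two lines we just edited should show up):

grep -E "^-Xm[sx]" /etc/elasticsearch/jvm.options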

Next, let’s configure the elasticsearch.yml file. This file is used to specify all the important details such as the cluster name, node name, bind address, bind port, participating hosts of the cluster, eligible master nodes of the cluster, etc.

The following are the configurations used in this case:

cluster.name: es-poc
node.name: es-poc-node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.31.20.44
http.port: 9200
discovery.seed_hosts: ["52.7.233.151"]
cluster.initial_master_nodes: ["52.7.233.151"]
  • The network host is the bind address of the virtual machine. It has been set to the private IP address of the virtual machine.
  • The seed hosts & the initial master nodes are set to the public IP address of the virtual machine. (A quick way to review the active settings is shown below.)
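
A convenient way to review only the active (non-commented) settings, which we will also use later when comparing the nodes’ configurations:

grep "^[^#]" /etc/elasticsearch/elasticsearch.yml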

Step 3: Start ES node.

sudo systemctl start elasticsearch.service
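
If the node fails to start, the logs are the first place to look. Both the systemd journal and the cluster log file (named after cluster.name, so es-poc.log here) usually point to the problem:

sudo journalctl -u elasticsearch.service --no-pager | tail -n 50
sudo tail -n 50 /var/log/elasticsearch/es-poc.log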

Step 4: Verify

The following commands can be used to verify that ES is working. They are often used for debugging the ES cluster as well.

sudo systemctl status elasticsearch.service
curl http://172.31.20.44:9200
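
Two more standard APIs that are handy for debugging throughout this exercise (the pretty and v query parameters just make the output readable):

curl "http://172.31.20.44:9200/_cluster/health?pretty"
curl "http://172.31.20.44:9200/_cat/nodes?v"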

Great. We have an ES node up and running.

Let’s set up a Kibana server now to populate some new indexes.

Step 1: Start a virtual machine (You can use a t2.micro instance) & SSH into it.

Step 2: Install Kibana

sudo yum update -y
sudo rpm -i https://artifacts.elastic.co/downloads/kibana/kibana-7.5.1-x86_64.rpm

Step 3: Configure Kibana: open /etc/kibana/kibana.yml and configure the elasticsearch host which kibana needs to connect to.

server.port: 5601
server.host: "172.31.16.11"   # this is the private IP address of Kibana's virtual machine
elasticsearch.hosts: ["http://52.7.233.151:9200"]

Step 4: Start kibana

sudo systemctl start kibana

Once kibana starts, it will automatically populate some indexes on the ES node.

Step 5: Verify Kibana is working

sudo systemctl status kibana
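
To confirm that Kibana actually reached the ES node, you can list the indexes it created (names such as .kibana_1 and .kibana_task_manager_1 are typical for this version, but may vary):

curl "http://172.31.20.44:9200/_cat/indices?v"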

Understanding primary & replica shards

  • Execute `curl http://172.31.20.44:9200/_cat/shards?v` to see the distribution of shards. You’ll notice that there are 0 replica shards. This is because there is only 1 ES node in the cluster.
  • When the number of nodes increases, a replica shard will get created for each primary shard (see the sketch below).
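
For context, the replica count is just a per-index setting (index.number_of_replicas, which defaults to 1 in 7.x) and can be changed at any time through the settings API. A minimal sketch, with my-index as a placeholder index name:

curl -X PUT "http://172.31.20.44:9200/my-index/_settings" -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 1}}'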

Let’s increase the number of ES nodes in the cluster now.

Adding the 2nd ES node

Provision a new virtual machine, open ports on it and install ES on it. Make the necessary changes to /etc/elasticsearch/jvm.options as mentioned above.

Since a new node is being added, the configuration of the first node (ES1) also needs to change, i.e. a new seed host & a new master-eligible node must be added.

After changes, the elasticsearch.yml of node1 would have the following properties:

[root@ip-172-31-20-44 elasticsearch]# grep "^[^#]" elasticsearch.yml
cluster.name: es-poc
node.name: es-poc-node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.31.20.44
http.port: 9200
discovery.seed_hosts: ["52.7.233.151","54.167.74.246"]
cluster.initial_master_nodes: ["52.7.233.151","54.167.74.246"]

After changes, the elasticsearch.yml of node2 would have the following properties:

[root@ip-172-31-31-85 elasticsearch]# grep "^[^#]" elasticsearch.yml
cluster.name: es-poc
node.name: es-poc-node-2
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.31.31.85
http.port: 9200
discovery.seed_hosts: ["52.7.233.151","54.167.74.246"]
cluster.initial_master_nodes: ["52.7.233.151","54.167.74.246"]

Here, 52.7.233.151 is the public IP address of node1 & 54.167.74.246 is the public IP address of node2.

After making the necessary changes, let’s restart both nodes.

sudo systemctl restart elasticsearch.service

Verify that the 2nd node got added to the cluster

[root@ip-172-31-31-85 elasticsearch]# curl http://172.31.31.85:9200/_cat/nodes?v
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.31.20.44             3          94   0    0.00    0.00     0.00 dilm      *      es-poc-node-1
172.31.31.85             4          88   0    0.02    0.01     0.00 dilm      -      es-poc-node-2

Understanding primary & replica shards

  • Execute `curl http://172.31.20.44:9200/_cat/shards?v` to see the distribution of shards. You’ll notice that there are now some replica shards in the list. This is because there are multiple nodes in the cluster, so ES has automatically created a replica of each primary shard.
  • Also, notice that a primary shard and its replica are always on different nodes.
  • When the number of nodes increases further, the primary & replica shards will get redistributed again (a compact way to inspect this is shown below).
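
In the _cat/shards output, the prirep column marks each shard as p (primary) or r (replica). A compact, sorted view using the standard h (column selection) parameter makes the primary/replica pairing easier to see:

curl "http://172.31.20.44:9200/_cat/shards?h=index,shard,prirep,state,node" | sort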

Let’s increase the number of ES nodes in the cluster once again.

Adding the 3rd ES node

Follow the same steps as above. The following are the elasticsearch.yml configurations for each node after adding the 3rd ES node.

node1’s elasticsearch.yml

[root@ip-172-31-20-44 elasticsearch]# grep "^[^#]" elasticsearch.yml
cluster.name: es-poc
node.name: es-poc-node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.31.20.44
http.port: 9200
discovery.seed_hosts: ["52.7.233.151","54.167.74.246","18.206.210.61"]
cluster.initial_master_nodes: ["52.7.233.151","54.167.74.246","18.206.210.61"]

node2’s elasticsearch.yml

[root@ip-172-31-31-85 elasticsearch]# grep "^[^#]" elasticsearch.yml
cluster.name: es-poc
node.name: es-poc-node-2
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.31.31.85
http.port: 9200
discovery.seed_hosts: ["52.7.233.151","54.167.74.246","18.206.210.61"]
cluster.initial_master_nodes: ["52.7.233.151","54.167.74.246","18.206.210.61"]

node3’s elasticsearch.yml

[root@ip-172-31-18-250 elasticsearch]# grep "^[^#]" elasticsearch.yml
cluster.name: es-poc
node.name: es-poc-node-3
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.31.18.250
http.port: 9200
discovery.seed_hosts: ["52.7.233.151","54.167.74.246","18.206.210.61"]
cluster.initial_master_nodes: ["52.7.233.151","54.167.74.246","18.206.210.61"]

Here,
52.7.233.151 is the public IP address of node1 &
54.167.74.246 is the public IP address of node2 &
18.206.210.61 is the public IP address of node3.

  • Next, restart the ES process on all the nodes for the 3-node cluster to form.

Verify that the 3rd node got added to the cluster

[root@ip-172-31-18-250 elasticsearch]# curl http://172.31.18.250:9200/_cat/nodes?v
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.31.18.250            3          88   0    0.00    0.00     0.00 dilm      *      es-poc-node-3
172.31.20.44             4          94   0    0.00    0.00     0.00 dilm      -      es-poc-node-1
172.31.31.85             6          88   0    0.00    0.00     0.00 dilm      -      es-poc-node-2

Understanding primary & replica shards

  • Execute `curl http://172.31.20.44:9200/_cat/shards?v` to see the distribution of shards. You’ll notice that the placement of the primary & replica shards has changed. This is because there are more nodes in the cluster now, so ES has redistributed the shards to increase the high availability & fault tolerance of the cluster.
  • Also, notice that a primary shard and its replica are always on different nodes.
  • When the number of nodes increases further, the primary & replica shards will get redistributed again (the allocation explain sketch below shows how to inspect these decisions).
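
For a deeper look at why a specific shard ended up on a specific node, the cluster allocation explain API can be queried. A minimal sketch (the index name is illustrative; pick one from your own _cat/shards output):

curl -X GET "http://172.31.20.44:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d '{"index": ".kibana_task_manager_1", "shard": 0, "primary": true}'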

Great. So far, we have understood the procedure for adding a new node & also seen how ES automatically creates replica shards & manages the redistribution of primary & replica shards.

Let’s test the fault tolerance of the cluster now.

  • Observe that on es-poc-node-1, there exists a primary shard for “.kibana_task_manager_1” index and a replica shard for “.apm-agent-configuration” index.
  • If we stop the elasticsearch process on es-poc-node-1, what do you think would happen? Will the data of the primary shard on es-poc-node-1 become unavailable?
  • Let’s test it out.
  • SSH into es-poc-node-1 and stop the es process.
sudo systemctl stop elasticsearch
  • If you check the shard distribution again, you can see that the shards previously hosted on es-poc-node-1 are now served by the other nodes: their replicas were promoted, and new copies were allocated. This means that even though one node went down, there was no loss of data.
  • This shows how the ES cluster is fault tolerant towards the loss of a node.
  • If you start the elasticsearch process on es-poc-node-1 again, the node rejoins the cluster and the shards get redistributed once more to improve high availability & fault tolerance.
  • Let’s look at the distribution of the shards after the node has rejoined the cluster (a minimal command sequence for this whole check is sketched below).
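
A minimal sequence for this check, assuming the same IPs as above (run the curl commands from a node that is still up, e.g. es-poc-node-2):

curl "http://172.31.31.85:9200/_cat/shards?v"      # before stopping node1
sudo systemctl stop elasticsearch                  # on es-poc-node-1
curl "http://172.31.31.85:9200/_cat/shards?v"      # replicas on the surviving nodes are promoted to primaries
curl "http://172.31.31.85:9200/_cat/health?v"      # watch the cluster status while shards are reallocated
sudo systemctl start elasticsearch                 # bring es-poc-node-1 back
curl "http://172.31.31.85:9200/_cat/shards?v"      # shards are rebalanced across all three nodes again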

As you can observe, the shards have undergone redistribution. So, we have now seen how primary and replica shards work together to ensure Elasticsearch’s high availability and fault tolerance.

Hope this was useful & clear!
