Elasticsearch unassigned shards

November 22, 2017 in Technology


We are running an ELK stack for log collection and analysis. Logstash and Kibana run on an EC2 instance, and Elasticsearch runs on AWS's Elasticsearch Service, which makes it easy to scale the ES cluster when needed. The other day our AWS Elasticsearch cluster changed to a yellow health status. The following command shows the yellow status and gives more information.

root@ls01:~# curl -XGET  https://(es-endpoint)/_cluster/health?pretty
{
  "cluster_name" : "********:es01",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 13,
  "number_of_data_nodes" : 10,
  "active_primary_shards" : 151,
  "active_shards" : 300,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.33774834437085
}
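As a quick aside, if you only want a couple of fields from that health output, you can filter the JSON on the command line. This is just a convenience and assumes jq is installed on the host running curl:

root@ls01:~# curl -s https://(es-endpoint)/_cluster/health | jq '{status, unassigned_shards, active_shards_percent_as_number}'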

Looking at the output from the curl command above, I noticed that the cluster had two unassigned shards, which is why it went to a yellow status. Next I needed to find out why the two shards became unassigned. The following command will give you more information.

root@ls01:~# curl -XGET https://(es-endpoint)/_cluster/allocation/explain?pretty
{
  "shard" : {
    "index" : "logstash-logs_2017.11.14",
    "index_uuid" : "lxeCX1l9S6Gvv1IRqQ8D5Q",
    "id" : 4,
    "primary" : false
  },
  "assigned" : false,
  "shard_state_fetch_pending" : false,
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2017-11-14T07:20:19.812Z",
    "failed_attempts" : 5,
    "delayed" : false,
 "final_explanation" : "the shard cannot be assigned because allocation deciders return a NO decision",

I snipped most of the output since it was far too long to post. Look for final_explanation in the output; here we see that the shard could not be assigned because the allocation deciders returned a NO decision. The CPU graph for the ES cluster showed that CPU utilization spiked to 100% around the time the cluster health changed from green to yellow, which most likely means I need to give the ES cluster more cores by moving to a larger instance type.
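Before fixing anything, the _cat/shards API is another handy way to see exactly which shards are unassigned and why. The unassigned.reason column should be available on recent Elasticsearch versions; treat the exact column list as an assumption for your release:

root@ls01:~# curl -s 'https://(es-endpoint)/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED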
To fix the unassigned shards issue I first needed to identify the yellow index, and the following command will do that.

root@ls01:~# curl -XGET https://(es-endpoint)/_cat/indices?v
health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   phperror-2017.11.17      W3BrmU2MS6WU3x8iGQsE2A   5   1     285894            0    314.6mb        157.3mb
green  open   logstash-logs_2017.11.18 fhOoybVuRJ-RG91Vn3N0oA   5   1  143124586            0    204.6gb        102.3gb
green  open   phperror-2017.11.20      P1uP9l6uQIywWtXN8S5YoQ   5   1     300737            0    330.6mb        165.3mb
green  open   phperror-2017.11.11      UZuZFHZ1RHipgeB7PW0qBg   5   1     124622            0    164.6mb         82.3mb
green  open   phperror-2017.11.21      6WChDm0DSoK9Pgl0kpcFVA   5   1     267903            0      332mb        166.1mb
green  open   logstash-logs_2017.11.22 q_-6wV7jS8W6oXS9RXaMKA   5   1   74279363            0      144gb         74.8gb
yellow open   logstash-logs_2017.11.14 lxeCX1l9S6Gvv1IRqQ8D5Q   5   1  192101980            0    215.1gb        134.4gb
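If you have a lot of indices, the _cat API can also filter by health so you don't have to scan the list by eye. The health parameter should be supported on current Elasticsearch versions, but treat it as an assumption if you are on an older release:

root@ls01:~# curl -XGET 'https://(es-endpoint)/_cat/indices?v&health=yellow'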

You will see the yellow index name in the last line of the above output (logstash-logs_2017.11.14). You will need that name to drop the index's replicas and then add them back, which forces Elasticsearch to reallocate the unassigned shards and fixes the issue. The following command sets the replica count to zero.

root@ls01:~# curl -XPUT 'https://(es-endpoint)/logstash-logs_2017.11.14/_settings?pretty' -H 'Content-Type: application/json' -d'
{
    "index" : {
        "number_of_replicas" : 0
    }
}
'

If the command completed successfully, you will see the following.

{
  "acknowledged" : true
}

Now that the replicas are disabled, re-enable them with the following command. It's the same command, except the zero changes to a one.

root@ls01:~# curl -XPUT 'https://(es-endpoint)/logstash-logs_2017.11.14/_settings?pretty' -H 'Content-Type: application/json' -d'
{
    "index" : {
        "number_of_replicas" : 1
    }
}
'
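If you end up doing this more than once, both steps can be wrapped in a small script. This is only a sketch: (es-endpoint) stays a placeholder, the index name is passed as an argument, and it assumes the index normally runs with a single replica.

#!/bin/bash
# reset_replicas.sh - drop and re-add the replicas for one index so that
# Elasticsearch reallocates any unassigned replica shards.
# Usage: ./reset_replicas.sh logstash-logs_2017.11.14
INDEX="$1"
ES="https://(es-endpoint)"

for COUNT in 0 1; do
    curl -XPUT "$ES/$INDEX/_settings?pretty" \
         -H 'Content-Type: application/json' \
         -d "{ \"index\" : { \"number_of_replicas\" : $COUNT } }"
done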

The ES cluster will not turn green immediately; it took about 20 minutes in my case. Afterwards, if you run the health check again, you will see that it's green and no longer has unassigned shards.

root@ls01:~# curl -XGET  https://(es-endpoint)/_cluster/health?pretty
{
  "cluster_name" : "******:es01",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 13,
  "number_of_data_nodes" : 10,
  "active_primary_shards" : 151,
  "active_shards" : 302,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
} 
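If you would rather not keep re-running that health check by hand, the health API also accepts a wait_for_status parameter, so a single call can block until the cluster goes green (or the timeout is hit):

root@ls01:~# curl -XGET 'https://(es-endpoint)/_cluster/health?wait_for_status=green&timeout=30m&pretty'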
