
Automated Elasticsearch rolling upgrades, using Ansible

Upgrading Elasticsearch is a joy

Want to skip the waffly bit? Skip to The code.

Anyone who has had the pleasure of upgrading an Elasticsearch cluster knows what a tedious and sometimes eventful process it can be. I’ve not yet had the experience of working on large-scale clusters (20+ nodes), but my experience with slightly smaller clusters has highlighted the need to automate this task.

For those who don’t know, when you upgrade an Elasticsearch cluster, it’s best to perform what’s called a ‘rolling upgrade’. This involves a few steps:

  1. Disabling shard allocation and performing a synced flush
  2. Stopping the Elasticsearch service on one node
  3. Performing upgrades/maintenance/reboots
  4. Restarting the Elasticsearch service/node
  5. Waiting for the Elasticsearch node to join the cluster
  6. Re-enabling shard allocation
  7. Waiting for the cluster to return to green
  8. Repeating for every node!
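
The "one node at a time" part of a rolling upgrade maps onto Ansible's `serial` play keyword. Here is a minimal sketch of a playbook that applies an upgrade role to each node in turn. The inventory group `elasticsearch` and the role name `elasticsearch-rolling-upgrade` are placeholders I've made up for illustration, so swap in your own.

```yaml
---
# Hypothetical playbook: the group "elasticsearch" and the role
# "elasticsearch-rolling-upgrade" are placeholder names.
- name: Rolling upgrade of an Elasticsearch cluster
  hosts: elasticsearch
  # Run every task of the play against one host before moving to the next,
  # so only a single node is ever out of the cluster.
  serial: 1
  roles:
    - elasticsearch-rolling-upgrade
```

With `serial: 1`, each host is its own batch, so steps 1 to 7 finish on one node before Ansible moves on to the next.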

Don’t get me wrong, I absolutely love Elasticsearch! It’s the technology behind HoneypotDB’s data store and search, but when nodes take a long time to rebalance (step 7), an upgrade can take a while.

Ansible to the rescue

[Image: Automate all the things]

Ansible is a really powerful tool. Put simply, it enables remote management of devices and automation of tasks, which sounds perfect for tackling the beast that is Elasticsearch upgrades.

I’ve been working more and more with Ansible lately. I’m using it to automate the deployment of infrastructure for HoneypotDB and to deploy new pots. Furthermore, all my OS and Elasticsearch updates are automated and run every week by Ansible Tower. It’s great!

The code

I’ve compiled the steps into a single Ansible role, and uploaded it to my GitHub for you to download. Please feel free to give it a try.

Do give the readme a read (lol), as it highlights some important information.

The playbook will:

  1. Check that the node’s installed Elasticsearch version is lower than the target version
  2. If it is lower, disable shard allocation and perform a synced flush
  3. Stop the Elasticsearch service
  4. Upgrade to the target version
  5. Reinstall the Elasticsearch S3 snapshot plugin (comment these steps out if you don’t need them)
  6. Restart the Elasticsearch service
  7. Wait for the node to re-join the cluster
  8. Re-enable shard allocation
  9. Wait for the cluster to return to green
  10. If Kibana is installed on the node, stop and upgrade Kibana, then start it up again
---
- name: Wait for cluster health to return to green
  uri:
    url: https://{{ inventory_hostname }}:9200/_cluster/health
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: GET
  register: response
  until: "response.json.status == 'green'"
  retries: 50
  delay: 30

- name: Get cluster details
  uri:
    url: https://{{ inventory_hostname }}:9200/
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: GET
  register: es_cluster_details

- name: Disable shard allocation for the cluster
  uri:
    url: https://{{ inventory_hostname }}:9200/_cluster/settings
    body: '{ "persistent": { "cluster.routing.allocation.enable": "none" } }'
    body_format: json
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: PUT

- name: Perform synced flush
  uri:
    url: https://{{ inventory_hostname }}:9200/_flush/synced
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: POST
  ignore_errors: yes

- name: Pause for 15 seconds for things to settle down
  pause:
    seconds: 15


- name: Stop elasticsearch service
  become: true
  systemd:
    name: elasticsearch
    enabled: yes
    state: stopped

- name: Remove S3 plugin from es
  become: true
  shell: /usr/share/elasticsearch/bin/elasticsearch-plugin remove repository-s3
  args:
    executable: /bin/bash
  ignore_errors: yes

- name: Update elasticsearch
  yum: 
    name:
    - elasticsearch-{{ es_target_version }}
    enablerepo: elasticsearch
    state: present
  vars:
    ansible_python_interpreter: /usr/bin/python

- name: Just force systemd to reread configs (2.4 and above)
  systemd:
    daemon_reload: yes

- name: Install S3 plugin for es
  become: true
  shell: yes 'y' | /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3
  args:
    executable: /bin/bash
  ignore_errors: yes

- name: Start elasticsearch
  register: response
  become: true
  systemd:
    name: elasticsearch
    enabled: yes
    state: started

- name: Wait for elasticsearch node to come back up
  wait_for:
    host: '{{ inventory_hostname }}'
    port: 9300
    delay: 30

- name: Pause for 10 seconds for things to settle down
  pause:
    seconds: 10

- name: Confirm the node joins the cluster
  uri:
    url: https://{{ inventory_hostname }}:9200/
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: GET
  register: es_post_cluster_state
  retries: 10
  delay: 20
  until: es_post_cluster_state.json.cluster_uuid == es_cluster_details.json.cluster_uuid

- name: Enable shard allocation for the cluster
  uri:
    url: https://{{ inventory_hostname }}:9200/_cluster/settings
    body: '{ "persistent": { "cluster.routing.allocation.enable": null } }'
    body_format: json
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: PUT
  register: response
  until: "response.json.acknowledged == true"
  retries: 5
  delay: 30

- name: Wait for cluster health to return to green
  uri:
    url: https://{{ inventory_hostname }}:9200/_cluster/health
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: GET
  register: response
  until: "response.json.status == 'green'"
  retries: 50
  delay: 30
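
The version check (step 1) and the Kibana upgrade (step 10) from the list are not part of the tasks shown above. Here is a rough sketch of how they could look, not necessarily how the role does it. The file name `upgrade_elasticsearch.yml` is hypothetical and stands for the tasks above. The sketch also assumes Kibana is installed from the same yum repo, runs as a systemd service called `kibana`, and should be upgraded to the same `es_target_version`. The `version` test needs Ansible 2.5 or later.

```yaml
# Sketch only: upgrade_elasticsearch.yml is a hypothetical file holding the
# upgrade tasks; the Kibana package/service names are assumptions.
- name: Get node details (the response includes the running version)
  uri:
    url: https://{{ inventory_hostname }}:9200/
    user: ansible
    password: '{{ elastic_ansible_password }}'
    force_basic_auth: yes
    method: GET
  register: es_cluster_details

- name: Upgrade only when the running version is below the target
  include_tasks: upgrade_elasticsearch.yml
  when: es_cluster_details.json.version.number is version(es_target_version, '<')

- name: Gather installed packages to see whether Kibana is present
  package_facts:
    manager: auto

- name: Upgrade Kibana when it is installed on this node
  when: "'kibana' in ansible_facts.packages"
  become: true
  block:
    - name: Stop Kibana
      systemd:
        name: kibana
        state: stopped

    - name: Upgrade Kibana to the target version
      yum:
        name: kibana-{{ es_target_version }}
        state: present

    - name: Start Kibana
      systemd:
        name: kibana
        enabled: yes
        state: started
```

Comparing with the `version` test rather than a plain string comparison means that, for example, 7.10.0 is correctly treated as newer than 7.9.3.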

Thanks for checking this post out. I hope it helps you automate the Elasticsearch upgrade process. Drop any Qs in the comments below ☺