KVStore Tools #177

Open
wants to merge 21 commits into
base: master
Commits
9f689aa
init kv features
arcsector Mar 18, 2023
2e10ee3
README and post_install changes
arcsector Mar 21, 2023
ce2c80a
Fixes for auth and disable
arcsector Mar 23, 2023
e30dc3a
Additional KVstore helpers and tasks
arcsector Mar 23, 2023
5b71f97
added a login task and included in kvstore related tasks
dtwersky Mar 29, 2023
abd4739
fixed missing tick in README.md
dtwersky Mar 29, 2023
2fe810b
fixed another typo in README.md
dtwersky Mar 29, 2023
1fbac54
become_user to splunk for login
dtwersky Mar 29, 2023
3263c35
kvstore tools fixes
arcsector Mar 31, 2023
099913b
become and changed_when:false for Get current SHCluster captain
dtwersky Mar 31, 2023
0abf12a
become and checked_when:false for Get current KVStore captain
dtwersky Mar 31, 2023
340a3b7
Using version var & cleaning upgrade conditionals
arcsector Apr 3, 2023
b7272dd
Merge branch 'feat-kv-migration' of github.com:arcsector/ansible-role…
arcsector Apr 3, 2023
360e7e4
created block for task. added become to whole block
dtwersky Apr 3, 2023
38a5c77
fixed splunk_authenticated typo. replaced command with shell
dtwersky Apr 3, 2023
ffe30c4
default values for when kvstore-status doesn't return serverVersion
arcsector Apr 28, 2023
dcaaa7f
Change Oplog size based on support recommendations
arcsector Mar 15, 2024
5061198
Check current oplog size against requested oplog size
arcsector Mar 19, 2024
2fa341d
auth for statuses
arcsector Mar 20, 2024
3f8d4d2
documenting oplog kv task
arcsector Jan 23, 2025
483f335
bring up-to-date with master
arcsector Jan 23, 2025
403 changes: 206 additions & 197 deletions README.md

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions roles/splunk/defaults/main.yml
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@ splunk_general_key: undefined # Configures a pass4SymmKey in server.conf under t
splunk_ds_key: undefined # Configures a pass4SymmKey in server.conf for authenticating against a deployment server
splunk_admin_username: admin
splunk_admin_password: undefined # Use ansible-vault encrypt_string, e.g. ansible-vault encrypt_string --ask-vault-pass 'var_value_to_encrypt' --name 'var_name'
splunk_authenticated: false # DO NOT CHANGE. This fact is set to true in the `splunk_login.yml` task, and reset to false if `splunk_restart.yml`, `splunk_stop.yml` or `splunk_start.yml` are manually called in another task.
splunk_configure_secret: false # If set to true, you need to update files/splunk.secret
splunk_secret_file: splunk.secret # Used to specify your splunk.secret filename(s), files should be placed in the "files" folder of the role
# Although there are tasks for the following Splunk configurations in this role, they are not included in any tasks by default. You can add them to your install_splunk.yml if you would like to have Ansible manage any of these files
@@ -72,6 +73,10 @@ splunk_shc_target_group: shc
splunk_shc_deployer: "{{ groups['shdeployer'] | first }}" # If you manage multiple SHCs, configure the var value in group_vars
splunk_shc_uri_list: "{% for h in groups[splunk_shc_target_group] %}https://{{ hostvars[h].splunk_mgmt_uri }}:{{ splunkd_port }}{% if not loop.last %},{% endif %}{% endfor %}" # If you manage multiple SHCs, configure the var value in group_vars
start_splunk_handler_fired: false # Do not change; used to prevent unnecessary splunk restarts
splunk_enable_kvstore: true
splunk_kvstore_storage: undefined # Can be defined here or at the group_vars level - accepted values: "wiredTiger" or "undefined", which leaves as default
splunk_kvstore_version: undefined # Can be defined here or at the group_vars level - accepted values: 4.2 or "undefined", which leaves as default
Review comment (Collaborator):
I don't see this variable used either

Reply (Contributor Author):
Oh I see, `splunk_kvstore_version` is unused - I'll add it to the conditionals at the bottom of the upgrade procedure

splunk_oplog_size: 1000 # Default for Splunk Enterprise - should be changed at the group_vars level only at the behest of Splunk support with special care taken
# Linux and scripting related vars
add_crashlog_script: false # Set to true to install a script and cron job to automatically cleanup splunk crash logs older than 7 days
add_diag_script: false # Set to true to install a script and cron job to automatically cleanup splunk diag files older than 30 days
28 changes: 28 additions & 0 deletions roles/splunk/tasks/adhoc_backup_kvstore.yml
@@ -0,0 +1,28 @@
---
- name: Adhoc KVStore Backup
block:
- name: Check if authenticated
include_tasks: splunk_login.yml
when: not splunk_authenticated

- name: Check if we're okay to backup
ansible.builtin.shell: |
{{ splunk_home }}/bin/splunk show kvstore-status | grep backupRestoreStatus | sed -r 's/\s+backupRestoreStatus : //g'
register: splunk_kvstore_pre_backup_status_out
changed_when: false
until: splunk_kvstore_pre_backup_status_out.stdout == 'Ready'

- name: Backup KVStore on desired host
ansible.builtin.command: |
{{ splunk_home }}/bin/splunk backup kvstore {{ archive_name | default("") }}
register: splunk_kvstore_backup_out
changed_when: splunk_kvstore_backup_out.rc == 0
failed_when: splunk_kvstore_backup_out.rc != 0

- name: Check that backup has finished
ansible.builtin.shell: |
{{ splunk_home }}/bin/splunk show kvstore-status | grep backupRestoreStatus | sed -r 's/\s+backupRestoreStatus : //g'
register: splunk_kvstore_status_out
until: splunk_kvstore_status_out.stdout == 'Ready'

become: true
become_user: "{{ splunk_nix_user }}"
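
The two status checks in adhoc_backup_kvstore.yml poll `backupRestoreStatus` until it reads `Ready`. The same wait loop can be sketched in plain POSIX shell, which is handy for trying the parsing outside Ansible. This is a minimal sketch under assumptions: `fake_status` is a hypothetical stand-in for `splunk show kvstore-status`, its sample output layout is assumed, and the function names are illustrative rather than part of this PR.

```shell
# Print the value of the backupRestoreStatus line, if present.
extract_backup_status() {
  sed -nE 's/^[[:space:]]*backupRestoreStatus[[:space:]]*:[[:space:]]*//p'
}

# Poll a status command until it reports Ready.
# $1 = command printing kvstore-status text, $2 = retries, $3 = delay in seconds
wait_for_ready() {
  status_cmd=$1
  retries=$2
  delay=$3
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    status=$("$status_cmd" | extract_backup_status)
    if [ "$status" = "Ready" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical stand-in for `splunk show kvstore-status` (layout assumed).
fake_status() {
  printf '%s\n' ' This member:' '   backupRestoreStatus : Ready' '                status : ready'
}

wait_for_ready fake_status 3 0 && echo "KV store is ready for backup"
```

In a real run, the first argument would be a small wrapper function around the Splunk CLI call, with `retries` and `delay` mirroring the Ansible keywords of the same name.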
116 changes: 116 additions & 0 deletions roles/splunk/tasks/adhoc_change_oplog_shc.yml
@@ -0,0 +1,116 @@
---
# oplog size should not be changed unless the oplog window is too small causing members to become stale - or is gradually shrinking.
# Do NOT use this on a standalone instance - oplog size does not matter for a standalone KV Store.
# Deployments should monitor the oplog window and react in time. If the window is already too small - KV Store may have to be re-created with increased oplog size.

- name: Check if authenticated
include_tasks: splunk_login.yml
when: not splunk_authenticated

- name: Make sure we're in an SHC
ansible.builtin.fail:
msg: "SHC not found in group_names - detected group names are \"{{ group_names }}\". This play will only run on an SHC"
when: splunk_shc_target_group not in group_names

- name: Get current oplog size
ansible.builtin.shell: |
{{ splunk_home }}/bin/splunk btool server list kvstore | grep oplogSize | sed 's/[^0-9]*//g'
register: current_oplog_size_out
failed_when: current_oplog_size_out.rc != 0
run_once: true
become: true
become_user: "{{ splunk_nix_user }}"

- name: Debug current OpLog Size in MB
ansible.builtin.debug:
var: current_oplog_size_out.stdout
verbosity: 1

- name: Make sure the oplog size var differs from our current value, if they're the same, exit play
ansible.builtin.meta: end_play
when: current_oplog_size_out.stdout | int == splunk_oplog_size | int

# sets fact splunk_shc_captain
- name: Find SHC Captain
include_tasks: get_shcluster_captain.yml

# GUID from SPLUNK_HOME/etc/instance.cfg, not just hostname
- name: Find KVStore Captain
block:
- name: Check if authenticated
include_tasks: splunk_login.yml
when: not splunk_authenticated

# Gets KVStore captain hostname - like splunk_captain.domain.com
# guid for GUID, hostAndPort for host with port - like `| sed -r 's/\s+hostAndPort : //g' | sed -r 's/:[0-9]+//g'`
- name: Get current KVStore captain
ansible.builtin.shell: |
{{ splunk_home }}/bin/splunk show kvstore-status | grep -B10 "KV store captain" | grep "guid" | sed -r 's/\s+guid : //g'
become: true
become_user: "{{ splunk_nix_user }}"
register: splunk_get_kvcaptain
changed_when: false
failed_when: splunk_get_kvcaptain.rc != 0

- name: Register KVStore captain fact
ansible.builtin.set_fact:
splunk_kv_captain_guid: "{{ splunk_get_kvcaptain.stdout }}"

- name: Make KVCaptain SHC captain
ansible.builtin.command: |
{{ splunk_home }}/bin/splunk transfer shcluster-captain -mgmt_uri "https://{{ splunk_shc_captain }}:{{ splunkd_port }}"
register: transfer_captain_out
changed_when: transfer_captain_out.rc == 0
failed_when: transfer_captain_out.rc != 0

- name: Get current KVCaptain
include_tasks: get_kvstore_captain.yml

- name: Ensure SHC Captain and KV Captain are the same
include_tasks: get_shcluster_captain.yml
until: splunk_shc_captain == splunk_kv_captain
delay: 10
retries: 30

- name: Make a backup of the whole kvstore located in $SPLUNK_DB/kvstore only on one member
include_tasks: adhoc_backup_kvstore.yml
run_once: true # this works here to run the entire task only once, whereas import_tasks would run this on all hosts
vars:
archive_name: "{{ inventory_hostname }}-preoplog-backup"
#- name: Make a backup of the whole KVStore directory ($SPLUNK_DB/kvstore) on only one member
# ansible.builtin.shell: |
# echo "{{ splunk_home }}/{{ inventory_hostname }}-preoplog-backup.tar.gz"; tar -czf {{ splunk_home }}/{{ inventory_hostname }}-preoplog-backup.tar.gz {{ splunk_db_path }}/kvstore
# register: kvstore_backup_out
# changed_when: kvstore_backup_out.rc == 0
# failed_when: kvstore_backup_out.rc != 1
# become: true
# run_once: true

# Note - make this a separate task so that we can repeat it later for the final member
- name: For each of the other SHC cluster members - increase the oplog
include_tasks: adhoc_increase_oplog_helper.yml
when: splunk_shc_captain != inventory_hostname

- name: Select new node to be SHC captain
ansible.builtin.set_fact:
splunk_new_shc_captain: "{% for h in groups[splunk_shc_target_group] %}https://{{ hostvars[h].ansible_fqdn }}:{{ splunkd_port }}{% if not loop.last and hostvars[h].ansible_fqdn != splunk_shc_captain %},{% endif %}{% endfor %}" # If you manage multiple SHCs, configure the var value in group_vars


- name: Transfer SHC captain to a different node
ansible.builtin.command: |
{{ splunk_home }}/bin/splunk transfer shcluster-captain -mgmt_uri "{{ splunk_new_shc_captain }}"
register: transfer_captain_out
changed_when: transfer_captain_out.rc == 0
failed_when: transfer_captain_out.rc != 0

- name: Increase oplog on final member
include_tasks: adhoc_increase_oplog_helper.yml
when: splunk_shc_captain == inventory_hostname

- name: Try to check data
ansible.builtin.debug:
msg:
- Check if the data is available - if something went wrong during the process
- use backup to restore the data. Backup is on this searchhead with this name={{ kvstore_backup_out.stdout }}
- If members are out of sync, resync the KVStore from the SHCluster captain {{ splunk_new_shc_captain }}
- with `splunk resync kvstore -source {{ splunk_kv_captain_guid }}`
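
The header comment of adhoc_change_oplog_shc.yml advises monitoring the oplog window before touching `oplogSize`. One way to read that window, sketched here as an assumption rather than as the PR's method, is the difference between the `oplogEndTimestampSec` and `oplogStartTimestampSec` fields that appear in the sample status output of get_kvstore_status.yml. The function name is hypothetical.

```shell
# Compute the oplog window in whole seconds.
# $1 = oplogStartTimestampSec, $2 = oplogEndTimestampSec
oplog_window_seconds() {
  awk -v start="$1" -v end="$2" 'BEGIN { printf "%d\n", end - start }'
}

# Values taken from the sample output in get_kvstore_status.yml
oplog_window_seconds 1466541295.000000 1466541672.000000
```

For the sample values this prints 377, that is, a window of a little over six minutes. What counts as "too small" depends on the deployment, so no threshold is suggested here.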
13 changes: 13 additions & 0 deletions roles/splunk/tasks/adhoc_clean_kvstore.yml
@@ -0,0 +1,13 @@
---
- name: Stop Splunkd service
include_tasks: splunk_stop.yml

- name: Clean KVStore
ansible.builtin.command: "{{ splunk_home }}/bin/splunk clean kvstore --local --answer-yes"
become: true
become_user: "{{ splunk_nix_user }}"
register: clean_result
changed_when: clean_result.rc == 0
failed_when: clean_result.rc != 0
notify:
- start splunk
33 changes: 33 additions & 0 deletions roles/splunk/tasks/adhoc_increase_oplog_helper.yml
@@ -0,0 +1,33 @@
---
- name: Stop Splunk
include_tasks: splunk_stop.yml

- name: Clean KVStore
ansible.builtin.command: "{{ splunk_home }}/bin/splunk clean kvstore --local --answer-yes"
become: true
become_user: "{{ splunk_nix_user }}"
register: clean_result
changed_when: clean_result.rc == 0
failed_when: clean_result.rc != 0

- name: Edit server.conf to increase the oplogSize setting
community.general.ini_file:
path: "{{ splunk_home }}/etc/system/local/server.conf"
section: kvstore
option: oplogSize
value: "{{ splunk_oplog_size }}"
owner: "{{ splunk_nix_user }}"
group: "{{ splunk_nix_group }}"
mode: 0644
become: true
become_user: "{{ splunk_nix_user }}"

- name: Start Splunk to trigger synchronisation
include_tasks: splunk_start.yml

# sets fact splunk_kvstore_status_json
- name: Verify synchronisation with show kvstore-status
include_tasks: get_kvstore_status.yml
until: splunk_kvstore_status_json.status == 'ready'
delay: 10
retries: 30
28 changes: 28 additions & 0 deletions roles/splunk/tasks/configure_kvstore.yml
@@ -0,0 +1,28 @@
---
- name: Disable KVStore if specified
include_tasks: disable_kvstore.yml
when: not splunk_enable_kvstore

- name: Configure initial KVStore storage engine in server.conf
community.general.ini_file:
path: "{{ splunk_home }}/etc/system/local/server.conf"
section: kvstore
option: storageEngine
value: "{{ splunk_kvstore_storage }}"
owner: "{{ splunk_nix_user }}"
group: "{{ splunk_nix_group }}"
mode: 0644
become: true
when:
- splunk_kvstore_storage == "wiredTiger"
- splunk_enable_kvstore

- name: Configure initial KVStore oplog size in server.conf
community.general.ini_file:
path: "{{ splunk_home }}/etc/system/local/server.conf"
section: kvstore
option: oplogSize
value: "{{ splunk_oplog_size }}"
become: true
become_user: "{{ splunk_nix_user }}"
when: splunk_enable_kvstore
6 changes: 4 additions & 2 deletions roles/splunk/tasks/disable_kvstore.yml
@@ -1,10 +1,12 @@
---
- name: Disable KVStore
-when: ansible_system == "Linux"
+when:
+- ansible_system == "Linux"
+- not splunk_enable_kvstore
ini_file:
path: "{{ splunk_home }}/etc/system/local/server.conf"
section: kvstore
option: disabled
value: "true"
become: True
-become_user: "{{ splunk_nix_user }}"
+become_user: "{{ splunk_nix_user }}"
18 changes: 18 additions & 0 deletions roles/splunk/tasks/get_kvstore_captain.yml
@@ -0,0 +1,18 @@
---
- name: Check if authenticated
include_tasks: splunk_login.yml
when: not splunk_authenticated

# Gets KVStore captain hostname - like splunk_captain.domain.com
- name: Get current KVStore captain
ansible.builtin.shell: |
{{ splunk_home }}/bin/splunk show kvstore-status | grep -B10 "KV store captain" | grep "hostAndPort" | sed -r 's/\s+hostAndPort : //g' | sed -r 's/:[0-9]+//g'
become: true
become_user: "{{ splunk_nix_user }}"
register: splunk_get_kvcaptain
changed_when: false
failed_when: splunk_get_kvcaptain.rc != 0

- name: Register KVStore captain fact
ansible.builtin.set_fact:
splunk_kv_captain: "{{ splunk_get_kvcaptain.stdout }}"
20 changes: 20 additions & 0 deletions roles/splunk/tasks/get_kvstore_status.yml
@@ -0,0 +1,20 @@
---
# This file gets our KVStore status from our current member as JSON
- name: Get KVStore status
ansible.builtin.shell: |
set -o pipefail
{{ splunk_home }}/bin/splunk show kvstore-status -auth {{ splunk_auth }} \
| grep -A13 "This member:" \
| tail -n +2 | sed -Er 's/^\s+//g' \
| awk -F ' * : *' '{ printf "\"%s\":\"%s\",", $1, $2 }' \
| sed 's/,$/}/' | sed 's/^/{/'
register: get_splunk_kvstore_status_out
failed_when: get_splunk_kvstore_status_out.rc != 0
become: true
become_user: "{{ splunk_nix_user }}"

- name: Convert KVStore status to JSON
ansible.builtin.set_fact:
splunk_kvstore_status_json: "{{ get_splunk_kvstore_status_out.stdout_lines[0] | from_json }}"

# output: {"date":"Tue Jul 21 16:42:24 2016","dateSec":"1466541744.143000","disabled":"0","guid":"6244DF36-D883-4D59-AHD3-1354FCB4BL91","oplogEndTimestamp":"Tue Jul 21 16:41:12 2016","oplogEndTimestampSec":"1466541672.000000","oplogStartTimestamp":"Tue Jul 21 16:34:55 2016","oplogStartTimestampSec":"1466541295.000000","port":"8191","replicaSet":"splunkrs","replicationStatus":"KV store captain","standalone":"0","status":"ready"}
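
The pipeline in get_kvstore_status.yml turns `key : value` lines into a flat JSON object. The idea can be tried without a Splunk instance by feeding sample lines to a single `awk` program. This is a sketch, not the task's exact pipeline: the function name is hypothetical, the sample lines are borrowed from the example output above, and it assumes values contain no double quotes, backslashes, or ` : ` sequences.

```shell
# Convert "key : value" lines on stdin into one flat JSON object.
# Splitting on " : " (spaces on both sides) keeps times such as 16:42:24 intact.
kv_lines_to_json() {
  awk -F ' +: +' '
    BEGIN { printf "{" }
    NF >= 2 {
      key = $1
      sub(/^[ \t]+/, "", key)      # drop the leading indentation
      printf "%s\"%s\":\"%s\"", sep, key, $2
      sep = ","
    }
    END { print "}" }
  '
}

# Sample lines in the style of `splunk show kvstore-status`
printf '%s\n' \
  '        date : Tue Jul 21 16:42:24 2016' \
  '    disabled : 0' \
  '        port : 8191' \
  '      status : ready' \
  | kv_lines_to_json
```

Emitting the braces from `BEGIN` and `END` avoids the trailing-comma cleanup that the `sed` steps perform, and an empty input still yields valid JSON (`{}`).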
18 changes: 18 additions & 0 deletions roles/splunk/tasks/get_shcluster_captain.yml
@@ -0,0 +1,18 @@
---
# Gets SHC captain management domain - like splunk_captain.example
- name: Check if authenticated
include_tasks: splunk_login.yml
when: not splunk_authenticated

- name: Get current SHCluster captain
ansible.builtin.shell: |
{{ splunk_home }}/bin/splunk show shcluster-status | grep -A6 Captain | grep mgmt_uri | sed -r 's/\s+mgmt_uri : //g' | sed -Er 's/http(s)?:\/\///' | sed -Er 's/:[0-9]+//'
become: true
become_user: "{{ splunk_nix_user }}"
register: splunk_get_shcaptain
changed_when: false
failed_when: splunk_get_shcaptain.rc != 0

- name: Register SHCluster captain fact
ansible.builtin.set_fact:
splunk_shc_captain: "{{ splunk_get_shcaptain.stdout }}"
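
The last two `sed` stages in get_shcluster_captain.yml reduce a management URI to a bare hostname. A minimal sketch of that step alone, assuming the URI has the `scheme://host:port` shape seen in the sample shcluster output (the function name is hypothetical):

```shell
# Strip the http(s):// scheme and the trailing :port from a management URI.
mgmt_uri_to_host() {
  printf '%s\n' "$1" | sed -E 's#^https?://##; s#:[0-9]+$##'
}

mgmt_uri_to_host 'https://splunk-search.example.com:8089'
```

Anchoring the port pattern with `$` limits the substitution to the trailing port, which is slightly stricter than the unanchored `:[0-9]+` in the task.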
20 changes: 20 additions & 0 deletions roles/splunk/tasks/get_shcluster_status.yml
@@ -0,0 +1,20 @@
---
# This file gets our shcluster status from our current member as JSON
- name: Get shcluster status
ansible.builtin.shell: |
set -o pipefail
{{ splunk_home }}/bin/splunk show shcluster-status -auth {{ splunk_auth }} \
| grep -A5 " {{ inventory_hostname }}" \
| tail -n +2 | sed -Er 's/^\s+//g' \
| awk -F ' * : *' '{ printf "\"%s\":\"%s\",", $1, $2 }' \
| sed 's/,$/}/' | sed 's/^/{/'
register: get_splunk_shcluster_status_out
failed_when: get_splunk_shcluster_status_out.rc != 0
become: true
become_user: "{{ splunk_nix_user }}"

- name: Convert shcluster status to JSON
ansible.builtin.set_fact:
splunk_shcluster_status_json: "{{ get_splunk_shcluster_status_out.stdout_lines[0] | from_json }}"

# output: {"label":"splunk-search.example.com","last_conf_replication":"Fri Mar 14 11:12:17 2024","mgmt_uri":"https://splunk-search.example.com:8089","mgmt_uri_alias":"https://10.1.1.5:8089","status":"Up"}