Unable to switch from replica (promoted to master after the master failed) to replica-2 when the replica fails #58

GajaHebbar · 2018-08-06T10:42:01Z

Using primary-replica, create the pr-primary, pr-replica and pr-replica-2 pods.
Run watch to monitor the master and switch to a replica in case of failure.

Kill pr-primary. The watcher identifies the master failure and promotes pr-replica to master.

After this we can insert and delete database entries (working fine, as expected).

Now kill pr-replica (relabelled as pr-primary after the original pr-primary was killed).
The watcher does not initiate a failover.
Watcher logs for the 1st failure (successful failover) and the 2nd failure (no failover):

INFO[2018-08-06T10:25:14Z] Successfully reached 'pr-primary'
INFO[2018-08-06T10:25:44Z] Health Checking: 'pr-primary'
ERRO[2018-08-06T10:25:54Z] dial tcp 10.96.29.55:5432: i/o timeout
ERRO[2018-08-06T10:25:54Z] Could not reach 'pr-primary' (Attempt: 1)
INFO[2018-08-06T10:25:54Z] Executing pre-hook: /hooks/watch-pre-hook
INFO[2018-08-06T10:25:54Z] Processing Failover: Strategy - latest
INFO[2018-08-06T10:25:54Z] Deleting existing primary...
INFO[2018-08-06T10:25:54Z] Deleted old primary
INFO[2018-08-06T10:25:54Z] Choosing failover replica...
INFO[2018-08-06T10:25:54Z] Chose failover target (pr-replica)
INFO[2018-08-06T10:25:54Z] Promoting failover replica...
DEBU[2018-08-06T10:25:54Z] executing cmd: [/opt/cpm/bin/promote.sh] on pod pr-replica in namespace default container: postgres
INFO[2018-08-06T10:25:54Z] Relabeling failover replica...
DEBU[2018-08-06T10:25:54Z] label: name
DEBU[2018-08-06T10:25:54Z] label: replicatype
INFO[2018-08-06T10:25:54Z] Executing post-hook: /hooks/watch-post-hook
INFO[2018-08-06T10:26:24Z] Health Checking: 'pr-primary'
INFO[2018-08-06T10:26:24Z] Successfully reached 'pr-primary'
INFO[2018-08-06T10:26:54Z] Health Checking: 'pr-primary'
INFO[2018-08-06T10:26:54Z] Successfully reached 'pr-primary'
INFO[2018-08-06T10:27:24Z] Health Checking: 'pr-primary'
INFO[2018-08-06T10:27:24Z] Successfully reached 'pr-primary'
INFO[2018-08-06T10:27:54Z] Health Checking: 'pr-primary'
INFO[2018-08-06T10:27:54Z] Successfully reached 'pr-primary'
INFO[2018-08-06T10:28:24Z] Health Checking: 'pr-primary'
INFO[2018-08-06T10:28:24Z] Successfully reached 'pr-primary'
INFO[2018-08-06T10:28:54Z] Health Checking: 'pr-primary'
ERRO[2018-08-06T10:29:04Z] dial tcp 10.96.29.55:5432: i/o timeout
ERRO[2018-08-06T10:29:04Z] Could not reach 'pr-primary' (Attempt: 1)
INFO[2018-08-06T10:29:34Z] Health Checking: 'pr-primary'
ERRO[2018-08-06T10:29:44Z] dial tcp 10.96.29.55:5432: i/o timeout
ERRO[2018-08-06T10:29:44Z] Could not reach 'pr-primary' (Attempt: 1)
INFO[2018-08-06T10:30:14Z] Health Checking: 'pr-primary'
ERRO[2018-08-06T10:30:24Z] dial tcp 10.96.29.55:5432: i/o timeout
ERRO[2018-08-06T10:30:24Z] Could not reach 'pr-primary' (Attempt: 1)
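
To make the expected behaviour concrete: in the log above, the first unreachable check triggers a failover, but later unreachable checks only log "Attempt: 1" again and nothing is promoted. Below is a minimal Python sketch of the loop I would expect, where the failure counter is re-armed after each failover so that a second outage triggers a second promotion. This is not the watcher's actual code. `tcp_reachable`, `watch` and all of their parameters are hypothetical names, and the 10 second timeout and 30 second interval are only chosen to mirror the timings visible in the log.

```python
import socket
import time
from typing import Callable


def tcp_reachable(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes connection refused and timeouts
        return False


def watch(check: Callable[[], bool],
          failover: Callable[[], None],
          max_attempts: int = 1,
          rounds: int = 10,
          interval: float = 30.0) -> int:
    """Run `rounds` health checks and return how many failovers were triggered.

    Hypothetical sketch: `check` reports whether the current primary is
    reachable, `failover` promotes a replica.
    """
    failures = 0
    failovers = 0
    for _ in range(rounds):
        if check():
            failures = 0  # healthy again, reset the counter
        else:
            failures += 1
            print(f"Could not reach primary (Attempt: {failures})")
            if failures >= max_attempts:
                failover()
                failovers += 1
                failures = 0  # re-arm so a later outage fails over again
        time.sleep(interval)
    return failovers
```

With something like `watch(lambda: tcp_reachable("pr-primary", 5432), promote_replica)`, where `promote_replica` is a placeholder for whatever performs the promotion, the second kill in the steps above would be expected to promote pr-replica-2 instead of looping on failed health checks.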
