new check: streaming_delay #206

tobixen · 2018-11-07T21:55:33Z

Measures how much a slave server is lagging behind its master.

This one is a bit similar to the existing streaming_delta check, except the delta check is supposed to be run from the master, and it measures the data delta in bytes, not the time delta in seconds. In some settings one would probably like to get a quick alert if the slave is significantly lagged compared to the master, even if the data delta size is small.

(requested by a customer of ours)


rjuju · 2019-04-22T13:47:45Z

Hi,

I just looked at this PR. Actually, we didn't implement it previously because the underlying function is not really helpful. Unfortunately, it doesn't return the replication lag, but the time since some data was last received and replayed. So if there's no write activity on the primary server, this service will probably trigger false errors. If we were to accept this check, it would have to be renamed to something like "received_activity_from_primary", or something that makes clear what it's actually checking.

About the code:

There's no pg_last_wal_replay_timestamp function in Postgres; pg_last_xact_replay_timestamp has never been renamed, AFAIK.
You document that UNKNOWN is returned if used on a primary (and on a stand-alone server, but I'm not sure what that means), but that's not the case. You should probably check pg_is_in_recovery().
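
To make the two remarks above concrete, here is a minimal Python sketch of how a standby-side check could behave. It is not the code from this PR: the helper name `classify_replay_age`, the thresholds and the status strings are hypothetical, and only `pg_is_in_recovery()` and `pg_last_xact_replay_timestamp()` are real PostgreSQL functions. The measured value is the age of the last replayed transaction, so it grows on an idle primary even when replication is healthy.

```python
from typing import Optional

# Query to run on the server being checked. Both functions exist in PostgreSQL;
# pg_last_xact_replay_timestamp() returns NULL if nothing has been replayed.
REPLAY_AGE_SQL = """
SELECT pg_is_in_recovery(),
       extract(epoch FROM now() - pg_last_xact_replay_timestamp())
"""


def classify_replay_age(in_recovery: bool,
                        replay_age_s: Optional[float],
                        warning_s: float,
                        critical_s: float) -> str:
    """Map one row of REPLAY_AGE_SQL to a Nagios-style status (hypothetical helper)."""
    if not in_recovery:
        # Primary or stand-alone server: the replay timestamp means nothing here.
        return "UNKNOWN"
    if replay_age_s is None:
        # Standby that has not replayed any transaction yet.
        return "UNKNOWN"
    if replay_age_s >= critical_s:
        return "CRITICAL"
    if replay_age_s >= warning_s:
        return "WARNING"
    return "OK"
```

With thresholds of 60 and 300 seconds, a standby whose primary has been idle for 15 minutes would be reported as CRITICAL, which is the false alarm described above.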

ioguix · 2023-11-29T10:43:13Z

Hi,

So this PR has been softly rejected for 5 years already. Let's close it for good.

However, this feature request will be supported through other means, using the *_lag fields that appeared in pg_stat_replication in v10. See #361 and a future PR about it.

Cheers,

tobixen · 2023-11-29T11:30:14Z

Oh ... this one went under my radar. I don't have responsibility for any primary-slave postgresql setups at the moment, so this is no priority for me, but I still think it may be useful for primary-slave environments where continuous write activity is expected. If the slave is lagging behind, then something is most likely wrong. I could blow the dust off this one and rename it - but if nobody finds it useful, then let it be :-)

Krysztophe · 2023-11-29T11:51:51Z

Any reason why you want the service on the secondary rather than the primary? Are the write_lag and replay_lag from pg_stat_replication enough?

tobixen · 2023-11-29T11:58:13Z

It's a long time since I was playing with this, but on the primary server you can only check that there exists a slave that is up-to-date. Theoretically things may be set up wrongly so that there is another slave connected to the master, so I think it's nice to check from the slave point of view that it's connected, too. I believe that in November 2018 I was playing around with disaster recovery situations where a slave was switched to master, new slaves were taken up or taken down, etc.

ioguix · 2023-11-29T16:40:31Z

Hi,

but on the primary server you can only check that there exists a slave that is up-to-date

No, from pg_stat_replication on the primary you can check:

that the specified secondaries are connected
how much data each of them is lagging behind (write_lsn, flush_lsn, replay_lsn; we use them)
how long each of them is lagging behind (write_lag, flush_lag, replay_lag; we don't use them... yet)

So the idea is to use the *_lag fields, report them in perfdata, and allow setting thresholds on them.
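
A rough Python sketch of that idea, for illustration only. The SQL uses real pg_stat_replication columns (the *_lag ones exist from v10 on), while the helper `lag_perfdata`, the row layout and the threshold handling are assumptions and not check_pga's actual implementation.

```python
from typing import Dict, List, Optional, Tuple

# Run on the primary (PostgreSQL 10+). The *_lag columns are intervals
# and may be NULL, e.g. when a standby has been idle for a while.
LAG_SQL = """
SELECT application_name,
       client_addr,
       extract(epoch FROM write_lag)  AS write_lag,
       extract(epoch FROM flush_lag)  AS flush_lag,
       extract(epoch FROM replay_lag) AS replay_lag
FROM pg_stat_replication
"""

Row = Dict[str, Optional[object]]


def lag_perfdata(rows: List[Row],
                 warning_s: float,
                 critical_s: float) -> Tuple[str, List[str]]:
    """Return (worst status, perfdata entries) for rows shaped like LAG_SQL.

    Hypothetical helper: thresholds are applied to replay_lag only.
    """
    rank = {"OK": 0, "WARNING": 1, "CRITICAL": 2}
    worst = "OK"
    perfdata: List[str] = []
    for row in rows:
        name = row["application_name"]
        for field in ("write_lag", "flush_lag", "replay_lag"):
            value = row.get(field)
            if value is None:
                continue  # no measurement available for this field
            perfdata.append(f"'{name} {field}'={float(value):g}s")
        replay = row.get("replay_lag")
        if replay is None:
            continue
        if float(replay) >= critical_s:
            status = "CRITICAL"
        elif float(replay) >= warning_s:
            status = "WARNING"
        else:
            status = "OK"
        if rank[status] > rank[worst]:
            worst = status
    return worst, perfdata
```

Applying the thresholds to replay_lag only is a simplification; a real service could accept separate thresholds per field.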

Theoretically things may be set up wrongly so that there is another slave connected to the master

Yes, but you can set up check_pga to explicitly check the streaming to some specific standbys using their application_name + remote IP address, using e.g.: --slave 'thisstandbyname 10.20.30.40' --slave 'thisanotherstandbyname 10.20.30.41'
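
The matching behind such options can be sketched as follows. This is a hypothetical Python illustration of the logic, not check_pga's code: `parse_standby_spec` and `missing_standbys` are made-up names, and the connected standbys are assumed to come from the application_name and client_addr columns of pg_stat_replication.

```python
from typing import Iterable, List, Tuple


def parse_standby_spec(spec: str) -> Tuple[str, str]:
    """Split 'application_name IP' on the last run of whitespace.

    Raises ValueError if the spec contains no whitespace at all.
    """
    name, address = spec.rsplit(None, 1)
    return name, address


def missing_standbys(specs: Iterable[str],
                     connected: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the expected specs with no matching (name, address) connection."""
    seen = set(connected)
    return [spec for spec in specs if parse_standby_spec(spec) not in seen]


# Expected standbys, written the same way as the --slave values above.
expected = ["thisstandbyname 10.20.30.40", "thisanotherstandbyname 10.20.30.41"]
# Assumed result of querying pg_stat_replication on the primary.
connected_now = [("thisstandbyname", "10.20.30.40"), ("otherhost", "10.20.30.99")]

missing = missing_standbys(expected, connected_now)
```

Here `missing` contains only the second expected standby; the unexpected connection from 10.20.30.99 does not satisfy either requirement.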

tobixen · 2023-11-29T16:49:56Z

I still feel it would be simpler and easier to be able to check from a specific slave whether it's connected to the master than to do it from the master, but I won't waste time arguing about that - and for the foreseeable future I'm not going to monitor any postgresql slave servers anyway :-)

ioguix · 2023-11-29T17:27:11Z

I still feel it would be simpler and easier being able to check from a specific slave if it's connected to the master than to do it from the master

Sure, it would be possible, why not. However, you cannot check the lag from the standby point of view.

and for the foreseeable future I'm not going to monitor any postgresql slave servers anyway :-)

Thank you very much for your past discussions and contributions Tobias!

Cheers,

new check: streaming_delay, measuring how much a slave server is lagging behind its master

7856835

tobixen force-pushed the feature_streaming_delta_time branch from 6f316f6 to 7856835 on November 7, 2018 21:56

improved documentation of check_streaming_delay a bit

edb98ab

ioguix added this to the release 2.5 milestone Jan 29, 2019

ioguix modified the milestones: release 2.5, release 2.6 Nov 3, 2020

ioguix closed this Nov 29, 2023

ioguix mentioned this pull request Nov 29, 2023

New metrics and thresholds for streaming delta #361

Open

Krysztophe mentioned this pull request Nov 30, 2023

Check replication from secondary point of view #363

Open
