Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev/bar 120 attempt 2 #868

Merged
merged 6 commits into from
Dec 13, 2023
Merged

Dev/bar 120 attempt 2 #868

merged 6 commits into from
Dec 13, 2023

Conversation

mikewallace1979
Copy link
Contributor

@mikewallace1979 mikewallace1979 commented Nov 7, 2023

This is functionally-equivalent to #867 however the different conninfo strings for WAL streaming are managed before creating the server.

The general idea is that:

  1. barman backup and most other barman commands create barman.server.Server objects using the conninfo and streaming_conninfo in the configuration as usual.
  2. barman receive-wal and barman cron create barman.server.Server objects using wal_conninfo and wal_streaming_conninfo if they are set but useconninfo and streaming_conninfo otherwise.
  3. barman replication-status will use either, depending on the value of its --source option (which defaults to the backup conninfo, to preserve existing behaviour).

This works quite nicely however:

  1. We need to be sure this is a safe thing to do. Might barman cron need to use the backup-host conninfo strings for some reason? Are there other barman commands which should have the WAL conninfo?
  2. barman check still needs work because it is checking the complete configuration of the Barman server.

Point 2 is addressed by the second commit in this PR where we make barman check get the WAL streaming status and carry out the relevant checks using that.

For point 1 we can do some careful review.

Overall this seems cleaner than #867 and does not require the scattergun changes to barman.server.Server which are present in that PR.

Related: Integration tests.

Copy link
Contributor

@barthisrael barthisrael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR definitely looks better than the previous one.
I'm posting a few thoughts.

barman/cli.py Outdated Show resolved Hide resolved
barman/cli.py Show resolved Hide resolved
barman/cli.py Show resolved Hide resolved
barman/cli.py Outdated Show resolved Hide resolved
barman/postgres.py Outdated Show resolved Hide resolved
barman/server.py Show resolved Hide resolved
barman/server.py Outdated Show resolved Hide resolved
barman/server.py Outdated Show resolved Hide resolved
barman/server.py Show resolved Hide resolved
barman/server.py Outdated Show resolved Hide resolved
@mikewallace1979
Copy link
Contributor Author

This PR definitely looks better than the previous one. I'm posting a few thoughts.

Thanks for reviewing the draft PR @barthisrael - I implemented all of your suggestions so far 😄

There will be a few follow additional commits with unit tests and docs, plus any other changes we determine we need as we continue to think about this patch.

@mikewallace1979 mikewallace1979 force-pushed the dev/bar-120-attempt-2 branch 4 times, most recently from b6b6afe to a4967cc Compare December 1, 2023 12:16
@mikewallace1979 mikewallace1979 marked this pull request as ready for review December 1, 2023 12:42
Copy link
Contributor

@barthisrael barthisrael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this PR.
I posted a few suggestions, mostly related with docstring syntax, but also some related with docs, tests, and a possibly alternative approach on the get_wal_conninfo function, if you think that's relevant.

barman/cli.py Outdated Show resolved Hide resolved
barman/cli.py Outdated Show resolved Hide resolved
barman/cli.py Outdated Show resolved Hide resolved
barman/server.py Outdated Show resolved Hide resolved
barman/server.py Outdated Show resolved Hide resolved
barman/server.py Outdated Show resolved Hide resolved
tests/test_server.py Outdated Show resolved Hide resolved
doc/manual/23-wal_streaming.en.md Outdated Show resolved Hide resolved
doc/manual/23-wal_streaming.en.md Outdated Show resolved Hide resolved
doc/manual/23-wal_streaming.en.md Outdated Show resolved Hide resolved
barthisrael added a commit that referenced this pull request Dec 5, 2023
This commit adds documentation for all the features that have been as part of
the work of making it easier to integrate Barman with HA clusters.

We perform the following changes to the web docs / `man`:

* Add `cluster` configuration option to `man 5`
* Add `model` configuration option to `man 5`
* Modify scope of the following configuration options in `man 5` so they
  include `model` scope:
    * `primary_conninfo`
    * `conninfo`
    * `streaming_conninfo`
* Add information about configuration models under the
  `Integration with cluster management systems` section of the web docs
* Add information about configuration models and examples under the
  `Configuration` section of the web docs
* Add documentation about the `barman config-switch` new CLI command in
  the web docs

Some changes to the documentation related with `wal_conninfo` and
`wal_streaming_conninfo` have already been applied through #868, and
are only pending to be merged.

References: BAR-136.

Signed-off-by: Israel Barth Rubio <[email protected]>
barthisrael added a commit that referenced this pull request Dec 5, 2023
This commit adds documentation for all the features that have been as part of
the work of making it easier to integrate Barman with HA clusters.

We perform the following changes to the web docs / `man`:

* Add `cluster` configuration option to `man 5`
* Add `model` configuration option to `man 5`
* Modify scope of the following configuration options in `man 5` so they
  include `model` scope:

    * `primary_conninfo`
    * `conninfo`
    * `streaming_conninfo`

* Add information about configuration models under the
  `Integration with cluster management systems` section of the web docs
* Add information about configuration models and examples under the
  `Configuration` section of the web docs
* Add documentation about the `barman config-switch` new CLI command in
  the web docs

Some changes to the documentation related with `wal_conninfo` and
`wal_streaming_conninfo` have already been applied through #868, and
are only pending to be merged.

References: BAR-136.

Signed-off-by: Israel Barth Rubio <[email protected]>
Copy link
Contributor

@barthisrael barthisrael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly docstring suggestions. Although I have a question which might be relevant regarding "check" on WAL streaming.

barman/config.py Outdated
Comment on lines 823 to 933
Returns the value of wal_streaming_conninfo and wal_conninfo if they
are set in the configuration. If wal_conninfo is unset then it will
be given the value of wal_streaming_conninfo. If wal_streaming_conninfo
is unset then ``None`` will be returned for both values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Returns the value of wal_streaming_conninfo and wal_conninfo if they
are set in the configuration. If wal_conninfo is unset then it will
be given the value of wal_streaming_conninfo. If wal_streaming_conninfo
is unset then ``None`` will be returned for both values.
Returns the value of ``wal_streaming_conninfo`` and ``wal_conninfo`` if they
are set in the configuration. If ``wal_conninfo`` is unset then it will
be given the value of ``wal_streaming_conninfo``. If ``wal_streaming_conninfo``
is unset then ``None`` will be returned for both values.

I forgot to add this suggestion earlier for the literals.

barman/config.py Outdated
Comment on lines 828 to 831
:rtype: (str,str)|(None,None)
:return: Tuple consisting of the ``wal_streaming_conninfo`` and
``wal_conninfo`` defined in the configuration if ``wal_streaming_conninfo``
is set, a tuple of ``(None, None)`` otherwise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:rtype: (str,str)|(None,None)
:return: Tuple consisting of the ``wal_streaming_conninfo`` and
``wal_conninfo`` defined in the configuration if ``wal_streaming_conninfo``
is set, a tuple of ``(None, None)`` otherwise.
:rtype: (str,str)
:return: Tuple consisting of the ``wal_streaming_conninfo`` and
``wal_conninfo`` defined in the configuration if ``wal_streaming_conninfo``
is set, a tuple of ``streaming_conninfo`` and ``conninfo`` otherwise.

barman/server.py Outdated

def _check_wal_level(self, check_strategy, remote_status, suffix=None):
"""
Check whether the remote status indicates wal_level is correct.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Check whether the remote status indicates wal_level is correct.
Check whether the remote status indicates ``wal_level`` is correct.

Copy link
Contributor

@barthisrael barthisrael Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering: shouldn't we check the WAL streaming connection on cron_receive_wal before spawning pg_receivewal?

I mean, something similar to what we do in backup method when it calls the check method. Maybe check_wal_streaming could be enough for cron_receive_wal.

I'm mainly concerned about checking the server identity before spawning the receiver.

What do you think?

Copy link
Contributor Author

@mikewallace1979 mikewallace1979 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great suggestion - between these new config vars and the config models, the number of places where a user could accidentally point to the wrong server is only increasing, so if we can check for that we absolutely should.

Copy link
Contributor

@barthisrael barthisrael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small suggestions based on the work performed through this commit in the docs PR: c84dbb4.

receiving WALs. If left unset then Barman will use the connection string
defined by `wal_streaming_conninfo`. If `wal_conninfo` is set but
`wal_streaming_conninfo` is unset then `wal_conninfo` will be ignored.
Server.
Copy link
Contributor

@barthisrael barthisrael Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Server.
Scope: Server/Model.

: A connection string which, if set, will be used by Barman to connect to
the Postgres server when receiving WAL segments via the streaming
replication protocol. If left unset then Barman will use the connection
string defined by `streaming_conninfo` for receiving WAL segments. Server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
string defined by `streaming_conninfo` for receiving WAL segments. Server.
string defined by `streaming_conninfo` for receiving WAL segments.
Scope: Server/Model.

barthisrael added a commit that referenced this pull request Dec 11, 2023
This PR adds documentation for all the features that have been as part
of the work of making it easier to integrate Barman with HA clusters.

We perform the following changes to the web docs / `man`:

* Add `cluster` configuration option to `man 5`
* Add `model` configuration option to `man 5`
* Include the `Model` scope to all options that configuration models support
* Add information about configuration models under the `Integration with cluster
  management systems` section of the web docs
* Add information about configuration models and examples under the `Configuration`
  section of the web docs
* Add documentation about the `barman config-switch` new CLI command in the
  web docs

Some changes to the documentation related with `wal_conninfo` and
`wal_streaming_conninfo` have already been applied through #868, and are only
pending to be merged.

Besides that we applied some standardization on all configuration options in `man`
(`doc/barman.5.d/50-*` files):

* Have an empty new line at the end -- some of them were missing this;
* Have at most 80 characters per line -- some of them were not following
  this rule;
* Have a `Scope: ` line near the end, after the description of the
  configuration option: it specifies which is the scope of the config
  option, i.e., if it can be applied to one or more scopes among
  `Global`, `Server` and `Model`. This information used to be present
  in the `man 5` files already, however they were part of the descritpion.
  Now we have some emphasis with a separate paragraph which follows the
  same standard over all files.

Also, there are more docs to come, once we finish other work related with HA cluster
features which is still pending.

References: BAR-136.
barman/config.py Outdated
Returns the value of ``wal_streaming_conninfo`` and ``wal_conninfo`` if they
are set in the configuration. If ``wal_conninfo`` is unset then it will
be given the value of ``wal_streaming_conninfo``. If ``wal_streaming_conninfo``
is unset then ``None`` will be returned for both values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is unset then ``None`` will be returned for both values.
is unset then fall back to ``streaming_conninfo`` and ``conninfo``.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops - good catch!

Copy link
Contributor

@barthisrael barthisrael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posted only a small side comment about a docstring, but the PR looks great to me :)

This commit makes it possible to stream WALs from a different
PostgreSQL instance than the instance used as the source for backups.

Two new configuration variables are added:

- wal_streaming_conninfo: A conninfo string which will be used by
  Barman to make streaming replication connections when receiving
  WALs.
- wal_conninfo: A conninfo string which will be used by Barman to
  manage streaming replication connections for receiving WALs.

If `wal_conninfo` is not provided then Barman will attempt to use
`wal_streaming_conninfo` for managing WAL-related replicaiton.

If `wal_streaming_conninfo` is not provided then Barman will use
the existing `streaming_conninfo` and `conninfo` strings for receiving
WALs.

The implementation works by manipulating the configuration so that
Barman connects to the WAL source when running commands which concern
WAL streaming and connects to the backup source when running all other
commands. This results in a smaller set of changes than alternative
approaches such as making the `barman.server.Server` aware of multiple
PostgreSQL instances and choosing the correct one for any given
operation.

This approach works well for the `barman backup`, `barman cron` and
`barman receive-wal` commands however both the `barman check` and
`barman replication-status` commands do need some awareness of both
connections.

For `barman replication-status`, we add a new option `--source` which
can be set to either `wal-host` or `backup-host`; this option determines
which host is queried for replication status information. The default
is `backup-host` to remain compliant with the previous behaviour.

The changes required to `barman check` (and the checks performed during
`barman backup`) are made in a subsequent commit.

Relates to BAR-120.
Updates the behaviour of `barman.server.Server.check` (called during
`barman backup` and `barman check` operations) so that additional
checks are made on the connection used for streaming WALs *if* a
specific conninfo is provided for WAL streaming.

If no specific conninfo is provided for streaming WALs then the
behaviour is unchanged and all checks are carried out on the connections
specified in conninfo and streaming_conninfo.

If wal_streaming_conninfo is provided then the following checks are
carried out against the PostgreSQL host used for WAL streaming:

- "has monitoring privileges": Checks that the connection used to
  manage WAL streaming has permissions to retrieve replication slot
  information.
- "PostgreSQL streaming": Checks that the connection used for streaming
  WALs is able to connect via streaming replication.
- "wal_level": Checks that the wal_level is set to a value which allows
  physical replication on the WAL streaming host.
- "systemid coherence": Checks the system identifier of the PostgreSQL
  instance used for WAL streaming matches the system identifier known
  to Barman.
- "replication slot": Checks that the named replication slot is present
  and active on the PostgreSQL instance used for WAL streaming.

All but the last check will have already been carried out using the
conninfo and streaming_conninfo connections however it is important that
they are also performed using wal_streaming_conninfo and wal_conninfo
so that any issues with the PostgreSQL host used for WAL streaming can
be identified.

When these checks are performed for WAL streaming purposes the suffix
"(WAL streaming)" is appended to the check name.

The implementation involves extracting the logic for streaming checks
so that they can be called in isolation using remote status information
obtained from the WAL streaming connections. The following new functions
are created as a result of this:

- _check_streaming_supported
- _check_wal_level
- _check_replication_slot

In addition the existing check_identity function is updated to accept
a remote_status and suffix.

A new check, _check_has_monitoring_privileges, is also added, so that
if sufficient permissions to check the replication slot status are not
available then it will fail as a specific check rather than only as an
error case of the relpication slot check.

Relates to BAR-120.
Fixes the unit tests which were broken by adding the ability to stream
WALs from a different host to the backup source.

Relates to BAR-120.
Adds unit tests which verify the new behaviours when
wal_streaming_conninfo (and related wal_conninfo) are set.

Relates to BAR-120.
This patch runs the WAL streaming checks during
`barman.server.Server.receive_wal` before the streaming archiver is
invoked. This allows us to catch errors such as a lack of monitoring
privileges, mismatching system IDs and incorrect WAL levels.

This particular location in the code was chosen because it means the
checks are carried out when both `barman cron` and `barman receive-wal`
are executed. In the `barman cron` case the `barman receive-wal` child
process will fail leaving a trail of errors in the log but `barman cron`
itself will continue and exit with status zero. This is in line with
existing cases where `barman receive-wal` will fail for other reasons
such as issues with the replication slot.

To implement this we have to separate the WAL streaming checks into
those which should be run before WAL streaming starts (aka "preflight"
checks) and those which should be run to verify WAL streaming is working
(currently just the replication slot checks). This means we can then
call the preflight checks during `barman.server.Server.receive_wal` and
avoid failing due to the replication slot not being active.

Relates to BAR-120.
Copy link

edb-sonar-app bot commented Dec 13, 2023

@mikewallace1979 mikewallace1979 merged commit 315bdc8 into master Dec 13, 2023
8 checks passed
@mikewallace1979 mikewallace1979 deleted the dev/bar-120-attempt-2 branch December 13, 2023 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants