Commit 763183d

Merge pull request #442 from patroni/master

Syncing from upstream patroni/patroni (master)

bt-admin authored Aug 24, 2024
2 parents d5cc1d8 + 6d65aa3
Showing 30 changed files with 444 additions and 195 deletions.
5 changes: 3 additions & 2 deletions docs/dynamic_configuration.rst
@@ -56,6 +56,7 @@ In order to change the dynamic configuration you can use either :ref:`patronictl
- **archive\_cleanup\_command**: cleanup command for standby leader
- **recovery\_min\_apply\_delay**: how long to wait before actually applying WAL records on a standby leader

- **member_slots_ttl**: retention time of physical replication slots for replicas after they shut down. Default value: `30min`. Set it to `0` if you want to keep the old behavior (when the member key expires from DCS, the slot is immediately removed). The feature works only with PostgreSQL 11 and newer.
- **slots**: define permanent replication slots. These slots will be preserved during switchover/failover. Permanent slots that don't exist will be created by Patroni. From PostgreSQL 11 onwards permanent physical slots are created on all nodes and their position is advanced every **loop_wait** seconds. For PostgreSQL versions older than 11 permanent physical replication slots are maintained only on the current primary. The logical slots are copied from the primary to a standby with restart, and after that their position is advanced every **loop_wait** seconds (if necessary). Copying logical slot files is performed via a ``libpq`` connection using either rewind or superuser credentials (see **postgresql.authentication** section). There is always a chance that the logical slot position on the replica is a bit behind the former primary, therefore the application should be prepared to receive some messages a second time after a failover. The easiest way of doing so is to track ``confirmed_flush_lsn``. Enabling permanent replication slots requires **postgresql.use_slots** to be set to ``true``. If there are permanent logical replication slots defined, Patroni will automatically enable ``hot_standby_feedback``. Since the failover of logical replication slots is unsafe on PostgreSQL 9.6 and older and PostgreSQL version 10 is missing some important functions, the feature only works with PostgreSQL 11+. An illustrative configuration sketch follows this list.

- **my\_slot\_name**: the name of the permanent replication slot. If the permanent slot name matches the name of the current node it will not be created on this node. If you add a permanent physical replication slot whose name matches the name of a Patroni member, Patroni will ensure that the slot that was created is not removed even if the corresponding member becomes unresponsive, a situation which would normally result in the slot's removal by Patroni. Although this can be useful in some situations, such as when you want replication slots used by members to persist during temporary failures or when importing existing members to a new Patroni cluster (see :ref:`Convert a Standalone to a Patroni Cluster <existing_data>` for details), the operator should take care that these name clashes do not persist in the DCS once the slot is no longer required, because of their effect on the normal functioning of Patroni.
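A minimal sketch of how **member_slots_ttl** and **slots** might be combined in the dynamic configuration (for example via ``patronictl edit-config``). The slot names and the ``15min`` retention value are illustrative assumptions, not values taken from this change:

.. code:: YAML

    # hypothetical dynamic configuration fragment; names and values are examples only
    member_slots_ttl: 15min        # keep slots of absent members for 15 minutes
    postgresql:
      use_slots: true              # required for permanent replication slots
    slots:
      reporting_slot:              # permanent logical slot, preserved across failover
        type: logical
        database: postgres
        plugin: test_decoding
      standby_dc:                  # permanent physical slot
        type: physical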
@@ -92,7 +93,7 @@ Note: **slots** is a hashmap while **ignore_slots** is an array. For example:
type: physical
...
Note: if cluster topology is static (fixed number of nodes that never change their names) you can configure permanent physical replication slots with names corresponding to names of nodes to avoid recycling of WAL files while replica is temporary down:
Note: When running PostgreSQL v11 or newer, Patroni maintains physical replication slots on all nodes that could potentially become a leader, so that replica nodes keep WAL segments reserved in case they are required by other nodes. If a node is absent and its member key in DCS expires, the corresponding replication slot is dropped after ``member_slots_ttl`` (default value is `30min`). You can increase or decrease the retention based on your needs. Alternatively, if your cluster topology is static (a fixed number of nodes that never change their names) you can configure permanent physical replication slots with names corresponding to the names of the nodes, to avoid slot removal and recycling of WAL files while a replica is temporarily down:

.. code:: YAML
@@ -108,7 +109,7 @@ Note: if cluster topology is static (fixed number of nodes that never change the
.. warning::
Permanent replication slots are synchronized only from the ``primary``/``standby_leader`` to replica nodes. That means applications are supposed to use them only from the leader node. Using them on replica nodes will cause indefinite growth of ``pg_wal`` on all other nodes in the cluster.
An exception to that rule are permanent physical slots that match the Patroni member names, if you happen to configure any. Those will be synchronized among all nodes as they are used for replication among them.
An exception to that rule is physical slots that match the Patroni member names (created and maintained by Patroni). Those are synchronized among all nodes, as they are used for replication between them.


.. warning::
5 changes: 4 additions & 1 deletion docs/faq.rst
@@ -181,7 +181,10 @@ What is the difference between ``etcd`` and ``etcd3`` in Patroni configuration?
* API version 2 will be completely removed in Etcd v3.6.

I have ``use_slots`` enabled in my Patroni configuration, but when a cluster member goes offline for some time, the replication slot used by that member is dropped on the upstream node. What can I do to avoid that issue?
You can configure a permanent physical replication slot for the members.
There are two options:

1. You can tune ``member_slots_ttl`` (default value ``30min``, available since Patroni ``4.0.0`` and PostgreSQL 11 onwards) and replication slots for absent members will not be removed as long as the member's downtime is shorter than the configured threshold.
2. You can configure permanent physical replication slots for the members.

Since Patroni ``3.2.0`` it is possible to have member slots as permanent slots managed by Patroni, as illustrated in the sketch below.
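A hedged sketch of the dynamic configuration corresponding to the two options above; the member names ``node1``/``node2`` and the ``2h`` retention value are assumptions for the example, not values taken from this change:

.. code:: YAML

    # Option 1: retain slots of absent members longer (Patroni 4.0.0+, PostgreSQL 11+)
    member_slots_ttl: 2h

    # Option 2: declare permanent physical slots named after the members,
    # so they are never dropped while those members are offline
    slots:
      node1:
        type: physical
      node2:
        type: physical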

2 changes: 1 addition & 1 deletion features/dcs_failsafe_mode.feature
@@ -86,7 +86,7 @@ Feature: dcs failsafe mode
@dcs-failsafe
@slot-advance
Scenario: make sure permanent slots exist on replicas
Given I issue a PATCH request to http://127.0.0.1:8009/config with {"slots":{"postgres2":0,"dcs_slot_0":null,"dcs_slot_2":{"type":"logical","database":"postgres","plugin":"test_decoding"}}}
Given I issue a PATCH request to http://127.0.0.1:8009/config with {"slots":{"dcs_slot_0":null,"dcs_slot_2":{"type":"logical","database":"postgres","plugin":"test_decoding"}}}
Then logical slot dcs_slot_2 is in sync between postgres1 and postgres0 after 20 seconds
And logical slot dcs_slot_2 is in sync between postgres1 and postgres2 after 20 seconds
When I get all changes from physical slot dcs_slot_1 on postgres1
8 changes: 8 additions & 0 deletions features/nostream_node.feature
@@ -16,3 +16,11 @@ Scenario: check permanent logical replication slots are not copied
Then "members/postgres2" key in DCS has replication_state=streaming after 10 seconds
And postgres1 does not have a replication slot named test_logical
And postgres2 does not have a replication slot named test_logical

@slot-advance
Scenario: check that slots are written to the /status key
Given "status" key in DCS has postgres0 in slots
And "status" key in DCS has postgres2 in slots
And "status" key in DCS has test_logical in slots
And "status" key in DCS has test_logical in slots
And "status" key in DCS does not have postgres1 in slots
22 changes: 15 additions & 7 deletions features/permanent_slots.feature
@@ -3,7 +3,7 @@ Feature: permanent slots
Given I start postgres0
Then postgres0 is a leader after 10 seconds
And there is a non empty initialize key in DCS after 15 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"slots":{"test_physical":0,"postgres0":0,"postgres1":0,"postgres3":0},"postgresql":{"parameters":{"wal_level":"logical"}}}
When I issue a PATCH request to http://127.0.0.1:8008/config with {"slots":{"test_physical":0,"postgres3":0},"postgresql":{"parameters":{"wal_level":"logical"}}}
Then I receive a response code 200
And Response on GET http://127.0.0.1:8008/config contains slots after 10 seconds
When I start postgres1
@@ -34,12 +34,14 @@ Feature: permanent slots
Scenario: check permanent physical slots that match with member names
Given postgres0 has a physical replication slot named postgres3 after 2 seconds
And postgres1 has a physical replication slot named postgres0 after 2 seconds
And postgres1 has a physical replication slot named postgres2 after 2 seconds
And postgres1 has a physical replication slot named postgres3 after 2 seconds
And postgres2 has a physical replication slot named postgres0 after 2 seconds
And postgres2 has a physical replication slot named postgres3 after 2 seconds
And postgres2 has a physical replication slot named postgres1 after 2 seconds
And postgres1 does not have a replication slot named postgres2
And postgres3 does not have a replication slot named postgres2
And postgres3 has a physical replication slot named postgres0 after 2 seconds
And postgres3 has a physical replication slot named postgres1 after 2 seconds
And postgres3 has a physical replication slot named postgres2 after 2 seconds

@slot-advance
Scenario: check that permanent slots are advanced on replicas
@@ -53,19 +55,25 @@
And Logical slot test_logical is in sync between postgres0 and postgres3 after 10 seconds
And Physical slot test_physical is in sync between postgres0 and postgres3 after 10 seconds
And Physical slot postgres1 is in sync between postgres0 and postgres2 after 10 seconds
And Physical slot postgres1 is in sync between postgres0 and postgres3 after 10 seconds
And Physical slot postgres3 is in sync between postgres2 and postgres0 after 20 seconds
And Physical slot postgres3 is in sync between postgres2 and postgres1 after 10 seconds
And postgres1 does not have a replication slot named postgres2
And postgres3 does not have a replication slot named postgres2

@slot-advance
Scenario: check that only permanent slots are written to the /status key
Scenario: check that permanent slots and member slots are written to the /status key
Given "status" key in DCS has test_physical in slots
And "status" key in DCS has postgres0 in slots
And "status" key in DCS has postgres1 in slots
And "status" key in DCS does not have postgres2 in slots
And "status" key in DCS has postgres2 in slots
And "status" key in DCS has postgres3 in slots

@slot-advance
Scenario: check that only non-permanent member slots are written to the retain_slots in /status key
And "status" key in DCS has postgres0 in retain_slots
And "status" key in DCS has postgres1 in retain_slots
And "status" key in DCS has postgres2 in retain_slots
And "status" key in DCS does not have postgres3 in retain_slots

Scenario: check permanent physical replication slot after failover
Given I shut down postgres3
And I shut down postgres2
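To make the ``/status`` key scenarios above easier to follow, here is an assumed illustration of the JSON payload that Patroni writes to the ``/status`` key in the DCS, consistent with the steps checked in ``permanent_slots.feature``. The LSN values are invented, and fields other than ``slots`` and ``retain_slots`` (for example ``optime``) as well as the exact shape of ``retain_slots`` (shown here as a list) are assumptions, not something stated in this change:

.. code:: YAML

    # hypothetical /status key content (JSON, shown verbatim); numbers are made up
    {"optime": 67108960,
     "slots": {"test_physical": 67108960, "postgres0": 67108960, "postgres1": 67108960,
               "postgres2": 67108960, "postgres3": 67108960},
     "retain_slots": ["postgres0", "postgres1", "postgres2"]}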