You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, we been using barman for quite a while for our PostgreSQL instances,
which some of them can contain more than 27k of databases, most of them are
small databases around ~20mb each. Such setup comprises of a lot of files
inside the data directory.
After installing barman 3.12 into one of our barman server we get a lot of
client connection from barman, running this query:
SELECT sum(pg_tablespace_size(oid)) FROM pg_tablespace
I can see since barman 3.10 receive_wal method of barman.server.Server
added a preflight logic:
..
strategy = CheckStrategy()
self._check_wal_streaming_preflight(strategy, self.get_remote_status())
if strategy.has_error:
output.error(
"Impossible to start WAL streaming. Check the log "
"for more details, or run 'barman check %s'" % self.config.name
)
return
..
Which executed before enumerating every self.archivers items to execute its receive_wal.
This incurs costly I/O operation on database server because the logic at fetch_remote_status of barman.postgres.PostgreSQLConnection which
eventually called from self.get_remote_status() would execute PostgreSQL
databases size SUM query, see
basically matching the query I found at our I/O-hobbled database server:
SELECT sum(pg_tablespace_size(oid)) FROM pg_tablespace
It would be nice if current_size property can be optional for get_remote_status result because from my cursory look into the code I cannot
find where current_size used within the context receiving WAL from database
server.
Unfortunately the problem exacerbated because this preflight checking logic
executed outside of ServerWalReceiveLock context, which would cause multiple
calls to database server.
Luckily our existing barman version in use majority are before 3.10 release
which introduce this change.
I would like to propose moving preflight check into ServerWalReceiveLock so
that only one barman client connected to database server running SELECT sum(pg_tablespace_size(oid)) FROM pg_tablespace. I made modification
to server.py at Server.receive_wal:
def receive_wal(self, reset=False):
"""
Enable the reception of WAL files using streaming protocol.
Usually started by barman cron command.
Executing this manually, the barman process will not terminate but
will continuously receive WAL files from the PostgreSQL server.
:param reset: When set, resets the status of receive-wal
"""
# Execute the receive-wal command only if streaming_archiver
# is enabled
if not self.config.streaming_archiver:
output.error(
"Unable to start receive-wal process: "
"streaming_archiver option set to 'off' in "
"barman configuration file"
)
return
try:
# Take care of the receive-wal lock.
# Only one receiving process per server is permitted
with ServerWalReceiveLock(
self.config.barman_lock_directory, self.config.name
):
# Use the default CheckStrategy to silently check WAL streaming
# conditions are met and write errors to the log file.
strategy = CheckStrategy()
self._check_wal_streaming_preflight(strategy, self.get_remote_status())
if strategy.has_error:
output.error(
"Impossible to start WAL streaming. Check the log "
"for more details, or run 'barman check %s'" % self.config.name
)
return
if not reset:
output.info("Starting receive-wal for server %s", self.config.name)
try:
# Only the StreamingWalArchiver implementation
# does something.
# WARNING: This codes assumes that there is only one
# StreamingWalArchiver in the archivers list.
for archiver in self.archivers:
archiver.receive_wal(reset)
except ArchiverFailure as e:
output.error(e)
except LockFileBusy:
# If another process is running for this server,
if reset:
output.error(
"Unable to reset the status of receive-wal "
"for server %s. Process is still running" % self.config.name
)
else:
output.error(
"Another receive-wal process is already running "
"for server %s." % self.config.name
)
This eliminate multiple clients from barman connecting to database server only
to be stuck and thrashing the I/O.
I would also propose get_remote_status (and fetch_remote_status) to be aware
of certain context so that only relevant information fetched from database
server (which eventually used by subsequent code). An example would be related
to my issue, ie. not resolving for database size when the context is to receive
WAL.
The text was updated successfully, but these errors were encountered:
Thank you very much for the thorough message @gariels !
We'll have an internal conversation on this and get back here, but I think your logic here makes a lot of sense. We should be more mindful of what data we need to gather from the server and try to cache it when possible.
Hello, we been using barman for quite a while for our PostgreSQL instances,
which some of them can contain more than 27k of databases, most of them are
small databases around ~20mb each. Such setup comprises of a lot of files
inside the data directory.
After installing barman 3.12 into one of our barman server we get a lot of
client connection from barman, running this query:
I can see since barman 3.10
receive_wal
method ofbarman.server.Server
added a preflight logic:
barman/barman/server.py
Line 2801 in c75cb74
Where commit 315bdc8 introducing it.
Which executed before enumerating every
self.archivers
items to execute itsreceive_wal
.This incurs costly I/O operation on database server because the logic at
fetch_remote_status
ofbarman.postgres.PostgreSQLConnection
whicheventually called from
self.get_remote_status()
would execute PostgreSQLdatabases size SUM query, see
barman/barman/postgres.py
Line 988 in c75cb74
barman/barman/postgres.py
Line 814 in c75cb74
basically matching the query I found at our I/O-hobbled database server:
It would be nice if
current_size
property can be optional forget_remote_status
result because from my cursory look into the code I cannotfind where
current_size
used within the context receiving WAL from databaseserver.
Unfortunately the problem exacerbated because this preflight checking logic
executed outside of
ServerWalReceiveLock
context, which would cause multiplecalls to database server.
Luckily our existing barman version in use majority are before 3.10 release
which introduce this change.
I would like to propose moving preflight check into
ServerWalReceiveLock
sothat only one barman client connected to database server running
SELECT sum(pg_tablespace_size(oid)) FROM pg_tablespace
. I made modificationto
server.py
atServer.receive_wal
:This eliminate multiple clients from barman connecting to database server only
to be stuck and thrashing the I/O.
I would also propose
get_remote_status
(andfetch_remote_status
) to be awareof certain context so that only relevant information fetched from database
server (which eventually used by subsequent code). An example would be related
to my issue, ie. not resolving for database size when the context is to receive
WAL.
The text was updated successfully, but these errors were encountered: