p2k16 - sometimes the disk fills up, and p2k16 stops working #138
Comments
The directory /var/lib/postgresql/backups/ was filling up with db backups, causing the disk to fill. I cleaned out a few files, the disk is now better:
and then I restarted postgres via |
There is a backup service for postgres; I haven't restarted it.
Should this service be running, or should we stop it? |
p2k16-staging had the same problem, so I did the same there: cleaned out most of the db backup files, then restarted postgresql. |
Perhaps it would be nice to have some monitoring on that, with email alerts to those who run the system - something like zabbix? I have a zabbix VM running… |
Monitoring is in place, but we lack a good place to send the alerts. Our "IT operations group" is on a volunteer basis... |
Where can I see this monitoring status? |
monitoring is at riemann.bitraf.no |
It happened again; the disk of p2k16 filled up with postgres database backups, and the postgres service failed, causing p2k16 to fail. The disk full error was dutifully recorded by riemann.bitraf.no, but nobody looked at it. |
Perhaps we should add a separate (virtual) disk drive for database backups to the p2k16 server. Or better: make sure that the backups go to another server instead. Hmm. |
cleaned the backup directory on p2k16-staging, better now
then I restarted postgres with |
cleaned backup directory on p2k16-staging again
and restarted postgres |
Another cleaning of the postgres backup directory on p2k16-staging today:
plus a restart of postgres. |
p2k16 had full disk again. As usual, I cleaned out
|
We could probably extend the backup service to only retain a set number of base backups, or have a different timer retain only N basebackups.
Seems to be the code for installing the wal-e backup service, doing something similar for delete might be good enough
|
This will add a service, alongside the base-backup service and timer, that uses wal-e to delete the oldest base backups after new base backups are made each Sunday night. The number 5 was picked somewhat at random; it doesn't seem we run out of disk that often. There might be a better way to trigger this than a timer, but I am not that experienced with systemd services. Attempts to fix #138
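To make the "retain only N base backups" idea concrete, here is a minimal sketch of the retention logic in Python. It is not the wal-e based service from the pull request: it assumes, purely for illustration, that the backups are plain files in one flat directory and that modification time reflects their age. The function names and the dry-run default are hypothetical.

```python
from pathlib import Path


def backups_to_delete(backup_dir: Path, keep: int) -> list[Path]:
    """Return the files in backup_dir that are older than the `keep` newest."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    files = [p for p in backup_dir.iterdir() if p.is_file()]
    # Newest first by modification time; the name breaks ties deterministically.
    files.sort(key=lambda p: (p.stat().st_mtime, p.name), reverse=True)
    return files[keep:]


def prune(backup_dir: Path, keep: int, dry_run: bool = True) -> list[Path]:
    """Delete everything except the `keep` newest files.

    With dry_run=True (the default) nothing is removed; the function only
    reports what it would delete.
    """
    doomed = backups_to_delete(backup_dir, keep)
    if not dry_run:
        for path in doomed:
            path.unlink()
    return doomed
```

Calling `prune(Path("/var/lib/postgresql/backups"), keep=5)` would only list candidates; passing `dry_run=False` actually deletes them. Whether the real backup directory is laid out this way would need to be checked on the server first.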
How about we just send notifications to Slack? https://riemann.io/api/riemann.slack.html |
It has been mentioned before. This won't change the fact that few people have the access needed to solve the issues when they happen, but at least it will alert people earlier, so that whoever can fix it has time to do so. Just, please, make sure it won't trigger an alarm for everything. Too many false positives will ruin the whole project. |
Better for everyone to be notified, so that someone will take action or mention it to someone who can, than for it to just fail silently because nobody manually checked monitoring. |
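As a rough illustration of an alert that stays quiet unless the disk is actually close to full, here is a standalone Python sketch. It is not the Riemann Slack integration linked above. The threshold of 85% and the function names are assumptions, and the webhook URL would have to be a real Slack incoming-webhook URL supplied by whoever sets this up.

```python
import json
import shutil
import urllib.request
from typing import Optional


def disk_alert(path: str, max_used_fraction: float = 0.85) -> Optional[str]:
    """Return an alert message if `path` is fuller than the threshold, else None."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction < max_used_fraction:
        return None  # below the threshold: stay silent to avoid alert fatigue
    free_gib = usage.free / 2**30
    return f"Disk {path} is {used_fraction:.0%} full, {free_gib:.1f} GiB free"


def post_to_webhook(webhook_url: str, text: str) -> None:
    """Send `text` as a JSON payload to an incoming-webhook URL (hypothetical here)."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

A timer or cron job could call `disk_alert("/")` and only call `post_to_webhook` when the result is not `None`, which keeps the channel free of noise on normal days.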
p2k16 - full disk again today. Cleaned out
|
Maintenance this evening, cleaned out
that keeps a few weeks, I think. |
Given how critical p2k16 is to the whole organization, is it really acceptable to have only 6.7G free and for the disk to fill up every few weeks?
I would say no, and that we should invest in the equipment needed to keep this from happening.
- H
… On 18 May 2021, at 18:32, Torfinn Ingolfsen ***@***.***> wrote:
Maintenance this evening, cleaned out /var/lib/postgresql/backups/ on p2k16 before it gets full again.
***@***.***:~$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 20G 12G 6.7G 65% /
that keeps a few weeks, I think.
|
Completely agree, but I don't feel competent to spec it out and set it up.
But I will readily approve purchases that reduce the risk of downtime.
Thomas
|
But what are we actually logging so aggressively? This can't be about normal use of the system (checking in and out). Something more must be getting logged to reach many gigabytes in just a couple of weeks. I have never looked at the logs, but I suspect more hardware isn't needed here; rather, we should optimize what gets logged so that what ends up in the logs is useful. |
It should just be a matter of setting up log rotation, or possibly sending the logs to another server first. Or perhaps even better: log in parallel to another server and keep a short rotation locally. @jenschr I'm guessing it could be the web server log. |
Before this goes completely off track: what fills up the disk is database backups; it has nothing to do with logs. As far as I know, p2k16 uses the database in a perfectly ordinary way, not especially intensively. |
Sorry, but then shouldn't it be possible to send that backup to another server and simply overwrite old backups locally? |
Of course it's possible, but it requires that people with the right skills (and spare time) sit down and actually do the work. Some of us have tried to build a solution for limiting the number of local backups (see #144), without quite getting there. I am definitely no expert on postgresql, so I have nothing more to contribute there. |
Suggestion: take a dump regularly to a directory and you have one file, typically pg_dump -Fc dbnavn > dbnavn.dump (where dbnavn is the database name). -Fc is --format=custom, which lets you restore individual tables or similar without much fuss. It also gzips the data for you. Running a dump without -Fc works too and makes no difference in this context, but I just wanted to mention it. This is typically run once a day, so set up logrotate to rotate just this file (without compressing it further) as if it were a log file. logrotate doesn't look at the contents anyway, and the configuration is simple. |
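The daily dump step could be wrapped roughly like this in Python. This is a sketch under assumptions: the database name `p2k16` and the target directory are guesses, and writing to a `.partial` file before renaming is an addition of mine so that a dump that fails halfway (for example on a full disk) does not overwrite the last good one. Rotation of the resulting file is still left to logrotate, as suggested.

```python
import os
import subprocess
from pathlib import Path


def dump_command(dbname: str, target: Path) -> list[str]:
    """Build the pg_dump invocation; --format=custom is the long form of -Fc."""
    return ["pg_dump", "--format=custom", f"--file={target}", dbname]


def run_dump(dbname: str, dump_dir: Path, runner=subprocess.run) -> Path:
    """Dump `dbname` to dump_dir/<dbname>.dump and return that path.

    The dump is written to a temporary name first and renamed only if
    pg_dump exits successfully, so the previous dump survives a failure.
    """
    final = dump_dir / f"{dbname}.dump"
    partial = dump_dir / f"{dbname}.dump.partial"
    runner(dump_command(dbname, partial), check=True)  # raises if pg_dump fails
    os.replace(partial, final)  # atomic rename within the same filesystem
    return final
```

Something like `run_dump("p2k16", Path("/var/backups/p2k16"))` would then be run once a day from a timer, with the database name and directory adjusted to whatever the server actually uses.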
It didn't last as long as I had hoped: today the disk was full again, so p2k16 stopped letting people open the door. Cleaned up, restarted postgresql. Disk space looks better now:
but it probably won't hold for two weeks. |
Sometimes (not very often) the disk drive of the p2k16 server fills up. This is bad, because the p2k16 web app then stops working.
The server p2k16 runs two services related to the PostgreSQL database:
[email protected]
[email protected]
and also this
[email protected]
We also have monitoring (via riemann), but nobody watches it on a regular basis (not sure if anyone gets alarm notifications).