You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Mar 26, 2024. It is now read-only.
I'm investigating runaway queries in my PostgreSQL dashboard database.
It appears my nightly prune cron is going near infinite (taking over 24 hours) in the "delete from resource_events" portion of the prune rake task.
When investigating the query, it appears that puppet-dashboard first counts the records to delete (expensive) then actually tries deleting them (expensive query again).
It appears in PostgreSQL the NOT IN isn't optimal for this query:
dashboard=# explain select count(*) from resource_events where resource_status_id not in (select id from resource_statuses);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Aggregate (cost=28055423930.51..28055423930.52 rows=1 width=0)
-> Seq Scan on resource_events (cost=0.00..28055423886.85 rows=17462 width=0)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..1538250.21 rows=27380614 width=4)
-> Seq Scan on resource_statuses (cost=0.00..1294391.14 rows=27380614 width=4)
(6 rows)
I believe an optional query would use a join (since resource_events is hopefully smaller). The below join would make this query extremely trivial and should support any ANSI SQL compatible database.
dashboard=# explain select count(*) from resource_events e LEFT JOIN resource_statuses s ON (e.resource_status_id = s.id) WHERE s.id IS NULL;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Aggregate (cost=293234.43..293234.44 rows=1 width=0)
-> Nested Loop Anti Join (cost=0.44..293234.42 rows=1 width=0)
-> Seq Scan on resource_events e (cost=0.00..1279.23 rows=34923 width=4)
-> Index Only Scan using resource_statuses_pkey on resource_statuses s (cost=0.44..8.35 rows=1 width=4)
Index Cond: (id = e.resource_status_id)
(5 rows)
The text was updated successfully, but these errors were encountered:
@sodabrew I've been trying to modify my prune job to perform this query. If it wasn't for tracking deletion_count and batching this would be extremely simple.
This is the lowest Cost way of pruning in postgres i've found...
delete from metrics WHERE NOT EXISTS (SELECT 1 FROM reports WHERE metrics.report_id = reports.id)
delete from report_logs WHERE NOT EXISTS (SELECT 1 FROM reports WHERE report_logs.report_id = reports.id)
delete from resource_statuses where not exists ( select 1 from reports where resource_statuses.report_id = reports.id );
delete from resource_events where not exists (SELECT 1 FROM resource_statuses where resource_events.resource_status_id = resource_statuses.id)
I'm investigating runaway queries in my PostgreSQL dashboard database.
It appears my nightly prune cron is going near infinite (taking over 24 hours) in the "delete from resource_events" portion of the prune rake task.
When investigating the query, it appears that puppet-dashboard first counts the records to delete (expensive) then actually tries deleting them (expensive query again).
It appears in PostgreSQL the NOT IN isn't optimal for this query:
I believe an optional query would use a join (since resource_events is hopefully smaller). The below join would make this query extremely trivial and should support any ANSI SQL compatible database.
The text was updated successfully, but these errors were encountered: