Establish what metrics are needed for production monitoring #18

Open
pnorman opened this issue Jan 3, 2024 · 5 comments

Comments

pnorman commented Jan 3, 2024

What do we need for production monitoring, and what do we already have?

pnorman commented Jan 3, 2024

A CDN will have its own metrics, which are obviously independent of tilekiln. We also can't really control what metrics those are.

Behind the server, we need metrics on serving, DB health, tile metrics, and replication.

Because in production tilekiln will be behind nginx or something else that handles SSL termination, we can rely on that for serving metrics such as number of requests, average request size, and tile serving time. This avoids duplicating existing metrics, and these metrics would be needed for any server, not just tilekiln.

There are existing PostgreSQL metrics for table size, index size, bloat, etc. Because the storage tables are partitioned by zoom, collecting these per table gives us the per-zoom sizes.

Replication metrics are covered by existing osm2pgsql metrics, and tilekiln is not specific to osm2pgsql.

Where we need metrics is for tiles.

This would cover tile size, number of tiles, and new tiles being generated. The dimensions would be host, db, schema/tileset, and zoom. These can be calculated from the DB with something like

SELECT
  SUM(length(tile)),  -- total tile bytes
  COUNT(*),           -- number of tiles
  percentile_disc(ARRAY[0.00,0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50,0.55,0.60,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1.00]) WITHIN GROUP (ORDER BY length(tile))  -- tile size at each 5% step
FROM tiles_z14;
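To make the semantics of that query concrete, in particular which value percentile_disc picks, here is a rough Python equivalent over an in-memory list of tiles. It is only a sketch: it assumes each tile is a plain bytes value, and percentile_disc/tile_size_stats are hypothetical helper names, not part of tilekiln.

```python
from typing import List, Sequence, Tuple


def percentile_disc(sorted_values: Sequence[int], percent: int) -> int:
    """Mimic PostgreSQL's percentile_disc for a fraction of percent/100:
    return the first value whose position in the ordered set is at least
    that fraction of the way through (no interpolation)."""
    n = len(sorted_values)
    # ceil(percent * n / 100) in integer arithmetic, avoiding float rounding
    position = max(-(-percent * n // 100), 1)
    return sorted_values[position - 1]


def tile_size_stats(tiles: Sequence[bytes]) -> Tuple[int, int, List[int]]:
    """Return (total bytes, tile count, sizes at 0%, 5%, ..., 100%)."""
    sizes = sorted(len(t) for t in tiles)  # like ORDER BY length(tile)
    if not sizes:
        return 0, 0, []  # SQL would return NULL for SUM and the percentiles here
    percents = range(0, 101, 5)  # matches ARRAY[0.00, 0.05, ..., 1.00]
    return sum(sizes), len(sizes), [percentile_disc(sizes, p) for p in percents]
```

percentile_disc always returns a tile size that actually occurs in the data; percentile_cont would interpolate between neighbouring values instead.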

Notes:

  • This might be faster as a single query on the entire tiles table using parallelism, but that needs consideration.
  • length(tile) is different from pg_column_size(tile): length() returns the byte length of the value, while pg_column_size() returns the bytes used to store it, which reflects any TOAST compression.
  • When we get compressed tiles, this will report the compressed size. The only ways to avoid this would be to decompress each tile in-DB and measure it, or to add another column with the tile size, adding 32 bits per row for a total of 1.4GB of additional storage. This isn't unreasonable, but can be deferred for now.
  • Need to check how to do histograms with Prometheus.
  • The query is faster with JIT enabled.
  • It can't be calculated every 15s, so maybe cache it with a materialized view.
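On the open question of histograms: Prometheus exposes a histogram as cumulative buckets, each labelled le with the count of observations less than or equal to that bound, plus a final +Inf bucket equal to the total count (alongside _sum and _count series). A minimal sketch of computing those buckets from tile sizes follows; the bucket bounds are arbitrary examples, and the resulting pairs could presumably be handed to a custom collector such as prometheus_client's HistogramMetricFamily.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import Iterable, List, Sequence, Tuple


def cumulative_buckets(
    sizes: Iterable[int], bounds: Sequence[float]
) -> List[Tuple[str, int]]:
    """Return Prometheus-style (le, count) pairs: each count is the number of
    observations <= le, and the final "+Inf" bucket equals the total count."""
    ordered = sorted(bounds)
    per_bucket = [0] * (len(ordered) + 1)  # last slot: above the largest bound
    for size in sizes:
        # First bound >= size, i.e. the smallest "le" bucket this size falls in
        per_bucket[bisect_left(ordered, size)] += 1
    running = list(accumulate(per_bucket))  # Prometheus buckets are cumulative
    labels = [str(float(b)) for b in ordered] + ["+Inf"]
    return list(zip(labels, running))
```

The percentile_disc output maps more naturally onto a Prometheus summary (quantile labels), but summary quantiles can't be meaningfully aggregated across hosts or zooms, whereas histogram buckets can simply be summed.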

For tiles being generated, we need a counter. The obvious choice is a table with the number of tiles generated per zoom, incremented by one after each tile is generated.
This would add time to tile generation, so maybe do the increments in batches?
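A minimal sketch of the batching idea, assuming a hypothetical tiles_generated table with one row per zoom: counts accumulate in memory and are written with a single upsert per batch, so the per-tile cost is just a dictionary increment. sqlite3 is used only so the example runs standalone; the INSERT ... ON CONFLICT ... DO UPDATE statement is also valid PostgreSQL, but the table, column, and class names are illustrative.

```python
import sqlite3
from collections import Counter


class BatchedTileCounter:
    """Count generated tiles per zoom in memory; write them out once per batch.

    Hypothetical sketch: the table and column names are illustrative, and
    sqlite3 stands in for PostgreSQL so the example runs on its own.
    """

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 1000) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.pending = Counter()  # zoom -> tiles generated since the last flush
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tiles_generated "
            "(zoom INTEGER PRIMARY KEY, n_tiles INTEGER NOT NULL)"
        )

    def record(self, zoom: int) -> None:
        """Call once per generated tile; cheap except when a batch fills up."""
        self.pending[zoom] += 1
        if sum(self.pending.values()) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Add pending counts to the table in one transaction, then reset them."""
        if not self.pending:
            return
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO tiles_generated (zoom, n_tiles) VALUES (?, ?) "
                "ON CONFLICT (zoom) DO UPDATE "
                "SET n_tiles = tiles_generated.n_tiles + excluded.n_tiles",
                list(self.pending.items()),
            )
        self.pending.clear()
```

Whatever is still pending should be flushed when generation finishes, including on the error path (for example from a finally block), or the last partial batch is lost.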

pnorman commented Jan 3, 2024

Error metrics will also be needed.

pnorman commented Jan 16, 2024

Based on https://prometheus.github.io/client_python/getting-started/three-step-demo/, I was thinking I should use the Prometheus client to track tile generation information and upload it at the end of generation, but this would work poorly in case of an error. Instead, I should track it manually, or perhaps use the client for timing each layer?
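A minimal sketch of tracking generation information manually: per-layer wall-clock time and error counts are accumulated in plain dictionaries, and the timing is recorded in a finally clause so that time spent before an exception is still counted. LayerTimer and the layer names are hypothetical; prometheus_client's Histogram.time() and Counter.count_exceptions() context managers would be a library-based alternative.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator


class LayerTimer:
    """Accumulate per-layer generation time and error counts in memory.

    Hypothetical sketch: totals live in plain dicts, so whatever was measured
    before a failure is still available to report afterwards.
    """

    def __init__(self) -> None:
        self.seconds: DefaultDict[str, float] = defaultdict(float)
        self.errors: DefaultDict[str, int] = defaultdict(int)

    @contextmanager
    def time_layer(self, layer: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[layer] += 1  # count the failure, then let it propagate
            raise
        finally:
            # Runs on success and on error, so time spent is never dropped
            self.seconds[layer] += time.perf_counter() - start
```

Each layer's rendering would be wrapped in `with timer.time_layer("water"):`, and after generation, successful or not, timer.seconds and timer.errors could be exposed however the metrics end up being collected.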

Thinking this through has revealed the need to store configuration information in the DB, both so I don't have to specify the config files and so I can support many config files for different schemas.
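A minimal sketch of storing configuration in the database keyed by schema, so that several tilesets can each have their own config without passing files around. The tileset_config table, its columns, and the helper functions are hypothetical; the config is stored as JSON text for simplicity, and sqlite3 stands in for PostgreSQL so the example runs on its own.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def init_config_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one config document per schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tileset_config "
        "(schema_name TEXT PRIMARY KEY, config TEXT NOT NULL)"
    )


def save_config(conn: sqlite3.Connection, schema_name: str, config: Dict[str, Any]) -> None:
    """Insert or replace the config stored for one schema."""
    with conn:
        conn.execute(
            "INSERT INTO tileset_config (schema_name, config) VALUES (?, ?) "
            "ON CONFLICT (schema_name) DO UPDATE SET config = excluded.config",
            (schema_name, json.dumps(config)),
        )


def load_config(conn: sqlite3.Connection, schema_name: str) -> Optional[Dict[str, Any]]:
    """Return the stored config for a schema, or None if there isn't one."""
    row = conn.execute(
        "SELECT config FROM tileset_config WHERE schema_name = ?", (schema_name,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```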

pnorman commented Jan 30, 2024

Tileset storage metrics have been implemented. Additionally, there are metrics for monitoring the time to calculate the storage metrics.

Generation metrics need implementing still.
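Since the storage query can't run on every scrape, one in-process alternative to a materialized view is to cache the result and refresh it at most once per interval, recording how long each refresh took. This is a hypothetical sketch with an injectable clock for testing, not tilekiln's implementation.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedMetric(Generic[T]):
    """Recompute an expensive value at most once per `max_age` seconds,
    recording how long the last computation took."""

    def __init__(
        self,
        compute: Callable[[], T],
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.compute = compute
        self.max_age = max_age
        self.clock = clock
        self.value: Optional[T] = None
        self.computed_at: Optional[float] = None
        self.last_duration: Optional[float] = None  # seconds taken by last refresh

    def get(self) -> T:
        now = self.clock()
        if self.computed_at is None or now - self.computed_at >= self.max_age:
            self.value = self.compute()
            self.last_duration = self.clock() - now
            self.computed_at = now
        return self.value  # type: ignore[return-value]
```

Exposing last_duration alongside the cached values gives a direct view of how expensive each refresh is.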

pnorman commented Feb 7, 2024

Generation metrics are deferred until after parallelism is implemented, because reworking the generation code to run in parallel might change things around.


1 participant