memory consumption #181
I can reproduce the described behaviour with my installation. I am using the docker project to set up the tileserver and observed high RAM consumption in the container. A closer look inside the container shows that the memory consumption of the renderd service grows over time until the OOM killer stops the service. Depending on the load this happens after hours or weeks on my system. Has anyone found a solution / workaround for this? Note: there is another user who reported this issue.
@stevo01 Can you describe in a bit more detail what you are doing? Can you reproduce the problem on a smaller system? I don't have anywhere I could import Europe into and allocate 48 GB to, but if you could reproduce the problem in a smaller container (with less data and less memory), other people might be able to investigate.
Hi,
I wrote a description of how to set up the docker-based tile server with an import of an OSM extract (just Germany). This allows you to set up the server in a few minutes if docker is available. The initial import of Germany needs around 3 hours on my workstation (AMD Ryzen 1700). I triggered the renderer and analysed the memory usage of renderd; the memory consumption after renderd startup is 2089460K. See attachments for details.
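For anyone trying to reproduce this, a minimal sketch of such a check (assuming renderd runs as a single local process; `ps`, `pidof` and `pmap` are standard Linux tools, and the exact commands used above are not shown):

```sh
# Resident set size (RSS, in kB) of the renderd process
ps -C renderd -o pid,rss,cmd

# Per-mapping breakdown; the last line totals the mapped/resident memory
pmap -x "$(pidof renderd)" | tail -n 1
```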
Having the same issue.
This is my render command:
This is the output of pmap 2259:
We also encounter this problem. It seems we could mitigate this by adding memory limits in systemd's service definition:
As the memory limit is approached, systemd seems to reclaim memory from the render daemon. renderd does not crash, and we do not have OOM-killer issues. But when rendering expired tiles, it may still happen that rendering stops early because systemd has reclaimed memory from it.
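One way to apply such a limit, sketched below; the unit name `renderd.service` and the 6G/8G values are illustrative assumptions, not the poster's actual settings:

```sh
# MemoryHigh is a soft limit (the kernel starts reclaiming from the cgroup above it),
# MemoryMax is a hard limit (the unit gets OOM-killed above it).
sudo systemctl set-property renderd.service MemoryHigh=6G MemoryMax=8G

# Verify the values that are now in effect
systemctl show renderd.service -p MemoryHigh -p MemoryMax
```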
Adding a memory limit will not reduce the memory consumption. What might happen is that the increased memory pressure on this cgroup leads to pages being dropped, increasing the available memory. Either file-backed pages are dropped or anonymous memory is relocated to the swap device. You could try trading CPU for memory by using zram as a target for swapping. Hopefully most of the allocated memory is not part of the active working set, so the extra cost for swapping to RAM could be tolerable.
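A rough sketch of the zram idea (device size and swap priority are illustrative; the zram kernel module must be available):

```sh
# Create a compressed RAM-backed block device and use it as high-priority swap
sudo modprobe zram
dev=$(sudo zramctl --find --size 4G)   # allocates a device, e.g. /dev/zram0
sudo mkswap "$dev"
sudo swapon --priority 100 "$dev"
```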
Thanks @stephankn . I am going to investigate this. |
Since either restarting postgresql or restarting renderd releases a huge amount of memory, there must be an issue with the connection between the two processes. I found an interesting post regarding long-running connections:
The usual solution seems to be to set a maximum lifetime for a single connection. Is there such an option in renderd?
It is unlikely this is the problem, because most rendering databases contain a small number of large objects, not many small objects.
In my case there are four tables with an index for each of them, as osm2pgsql creates. The geometry is indexed using the postgis plugin. I did not dig into the details of postgis or postgresql caching mechanisms, but I made a simple test using logged queries from tile rendering. I wrote a php script issuing these queries, with the option of closing and reopening the connection in between. While querying the database, a postgresql process increased its memory consumption. The memory consumption of the php script did not change at all. After closing the connection the postgresql process vanished and all memory was freed. I did not see any memory increase on the client side, which contradicts the comments here and what I experienced after starting the rendering session. But after some time of rendering, once the low zoom levels with large query result sets are completely rendered, there seems to be little memory consumption increase on the client (renderd) side, but rather on the server side. For the record, some rough data:
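The comparison described above could be scripted roughly like this (a sketch only: the query file, database name `gis` and user `osm` are hypothetical placeholders, and the original test was a php script, not shell):

```sh
# One long-lived connection for all logged queries (mirrors how renderd behaves)
psql -U osm -d gis -f logged_queries.sql

# One fresh connection per query (the backend exits each time and frees its memory);
# assumes one query per line in the file
while IFS= read -r query; do
    psql -U osm -d gis -c "$query"
done < logged_queries.sql
```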
In my case restarting only postgres doesn't free up memory. Renderd still holds a lot of RAM (the more I render, the more it takes, with no apparent limit). Did anyone make any progress on this issue?
Maybe if you reduced the -n 64 to a (much?) smaller number? 64 renders in parallel seems like a large number to me, even with 180GB RAM.
Lynn (D) - Running my own planet-wide tile server on very modest hardware
On 1/29/2021 1:28 AM, suneet-nokia via Tile-serving wrote:
I'm facing a similar issue while rendering tiles for zoom level (Z) > 14:
render_list -n 64 -s /var/run/renderd/renderd.sock -z 15 -Z 15 -m ajt -a
I have allocated 180GB RAM for PostGIS but all of it gets consumed and the db server restarts.
[screenshot: https://user-images.githubusercontent.com/71066412/106238861-9fd95880-6227-11eb-9248-76474d5d3a82.png]
I went inside the db shell and checked all the processes, not sure if this is normal :/
[screenshot: https://user-images.githubusercontent.com/71066412/106239642-f85d2580-6228-11eb-8905-e3490f13bef9.png]
Thanks Lynn for responding.
Hello. I'm having issues also. In my case renderd is eating all it can until it crashes. It's now running with the following command, and top shows it is already consuming 22G of RAM. I have only 1G left, sometimes less. It would be good to have a way to limit the amount of memory it tries to use. Let's see if I can manage to have this running with only 2 threads.
It crashed again. After that I set renderd.conf to 4 threads (was 8) and kept the same 2 threads in the render_list command. So it seems that it needs about 3.2G of RAM per thread. It has been rendering for almost 24h now, already on zoom 11.
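For reference, a reduced-concurrency run could look roughly like this (socket path, map name and zoom range are assumptions, not the poster's exact values):

```sh
# Pre-render zoom 0-11 with only 2 parallel rendering requests;
# num_threads in the [renderd] section of renderd.conf should be lowered to match
render_list -n 2 -s /run/renderd/renderd.sock -m default -a -z 0 -Z 11
```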
I think two different issues are mixed in this thread, both about memory not being freed. The initial report was about the RSS of renderd growing. With my recent setup (using tirex) I have a problem similar to the one described by @duerk-de: the RssAnon of the postgres backends is growing. With the rendering queue fully busy, a single backend grows by roughly 3GB within 7 hours for each rendering instance: RssAnon went from 352.532 to 3.343.852, an increase of 2.991.320. Resetting the database connection frees this memory. I also have the suspicion that this is related to mapnik using persistent connections to postgis. From looking at the source code, I think this is used in the same way in renderd and tirex. A workaround with tirex is to send SIGHUP to the rendering backend manager (see the sketch below). I would feel better if the root cause were understood. The Postgresql developer documentation suggests dumping the allocation structure with gdb, but I fear I won't understand the output without deeply digging into Postgresql.
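The SIGHUP workaround mentioned above might look like this (assuming the backend manager runs as the usual `tirex-backend-manager` process):

```sh
# Tell tirex to restart its rendering backends, which drops their
# PostgreSQL connections and releases the accumulated backend memory
sudo kill -HUP "$(pidof tirex-backend-manager)"
```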
I figured out the source of my memory leak in PostgreSQL. It is a problem with JIT leaking memory, reported upstream as Bug 16707. So if you have the symptom that RssAnon of your postgres backend is continuously increasing, check whether turning off JIT helps. Use ps to figure out one or more PIDs of the backends in use by user osm:
Then check out RssAnon memory allocation:
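A sketch of those two steps (the `osm` user comes from the comment above; the PID is a placeholder):

```sh
# 1. List the PostgreSQL backend processes owned by the osm user
ps -u osm -o pid,cmd | grep postgres

# 2. Check the anonymous RSS of one backend (replace 12345 with a real PID)
grep RssAnon /proc/12345/status
```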
This value should stay relatively constant during rendering activity. Mine increased at a rate of roughly 250MB/hour. If affected, check whether you have JIT enabled.
If this shows on, you can likely fix the leak by turning it off, either in the configuration file or via ALTER SYSTEM.
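A sketch of checking and disabling JIT (assumes local access as the postgres superuser; ALTER SYSTEM requires superuser rights):

```sh
# Check whether JIT is enabled
sudo -u postgres psql -c 'SHOW jit;'

# Turn it off cluster-wide and reload the configuration
sudo -u postgres psql -c 'ALTER SYSTEM SET jit = off;'
sudo -u postgres psql -c 'SELECT pg_reload_conf();'
```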
According to upstream reports, the leak has existed since PostgreSQL 12. So at least the leak within PostgreSQL mentioned in this bug report is addressed. The potential leak inside renderd might still be there.
I'd add that with most stylesheets you may want to turn JIT off anyway.
I know the thread is a few months old, but I thought I'd add my findings. I'm using the […]. Before reading this thread (and similar threads, like openstreetmap-tile-server/issues/27) I was convinced renderd was the issue. However, after doing some further digging and reading @stephankn's post, I concluded that what I'm seeing is actually the same as him: the value of RssAnon always climbed along with the RAM usage. After reading up on it, I disabled the JIT feature as described. This has definitely resolved the issue. Previously I could exhaust my RAM in <48 hours, whereas now, 5 days later, it ticks over with an almost identical memory usage and RssAnon value. Genuinely, many thanks for your write-up stephankn.
The core issue seems to be the memory management combined with the memory usage pattern of renderd. The glib-provided heap implementation somehow fails to give back memory to the system, but instead keeps on growing the data segment until the system decides to kill the process. The jemalloc implementation puts allocations into arenas and once they are freed (and some time has passed), they are given back to the system and seem to reduce the RSS size of the process. This should address/alleviate openstreetmap#181
I just thought I'd mention my findings on this issue:
Background: my setup has […]. Recompiling with jemalloc yields the following: 0.5GB - 4.7GB per-process size at zoom levels 0..11. Right now they are hovering at 1.7GB and churning "ocean tiles" (so little to no features to render)... My hypothesis is as follows: to render a "busy" tile (one that has a lot of detail at a medium/high zoom level), […]. Fragmentation is still an issue (and will always be with a non-garbage-collecting system/language), but switching to this new allocator has shown a much nicer memory consumption pattern. PS: My leak detection findings are very rudimentary and I don't claim that I found all leaks. But to further investigate, […].
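Besides recompiling, a quick way to test an alternative allocator is to preload it; a sketch (the library path is a Debian/Ubuntu-style assumption and varies by distribution):

```sh
# Run renderd in the foreground with jemalloc substituted for the default malloc
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 renderd -f -c /etc/renderd.conf
```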
@rolandbosa, great work! Thank you very much for working on this!
Yes, that was me, I did try and clean everything up and did also check for memory leaks, but I added that modification later and must have missed it. I will take a look at your commit and test it out, feel free to open a pull request so that others may also do so.
Good idea, it might be good to also try out https://github.com/google/tcmalloc, which is what openstreetmap.org's tile servers have been using for a while now (since this commit).
We should document that JIT should be avoided on tile servers because of memory issues (and more general reasons), then close this issue and open a new one for any new memory problems.
The patches in the linked bug were applied; the issue is fixed in PG 17 and backported to all branches. I still recommend turning off JIT because of the slowness most stylesheets see with JIT under some conditions, but that has nothing to do with mod_tile and should be documented by the stylesheets.
Thanks @pnorman, I will soon be closing this issue unless there are any objections. Then, I will open up two new issues to track any memory leaks (including the aforementioned one) and JIT-related documentation updates.
@hummeltech I'll pull out the commit for the […]. Regarding the memory consumption in general: I've been experimenting with […] and have committed a few traces, which I hope to visualize to get a better idea of what's going on.
Here's a quick and dirty visualization of the respective memory footprints. All tests are done consecutively, but plotted on the same (relative) time axis. Conclusions so far:
Timeline:
Closing this issue in favor of new replacement issue #446 for adding documentation recommending the disabling of JIT (even though the previously existing memory leak in PostgreSQL should have been resolved). Another issue #445 has also been created in order to track potential memory leaks in this project.
I have done a whole-Europe import and am now trying to pre-render the tiles using render_list.
This works fine until zoom level 10.
I have 6 threads running and renderd consumes about 10-15GB.
Around tile 528, memory consumption jumps to more than 48GB, which causes the oom_killer (renderd runs in a proxmox CT) to kill renderd.
Jun 5 08:38:17 renderd renderd[1882]: Rendering projected coordinates 10 528 368 -> 626172.135713|5322463.153556 939258.203569|5635549.221413 to a 8 x 8 tile
Jun 5 08:40:29 renderd systemd[1]: renderd.service: Main process exited, code=killed, status=9/KILL
Is this a mod_tile/renderd issue or a mapnik issue? Can I do something about this?
thx