On creating a DATAS-lite #111436

Open
jhudsoncedaron opened this issue Jan 14, 2025 · 8 comments
Labels: area-GC-coreclr, untriaged

Comments

@jhudsoncedaron

This analysis was originally run in parallel with the creation of DATAS (although we didn't know it at the time); we had assumed DATAS would simply solve the problem once .NET 8 was released, but that has proven not to be the case due to #105780. We had been told to expect a fix in the first servicing release of .NET 9; however, the fix was rolled back due to a bad interaction with WinForms.

Thus, it's time to weigh the options: which is easier, tracking down and fixing whatever is causing the bad interaction between DATAS, BGC, and WinForms, or adding more statistics methods?

The problem is that, due to the ebb and flow of usage across the various processes on the server, some processes hog all the system RAM and don't release it while other processes are scrambling to get enough. In the worst case this causes the server to page, which thrashes RAM pretty badly when the GC finally does wake up on a big-allocation process.

The obvious solution is to run the GC based on something like heap growth; even a pretty naïve algorithm would work better here than what we are observing. To do this we would write a thread that looks something like this:

   void GCDrive() {
       long memory_previous = 0;
       // GC.??? stands in for the missing "immediate" readout described below.
       long memory_current = GC.???;
       for (;;) {
           memory_previous = memory_current;
           memory_current = GC.???;
           if (condition) {            // e.g. growth since the last sample exceeds some threshold
               GC.Collect();
               GC.WaitForPendingFinalizers();
               // Re-read so the next iteration compares against the post-collection state.
               memory_previous = memory_current;
               memory_current = GC.???;
           }
           System.Threading.Thread.Sleep(3000);
       }
   }

The problem comes in filling in the GC.??? placeholder; essentially I need a GC.GetTotalAllocatedBytesImmediate() method.

So what would this do that GC.GetTotalAllocatedBytes(Boolean) doesn't? It would return the true value without waiting for a GC. Since we don't need a synchronous readout (that is, there is no need to lock out allocators while adding up the per-thread fresh heap pieces), this is actually possible. It would return all the allocated bytes not in the fresh heap (which will not have changed since the last GC run) plus the distance between the bottom and the top of the fresh heap.

Other numbers I would need to know: the immediate size of the non-fresh heap, the immediate size of the fresh heap, and the immediate amount of memory pressure (see AddMemoryPressure() and RemoveMemoryPressure() for why I would want that).

I would definitely have to tune condition a few times to get it right; but even a banal version would still behave quite reasonably if background GC is on, at least in comparison to what we are seeing now, which forced us to change the big servers to workstation GC.
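
To make the placeholder concrete, here is one naïve shape the condition could take, assuming the hypothetical GC.GetTotalAllocatedBytesImmediate() named above existed; the threshold value is made up and would need tuning per process:

    static class GCDriveHeuristic
    {
        // Made-up starting point; would need tuning per process.
        private const long GrowthThreshold = 256L * 1024 * 1024; // ~256 MB of growth between samples

        // memory_previous / memory_current would come from the hypothetical
        // GC.GetTotalAllocatedBytesImmediate() named above.
        public static bool ShouldCollect(long memoryPrevious, long memoryCurrent)
            => memoryCurrent - memoryPrevious > GrowthThreshold;
    }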

The dotnet-policy-service bot added the untriaged label on Jan 14, 2025.
Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@Maoni0 (Member) commented Jan 16, 2025

So what does this do that GC.GetTotalAllocatedBytes(Boolean) doesn't; it returns the true value without waiting for GC; since we don't need synchronous readout

GC.GetTotalAllocatedBytes does not wait for GC. if you pass in false it doesn't synchronize anything - it just gets a report from whatever the current total allocated bytes happen to be. if you pass in true it does do a SuspendEE to suspend managed threads before it reads the data (then RestartEE after) but does not do a GC.

you don't have to go this primitive triggering GC route of course. there are much better options -

  1. you can invoke the clrgcexp.dll as you did when you tested the fix for the deadlock issue so you can still take advantage of DATAS but avoid the winforms issue;
  2. you can limit the # of heaps for Server GC and turn affinity off, as explained here, many teams have gone this route when they needed to run multiple Server GC processes on the same machine;
  3. if your process is pretty small (my recollection is your heap was not big at all with DATAS), you can set the HighMemPercent to be smaller, eg, instead of 90 you can set it to 30 so GC will treat 30% as a high memory load situation, many teams also have tried this one and were successful with it;

do let me know if none of these options work for you.
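
For option 3, a minimal sketch of how a process could confirm which high-memory threshold the GC actually picked up once the setting is applied; the GCMemoryInfo properties used here are documented, and their values reflect the most recent GC:

    using System;

    class GCConfigCheck
    {
        static void Main()
        {
            // Force one collection so GetGCMemoryInfo has data to report.
            GC.Collect();

            GCMemoryInfo info = GC.GetGCMemoryInfo();
            double thresholdPercent = 100.0 * info.HighMemoryLoadThresholdBytes / info.TotalAvailableMemoryBytes;
            double loadPercent = 100.0 * info.MemoryLoadBytes / info.TotalAvailableMemoryBytes;

            Console.WriteLine($"High-memory threshold:  {thresholdPercent:F0}% of {info.TotalAvailableMemoryBytes / (1024.0 * 1024 * 1024):F1} GB");
            Console.WriteLine($"Memory load at last GC: {loadPercent:F0}%");
        }
    }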

@jhudsoncedaron (Author)

So what's happening is we ran into trouble planning the .NET 9 upgrade. The fundamental problem is that the situation changed once DATAS became on by default, and it therefore gets turned on on machines we don't control.

The use cases are as follows:

  1. A server with 300 processes and gobs of RAM; a process typically takes 200MB RAM but can suddenly jump to needing 3GB for 10 minutes and then go back to needing 200MB RAM. This is the killer case. The RAM is sized to typical use cases; with Server GC and no DATAS, the RAM never gets freed and so isn't available to the next process that wants 3GB.
  2. Another server with 200 processes and not very much RAM and small processes. The scenario is the same as the big one on a smaller scale. This is the one that tripped the problem.
  3. 50 servers, not ours, hosting two processes from set 1 and one process from set 2. This is the sticking point. Complication on 3): we are partway through porting from Windows to Linux, with the expectation that some servers will become Linux while others remain Windows (customer preference).

As for your solution 1) "clrgcexp.dll": we are nervous about shipping the fix dll for reasons I don't quite understand, and we would most likely need a Linux build of it before too long. (The final shipping of the Linux product was held for .NET 9, expecting the DATAS fix first.) I suppose it would be possible to convince the rest of the team to ship it with Microsoft's blessing.

As for your solution 2) "# of heaps": I have no evidence that changing the # of heaps changes anything at all. We are not CPU bound on GC. I don't exactly have CPU coming out my ears, but I'm not that short on CPU either.

As for your solution 3) "HighMemPercent": that does not seem right; I'm not sure what would happen when the active working set is 10x HighMemPercent for 10 minutes.

About this time you should be wondering how the main server is stable at all. The answer is that the likelihood of 10 of the 150 customers on the server pressing harvest at the same time is low, and it would take around 20 to thrash the server.


On running GC.GetTotalAllocatedBytes(false) I get the following:

GC.GetTotalAllocatedBytes(false).ToString().Dump(); // 11910960
GC.GetTotalAllocatedBytes(false).ToString().Dump(); // 11919128
GC.GetTotalAllocatedBytes(false).ToString().Dump(); // 11919128
GC.GetTotalAllocatedBytes(false).ToString().Dump(); // 11919128

The problem is that (false) does not mean "current"; it seems to mean "as of the last GC" or something like that.

As for GC.GetTotalAllocatedBytes(true) I get the following:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.GetTotalAllocatedBytes(true).ToString().Dump(); // 14657208
GC.GetTotalAllocatedBytes(true).ToString().Dump(); // 14683144
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.GetTotalAllocatedBytes(true).ToString().Dump(); // 14686816
GC.GetTotalAllocatedBytes(true).ToString().Dump(); // 14688128

I think the counter never drops. I'm trying to gauge the current size of the heap to see if I need to do another GC, not to measure the allocations used. I tried getting the virtual address size, but that doesn't work because Windows maps several GB of stuff into the process memory that mostly isn't used, which swamps the actual measurement.

@Maoni0 (Member) commented Jan 17, 2025

this counter is the allocated bytes so it can only increase. it's not the heap size. from the doc -

Gets a count of the bytes allocated over the lifetime of the process.

if you want the heap size you should use the GetGCMemoryInfo API.
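
A minimal sketch of reading the heap size that way; note these properties describe the most recent collection rather than the instantaneous state:

    // Sketch: heap-size figures from the last GC via the GetGCMemoryInfo API.
    GCMemoryInfo gcInfo = GC.GetGCMemoryInfo();
    Console.WriteLine($"Heap size (last GC):  {gcInfo.HeapSizeBytes}");
    Console.WriteLine($"Fragmented bytes:     {gcInfo.FragmentedBytes}");
    Console.WriteLine($"Committed bytes:      {gcInfo.TotalCommittedBytes}");
    Console.WriteLine($"Generation collected: {gcInfo.Generation}");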

@jhudsoncedaron (Author)

Flipping through IntelliSense on the properties of the value returned by GC.GetGCMemoryInfo(), every single one of them says "when the last garbage collection occurred", and I'm trying to write code that senses when too much allocation charge has built up since then.

@jhudsoncedaron (Author) commented Jan 20, 2025

Hey, I've got something: the counter I'm looking for might be GC.GetTotalMemory(false) + GC.GetTotalAllocatedBytes(true) - GC.GetTotalAllocatedBytes(false); if not, I can capture GC.GetTotalAllocatedBytes(true) in a finalizer.
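
Expressed as a helper, the proposed estimate would look roughly like this; whether it tracks the live heap closely enough is exactly what would need to be verified:

    static long EstimateCurrentHeapBytes()
    {
        long heapBytes          = GC.GetTotalMemory(false);          // current best estimate, no forced collection
        long preciseAllocated   = GC.GetTotalAllocatedBytes(true);   // precise: suspends threads, does not run a GC
        long impreciseAllocated = GC.GetTotalAllocatedBytes(false);  // cheap, possibly slightly stale
        // The difference approximates allocations not yet reflected in the cheap counter.
        return heapBytes + (preciseAllocated - impreciseAllocated);
    }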

@markples (Member)

DATAS (or changing the fixed # of heaps with DATAS disabled) impacts the amount of memory that can be allocated before collection kicks in. It is a tradeoff in the process between time and memory. Indirectly this can, of course, impact the entire system's memory load.

DATAS is aimed at processes that have busy vs idle periods. It aims to scale up the # of heaps (and therefore usage of memory) when lots of allocation is happening and scale it back down when not. (Note that this is oversimplifying things as several GC metrics are used.) If you don't think this is happening, please let us know.

However, you are also talking about system load. From the perspective of a single process, often it is good to delay a gen2, but when systemwide load increases, there is an external reason to do a gen2. HighMemPercent (which defaults to 90%) is a mechanism for this by looking at total physical memory usage - not just that of the individual process. It is ok if individual processes go up and down. The link that Maoni shared explains more. In particular, you can use a GC trace to see what memory load was measured and use that to find values that work better for your processes.

@jhudsoncedaron (Author)

@markples: The biggest benefit we are observing locally is apparently a side effect of DATAS deciding to reduce the number of heaps when nearly idle; this triggers a GC (to drain the heaps being removed), thus freeing the built-up GC charge.
