
Switch nativeaot7.0 microbenchmarks to 8.0 #2919

Merged · 2 commits merged into dotnet:main on Mar 20, 2023
Conversation

@cincuranet (Contributor)

This should eventually unblock #2905.

@cincuranet self-assigned this Feb 28, 2023
@cincuranet enabled auto-merge (squash) February 28, 2023 19:58
@LoopedBard3 (Member) left a comment

LGTM, given tests pass.

@cincuranet (Contributor, Author)

cincuranet commented Feb 28, 2023

@MichalStrehovsky can you advise on the errors in NativeAOT micro benchmarks on net8.0? The logs are here, here and here.

This

terminate called after throwing an instance of 'std::bad_alloc'
     what():  std::bad_alloc

looks particularly bad.

And on a possibly related note: does this need to run on CentOS 7?

@MichalStrehovsky (Member)

> @MichalStrehovsky can you advise on the errors in NativeAOT micro benchmarks on net8.0? The logs are here, here and here.

These are all:

The command "ilc" exited with code 137.

Exit code 137 is very likely the OOM killer. How much memory do the containers running this have?
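
(For reference: 137 = 128 + 9, i.e. the process was terminated with SIGKILL. A quick way to confirm the kernel's OOM killer was responsible, assuming shell access to the VM, would be something like:)

# Look for OOM-killer activity in the kernel log (requires access to the VM).
dmesg | grep -i -E "out of memory|killed process"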

What is the effective change this PR makes? Is this switching from building the tests with .NET 7 to .NET 8 daily builds?

Is there a quick way for me to run this locally? We should be using less memory, not more, but it's possible there's something pathological in the new framework libraries, for example.

@cincuranet (Contributor, Author)

> How much memory do the containers running this have?

These are Azure VMs: Standard_D2a_v4, i.e. 2 vCPUs and 8 GB RAM.

> Is this switching from building the tests with .NET 7 to .NET 8 daily builds?

Correct.

> Is there a quick way for me to run this locally? We should be using less memory, not more, but it's possible there's something pathological in the new framework libraries, for example.

python3 scripts/benchmarks_ci.py \
  --csproj src/benchmarks/micro/MicroBenchmarks.csproj \
  --incremental no \
  --architecture x64 \
  -f nativeaot8.0 \
  --dotnet-versions 8.0.100-preview.3.23128.7 \
  --cli-source-info args \
  --cli-branch 8.0 \
  --cli-commit-sha 4c9bc7448c243a8954f1a70a6a8961e6187d5dd3 \
  --cli-repository https://github.com/dotnet/core-sdk \
  --cli-source-timestamp 2023-02-28T18:40:08Z \
  '--bdn-arguments=--anyCategories runtime libraries --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 --strategy ColdStart --stopOnFirstError true --partition-count 5 --partition-index 0'

@MichalStrehovsky (Member)

I was hoping for a way to run the failing build command by itself.

Do I understand it correctly that this is AOT-compiling all the tests in the repo into a single assembly?

Is there a way to add extra things into the generated project that is failing the build?

I'd like to add:

<ItemGroup>
  <IlcArg Include="--verbose" />
</ItemGroup>

With that we'd at least know when this is getting killed.

Pretty sure this is an OOM situation. I don't know how to repro it locally.

cc @dotnet/ilc-contrib

@cincuranet (Contributor, Author)

You can see the commands BDN uses starting around 2023/02/28 20:31:22 (in this log).

Not sure about the IlcArg item; probably not easily. A regular property would work fine, though, because that can easily be passed from the command line.
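
(For example, any MSBuild property can go straight onto the command line; the property name below is just a placeholder:)

dotnet publish -c Release /p:SomeIlcProperty=true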

@MichalStrehovsky (Member)

> You can see the commands BDN uses starting around 2023/02/28 20:31:22 (in this log).

That command doesn't help, because it's for a generated project with generated C# in it. I don't know what's in it.

> Not sure about the IlcArg item; probably not easily. A regular property would work fine, though, because that can easily be passed from the command line.

@adamsitnik do you know if there's a way to inject extra things into the csproj that BDN generates? It looks like the project build is getting OOM-killed. I'd like to at least know when this is happening. It could be narrowed down with the above (#2919 (comment)) snippet but I don't see a way to inject it into the generated CSPROJ. I have a suspicion where this could be happening (object writing) and would love to be able to confirm because it affects product direction - we had seen OOM during object writing in the past.

Right now I think we'll need to run this on a machine with more RAM. The project seems to be rather large.

@adamsitnik (Member)

No, but it's possible to pass any MSBuild property as an argument to the dotnet restore/build/publish commands executed by BDN:

job = job.WithArguments(new Argument[]
{
    new MsBuildArgument("/p:DebugType=portable") // See https://github.com/dotnet/roslyn/issues/42393
});

It can be done here:

job = Job.Default
    .WithWarmupCount(1) // 1 warmup is enough for our purpose

@MichalStrehovsky (Member)

Thanks! Opened dotnet/runtime#83024 to expose this as a property. I'm not aware of a way to pass Items like this.
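
(Once the property flows through, forwarding it via the same MsBuildArgument mechanism should presumably look something like this — a sketch, assuming the IlcVerboseLogging property name that shows up later in this thread:)

// Sketch: forward the ILC verbosity property into the project BDN generates.
job = job.WithArguments(new Argument[]
{
    new MsBuildArgument("/p:IlcVerboseLogging=true")
});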

@cincuranet (Contributor, Author)

I see IlcVerboseLogging was merged, so I've added it in 747f36d. I'll re-trigger this build later, once the runtime change is in a daily build.

@cincuranet (Contributor, Author)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

/azp run

Looks like this is still 8.0.0-preview.2.23127.4 (i.e. February 27th). The runtime -> SDK -> installer flow has been pretty slow lately.

@cincuranet (Contributor, Author)

Yeah. All that's left to do is wait.

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

It does look like this is running out of memory during object writing. The log indicates we sent all the data to the LLVM side and the process gets OOM-killed somewhere around EmitDebugModuleInfo or FinishObjWriter. Cc @markples

[2023/03/08 22:50:05][INFO]   ILC: Writing 1201695 object nodes...
[2023/03/08 22:50:05][INFO]   ILC: 10%...
[2023/03/08 22:50:05][INFO]   ILC: 20%...
[2023/03/08 22:50:05][INFO]   ILC: 30%...
[2023/03/08 22:50:05][INFO]   ILC: 40%...
[2023/03/08 22:50:05][INFO]   ILC: 50%...
[2023/03/08 22:50:05][INFO]   ILC: 60%...
[2023/03/08 22:50:05][INFO]   ILC: 70%...
[2023/03/08 22:50:05][INFO]   ILC: 80%...
[2023/03/08 22:50:05][INFO]   ILC: 90%...
[2023/03/08 22:50:05][INFO]   ILC: 100%...
[2023/03/08 22:50:05][INFO]   ILC: Finalizing output to '/home/helixbot/work/ADBD09A3/w/9F6408C1/e/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/d0166487-eca5-4936-bea6-510815f67765.o'...
[2023/03/08 22:50:05][INFO]   /home/helixbot/work/ADBD09A3/t/MSBuildTemphelixbot/tmp842cf9d03fec46c4bab7d621f32b195f.exec.cmd: line 2:  9592 Killed                  "/home/helixbot/work/ADBD09A3/w/9F6408C1/e/artifacts/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23158.3/tools/ilc" @"/home/helixbot/work/ADBD09A3/w/9F6408C1/e/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/d0166487-eca5-4936-bea6-510815f67765.ilc.rsp"

@MichalStrehovsky (Member)

MichalStrehovsky commented Mar 9, 2023

Merged dotnet/runtime#83181, which might help. We'll need to wait until that propagates through the sdk and installer repos.

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

Merged dotnet/runtime#83249, which might also help.

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@agocke (Member)

agocke commented Mar 16, 2023

Some interesting info here: I managed to get a repro that runs on both Linux and Windows, and on Linux it consumes a lot more memory.

@agocke (Member)

agocke commented Mar 16, 2023

OK, it looks like the bulk of the allocations are in the object writer, as Michal suspected. I have some basic stack trace information, and one thing that jumps out is these two lines:

Line #  Stack                                                            Size (B)      Count
33      objwriter.dll!DwarfMemberFunctionIdTypeInfo::DumpStrings        124,780,543   4
34      objwriter.dll!llvm::SmallVectorImpl::append<char const *,void>  112,197,631   1

That looks to be this code here:

void DwarfMemberFunctionIdTypeInfo::DumpStrings(MCObjectStreamer *Streamer) {
  // Emit the function name, null-terminated.
  Streamer->emitBytes(StringRef(Name));
  Streamer->emitIntValue(0, 1);

  // Emit a label (so the linkage name can be referenced from the DWARF info),
  // then the mangled linkage name, null-terminated.
  MCContext &context = Streamer->getContext();
  LinkageNameSymbol = context.createTempSymbol();
  Streamer->emitLabel(LinkageNameSymbol);
  Streamer->emitBytes(StringRef(LinkageName));
  Streamer->emitIntValue(0, 1);
}

My understanding of this code is that we're pulling the current section and emitting the function name and its mangled name into it.

But it's notable that I think pretty much all the function names are in the same section. So we're basically pushing 120 MB of allocations through LLVM's "SmallVectorImpl". I don't know how that's implemented, but it raises a red flag for me that we might not be using this API correctly.
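
(For reference, llvm::SmallVector keeps a small inline buffer and reallocates geometrically on the heap once that overflows, so appending an entire section's strings into a single vector can transiently hold close to twice the section size while growing. An illustrative sketch under that assumption, not the objwriter code:)

#include "llvm/ADT/SmallVector.h"

// Illustrative only: repeated appends into one ever-growing buffer.
// Each grow allocates a larger heap block and copies the old contents,
// so peak memory briefly approaches old size + new size.
llvm::SmallVector<char, 128> SectionBytes;

void appendString(const char *Begin, const char *End) {
  SectionBytes.append(Begin, End);
}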

@agocke (Member)

agocke commented Mar 16, 2023

> I don't know how that's implemented, but it raises a red flag for me that we might not be using this API correctly.

Hmm, the more I dig, the more I think the API we're not using correctly is LLVM itself, namely the assembly-level API. I'm not sure this API was supposed to be used at the scale we're throwing at it.

@agocke (Member)

agocke commented Mar 16, 2023

Ah, you know what, I no longer think 8 GB of memory is unreasonable for this compilation.

The output image is 448 MB.

$ du -hs /home/andy/code/performance/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/764ca92b-1570-40ea-89f5-a9f67194f16a.o
448M    /home/andy/code/performance/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/764ca92b-1570-40ea-89f5-a9f67194f16a.o

I need to look at the inputs here before trying to optimize ILC.

@MichalStrehovsky (Member)

Agreed, a 448 MB object file is a pretty large application. After linking/stripping it will likely be more in the 100-200 MB range, but that's still pretty large. IIRC Benchmark.NET forces partial trimming because it's not compatible with full trimming, which likely contributes to the problem. We had problems with this in the past (dotnet/BenchmarkDotNet#2046).

That said, I retriggered the CI leg after dotnet/runtime#83447, and it looks like it succeeded now. The Blazor leg is still failing, but that looks unrelated.

🍾

@agocke (Member)

agocke commented Mar 17, 2023

It looks like the microbenchmarks are already getting trimmed with TrimMode=full. So it's probably just that the microbenchmarks are pulling in basically the whole framework.
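
(For reference, that corresponds to project settings along these lines — a sketch, not the repo's exact csproj:)

<PropertyGroup>
  <PublishTrimmed>true</PublishTrimmed>
  <TrimMode>full</TrimMode>
</PropertyGroup>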

No easy fix for that. The test machine might just eventually need > 8 GB of RAM. I don't think it's unreasonable to take 16 GB of RAM to compile a 500 MB exe. Either that, or split these tests into two assemblies.

@jkotas (Member)

jkotas commented Mar 17, 2023

> split these tests into two assemblies.

Splitting the microbenchmarks into multiple assemblies is the only option that will work long term. One huge binary with all microbenchmarks will keep running into limits. It is not possible to have a >2 GB binary on Windows due to file format and OS limitations. Once we hit the 2 GB limit, no amount of extra RAM on the build machine is going to help.

@adamsitnik (Member)

> Splitting the microbenchmarks into multiple assemblies is the only option that will work long term.

In a sense, we are already doing that. The CI by default uses PartitionCount = 5, which means it splits the microbenchmarks across 5 machines, each building a project that contains 1/5 of all the benchmarks.

<PropertyGroup>
  <PartitionCount>5</PartitionCount>
</PropertyGroup>
<ItemGroup>
  <Partition Include="Partition0" Index="0" />
  <Partition Include="Partition1" Index="1" />
  <Partition Include="Partition2" Index="2" />
  <Partition Include="Partition3" Index="3" />
  <Partition Include="Partition4" Index="4" />
</ItemGroup>

This is configurable via the command line:

--partition-count $PartitionCount --partition-index $PartitionIndex
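
For example, the second of the five partitions is selected with:

--partition-count 5 --partition-index 1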

@cincuranet disabled auto-merge March 20, 2023 09:07
@cincuranet merged commit 245e7df into dotnet:main Mar 20, 2023
@cincuranet deleted the nativeaot7-8 branch March 20, 2023 09:07
cincuranet added a commit that referenced this pull request Mar 20, 2023