
Switch nativeaot7.0 microbenchmarks to 8.0 #2919

Merged · 2 commits merged into dotnet:main on Mar 20, 2023
Conversation

@cincuranet (Contributor)

This should eventually unblock #2905.

@cincuranet self-assigned this Feb 28, 2023
@cincuranet enabled auto-merge (squash) February 28, 2023 19:58
@LoopedBard3 (Member) left a comment

LGTM, given tests pass.

@cincuranet (Contributor, Author)

cincuranet commented Feb 28, 2023

@MichalStrehovsky can you advise on the errors in NativeAOT micro benchmarks on net8.0? The logs are here, here and here.

This

terminate called after throwing an instance of 'std::bad_alloc'
     what():  std::bad_alloc

looks particularly bad.

And on a possibly related note: does this need to run on CentOS 7?

@MichalStrehovsky (Member)

> @MichalStrehovsky can you advise on the errors in NativeAOT micro benchmarks on net8.0? The logs are here, here and here.

These are all:

The command "ilc" exited with code 137.

Exit code 137 is very likely the OOM killer. How much memory do the containers running this have?
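
(For reference: 137 = 128 + 9, i.e. the process was terminated with SIGKILL. A quick way to confirm the kernel's OOM killer was responsible, assuming shell access to the VM, would be something like:)

# Look for OOM-killer activity in the kernel log (requires access to the VM).
dmesg | grep -i -E "out of memory|killed process"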

What is the effective change this PR makes? Is this switching from building the tests with .NET 7 to .NET 8 daily builds?

Is there a quick way for me to run this locally? We should be using less memory, not more, but it's possible there's something pathological in the new framework libraries, for example.

@cincuranet (Contributor, Author)

> How much memory do the containers running this have?

These are Azure VMs: Standard_D2a_v4, i.e. 2 vCPUs and 8 GB RAM.

> Is this switching from building the tests with .NET 7 to .NET 8 daily builds?

Correct.

> Is there a quick way for me to run this locally? We should be using less memory, not more, but it's possible there's something pathological in the new framework libraries, for example.

python3 scripts/benchmarks_ci.py \
  --csproj src/benchmarks/micro/MicroBenchmarks.csproj \
  --incremental no \
  --architecture x64 \
  -f nativeaot8.0 \
  --dotnet-versions 8.0.100-preview.3.23128.7 \
  --cli-source-info args \
  --cli-branch 8.0 \
  --cli-commit-sha 4c9bc7448c243a8954f1a70a6a8961e6187d5dd3 \
  --cli-repository https://github.com/dotnet/core-sdk \
  --cli-source-timestamp 2023-02-28T18:40:08Z \
  '--bdn-arguments=--anyCategories runtime libraries --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 --strategy ColdStart --stopOnFirstError true --partition-count 5 --partition-index 0'

@MichalStrehovsky (Member)

I was hoping for a way to run the failing build command by itself.

Do I understand it correctly that this is AOT-compiling all the tests in the repo into a single assembly?

Is there a way to add extra things into the generated project that is failing the build?

I'd like to add:

<ItemGroup>
  <IlcArg Include="--verbose" />
</ItemGroup>

With that we'd at least know when this is getting killed.

Pretty sure this is an OOM situation. I don't know how to repro it locally.

cc @dotnet/ilc-contrib

@cincuranet (Contributor, Author)

You can see the commands BDN uses starting around 2023/02/28 20:31:22 (in this log).

Not sure about the IlcArg item; probably not easily. A regular property would work fine, though, because that can easily be passed from the command line.
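
(For example, any MSBuild property can go straight onto the command line; the property name below is just a placeholder:)

dotnet publish -c Release /p:SomeIlcProperty=true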

@MichalStrehovsky (Member)

> You can see the commands BDN uses starting around 2023/02/28 20:31:22 (in this log).

That command doesn't help, because it's for a generated project with generated C# in it. I don't know what's in it.

> Not sure about the IlcArg item; probably not easily. A regular property would work fine, though, because that can easily be passed from the command line.

@adamsitnik do you know if there's a way to inject extra things into the csproj that BDN generates? It looks like the project build is getting OOM-killed. I'd like to at least know when this is happening. It could be narrowed down with the above (#2919 (comment)) snippet but I don't see a way to inject it into the generated CSPROJ. I have a suspicion where this could be happening (object writing) and would love to be able to confirm because it affects product direction - we had seen OOM during object writing in the past.

Right now I think we'll need to run this on a machine with more RAM. The project seems to be rather large.

@adamsitnik (Member)

No, but it's possible to pass any MSBuild property as an argument to the dotnet restore/build/publish commands executed by BDN:

job = job.WithArguments(new Argument[]
{
    new MsBuildArgument("/p:DebugType=portable") // See https://github.com/dotnet/roslyn/issues/42393
});

It can be done here:

job = Job.Default
    .WithWarmupCount(1) // 1 warmup is enough for our purpose

@MichalStrehovsky (Member)

Thanks! Opened dotnet/runtime#83024 to expose this as a property. I'm not aware of a way to pass Items like this.
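
(Once the property flows through, forwarding it via the same MsBuildArgument mechanism should presumably look something like this — a sketch, assuming the IlcVerboseLogging property name that shows up later in this thread:)

// Sketch: forward the ILC verbosity property into the project BDN generates.
job = job.WithArguments(new Argument[]
{
    new MsBuildArgument("/p:IlcVerboseLogging=true")
});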

@cincuranet (Contributor, Author)

I see IlcVerboseLogging was merged, so I've added it in 747f36d. I'll re-trigger this build later, once the runtime change is in a daily build.

@cincuranet (Contributor, Author)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

/azp run

Looks like this is still 8.0.0-preview.2.23127.4 (i.e. February 27th). The runtime -> SDK -> installer flow has been pretty slow lately.

@cincuranet (Contributor, Author)

Yeah. All that's left to do is wait.

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

It does look like this is running out of memory during object writing. The log indicates we sent all the data to the LLVM side and the process gets OOM-killed somewhere around EmitDebugModuleInfo or FinishObjWriter. Cc @markples

[2023/03/08 22:50:05][INFO]   ILC: Writing 1201695 object nodes...
[2023/03/08 22:50:05][INFO]   ILC: 10%...
[2023/03/08 22:50:05][INFO]   ILC: 20%...
[2023/03/08 22:50:05][INFO]   ILC: 30%...
[2023/03/08 22:50:05][INFO]   ILC: 40%...
[2023/03/08 22:50:05][INFO]   ILC: 50%...
[2023/03/08 22:50:05][INFO]   ILC: 60%...
[2023/03/08 22:50:05][INFO]   ILC: 70%...
[2023/03/08 22:50:05][INFO]   ILC: 80%...
[2023/03/08 22:50:05][INFO]   ILC: 90%...
[2023/03/08 22:50:05][INFO]   ILC: 100%...
[2023/03/08 22:50:05][INFO]   ILC: Finalizing output to '/home/helixbot/work/ADBD09A3/w/9F6408C1/e/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/d0166487-eca5-4936-bea6-510815f67765.o'...
[2023/03/08 22:50:05][INFO]   /home/helixbot/work/ADBD09A3/t/MSBuildTemphelixbot/tmp842cf9d03fec46c4bab7d621f32b195f.exec.cmd: line 2:  9592 Killed                  "/home/helixbot/work/ADBD09A3/w/9F6408C1/e/artifacts/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23158.3/tools/ilc" @"/home/helixbot/work/ADBD09A3/w/9F6408C1/e/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/d0166487-eca5-4936-bea6-510815f67765.ilc.rsp"

@MichalStrehovsky (Member)

MichalStrehovsky commented Mar 9, 2023

Merged dotnet/runtime#83181, which might help. We'll need to wait until that propagates through the sdk and installer repos.

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

Merged dotnet/runtime#83249, which might also help.

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@agocke (Member)

agocke commented Mar 16, 2023

Some interesting info here: I managed to get a repro that runs on both Linux and Windows, and on Linux it consumes a lot more memory.

@agocke (Member)

agocke commented Mar 16, 2023

OK, it looks like the bulk of the allocations are in the object writer, as Michal suspected. I have some basic stack trace information, and one thing that jumps out is these two lines:

Line #  Stack                                                            Size (B)      Count
33      objwriter.dll!DwarfMemberFunctionIdTypeInfo::DumpStrings        124,780,543   4
34      objwriter.dll!llvm::SmallVectorImpl::append<char const *,void>  112,197,631   1

That looks to be this code here:

void DwarfMemberFunctionIdTypeInfo::DumpStrings(MCObjectStreamer *Streamer) {
  // Emit the function name, null-terminated.
  Streamer->emitBytes(StringRef(Name));
  Streamer->emitIntValue(0, 1);

  // Emit a label (so the linkage name can be referenced from the DWARF info),
  // then the mangled linkage name, null-terminated.
  MCContext &context = Streamer->getContext();
  LinkageNameSymbol = context.createTempSymbol();
  Streamer->emitLabel(LinkageNameSymbol);
  Streamer->emitBytes(StringRef(LinkageName));
  Streamer->emitIntValue(0, 1);
}

My understanding of this code is that we're pulling the current section and emitting the function name and its mangled name into it.

But it's notable that I think pretty much all the function names are in the same section. So we're basically pushing 120 MB of allocations through LLVM's "SmallVectorImpl". I don't know how that's implemented, but it raises a red flag for me that we might not be using this API correctly.
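
(For reference, llvm::SmallVector keeps a small inline buffer and reallocates geometrically on the heap once that overflows, so appending an entire section's strings into a single vector can transiently hold close to twice the section size while growing. An illustrative sketch under that assumption, not the objwriter code:)

#include "llvm/ADT/SmallVector.h"

// Illustrative only: repeated appends into one ever-growing buffer.
// Each grow allocates a larger heap block and copies the old contents,
// so peak memory briefly approaches old size + new size.
llvm::SmallVector<char, 128> SectionBytes;

void appendString(const char *Begin, const char *End) {
  SectionBytes.append(Begin, End);
}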

@agocke (Member)

agocke commented Mar 16, 2023

> I don't know how that's implemented, but it raises a red flag for me that we might not be using this API correctly.

Hmm, the more I dig, the more I think the API we're not using correctly is LLVM itself, namely the assembly-level API. I'm not sure this API was supposed to be used at the scale we're throwing at it.

@agocke (Member)

agocke commented Mar 16, 2023

Ah, you know what, I no longer think 8 GB of memory is unreasonable for this compilation.

The output image is 448 MB.

$ du -hs /home/andy/code/performance/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/764ca92b-1570-40ea-89f5-a9f67194f16a.o
448M    /home/andy/code/performance/artifacts/obj/BenchmarkDotNet.Autogenerated/Release/net8.0/linux-x64/native/764ca92b-1570-40ea-89f5-a9f67194f16a.o

I need to look at the inputs here before trying to optimize ILC.

@MichalStrehovsky (Member)

Agreed, a 448 MB object file is a pretty large application. After linking/stripping it will likely be more in the 100-200 MB range, but that's still pretty large. IIRC Benchmark.NET forces partial trimming because it's not compatible with full trimming, which likely contributes to the problem. We had problems with this in the past (dotnet/BenchmarkDotNet#2046).

That said, I retriggered the CI leg after dotnet/runtime#83447, and it looks like it succeeded now. The Blazor leg is still failing, but that looks unrelated.

🍾

@agocke (Member)

agocke commented Mar 17, 2023

It looks like the microbenchmarks are already getting trimmed with TrimMode=full. So it's probably just that the microbenchmarks are pulling in basically the whole framework.
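
(For reference, that corresponds to project settings along these lines — a sketch, not the repo's exact csproj:)

<PropertyGroup>
  <PublishTrimmed>true</PublishTrimmed>
  <TrimMode>full</TrimMode>
</PropertyGroup>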

No easy fix for that. The test machine might just eventually need > 8 GB of RAM. I don't think it's unreasonable to take 16 GB of RAM to compile a 500 MB exe. Either that, or split these tests into two assemblies.

@jkotas (Member)

jkotas commented Mar 17, 2023

> split these tests into two assemblies.

Splitting the microbenchmarks into multiple assemblies is the only option that will work long term. One huge binary with all microbenchmarks will keep running into limits. It is not possible to have a >2 GB binary on Windows due to file format and OS limitations. Once we hit the 2 GB limit, no amount of extra RAM on the build machine is going to help.

@adamsitnik (Member)

> Splitting the microbenchmarks into multiple assemblies is the only option that will work long term.

In a sense, we are already doing that. The CI by default uses PartitionCount = 5, which means it splits the microbenchmarks across 5 machines, each building a project that contains 1/5 of all the benchmarks.

<PropertyGroup>
  <PartitionCount>5</PartitionCount>
</PropertyGroup>
<ItemGroup>
  <Partition Include="Partition0" Index="0" />
  <Partition Include="Partition1" Index="1" />
  <Partition Include="Partition2" Index="2" />
  <Partition Include="Partition3" Index="3" />
  <Partition Include="Partition4" Index="4" />
</ItemGroup>

This is configurable via the command line:

--partition-count $PartitionCount --partition-index $PartitionIndex
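
For example, the second of the five partitions is selected with:

--partition-count 5 --partition-index 1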

@cincuranet disabled auto-merge March 20, 2023 09:07
@cincuranet merged commit 245e7df into dotnet:main Mar 20, 2023
@cincuranet deleted the nativeaot7-8 branch March 20, 2023 09:07
cincuranet added a commit that referenced this pull request Mar 20, 2023