[JIT] [APX] Enable additional General Purpose Registers. #108799

Merged 13 commits on Feb 7, 2025

DeepakRajendrakumaran
Contributor

@DeepakRajendrakumaran DeepakRajendrakumaran commented Oct 11, 2024

What this PR does

  1. Add eGPRs to the available registers on x64 in the JIT, plus related changes to turn them on/off based on APX availability.

Currently we are adding just 8 new registers so that the total number of registers does not exceed 64. This is based on the conversation on this PR and the following conclusion: link

  2. An LSRA_LIMIT_EXT_GPR_SET register stress mode to force eGPR usage when possible.

  3. Some minor changes to turn on Rex2 encoding with eGPRs.

  4. Temporary changes to mask away eGPRs for currently unsupported instructions, primarily ones requiring eEVEX, plus imul and movszx. (These will be removed once we have support for them, but they are essential while we do not have eEVEX support.)

  5. Minor flags to get altjit to work.
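
The 64-register limit mentioned above can be checked with simple arithmetic. The following is a minimal sketch, assuming the register mask also has to cover 32 SIMD registers and 8 opmask registers; those counts and every name below are illustrative assumptions, not the JIT's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical budget for a register mask that fits in one 64-bit word.
constexpr unsigned kLegacyGprCount   = 16; // rax..r15
constexpr unsigned kEnabledEgprCount = 8;  // r16..r23, the subset enabled by this PR
constexpr unsigned kAllEgprCount     = 16; // r16..r31, the full APX set
constexpr unsigned kSimdCount        = 32; // assumption: 32 vector registers
constexpr unsigned kOpmaskCount      = 8;  // assumption: k0..k7
constexpr unsigned kMaskBits         = 64;

// Total number of registers the mask must describe for a given eGPR count.
constexpr unsigned totalRegisters(unsigned egprCount)
{
    return kLegacyGprCount + egprCount + kSimdCount + kOpmaskCount;
}

// With 8 eGPRs everything still fits in 64 bits; with all 16 it would not.
static_assert(totalRegisters(kEnabledEgprCount) <= kMaskBits, "8 eGPRs fit");
static_assert(totalRegisters(kAllEgprCount) > kMaskBits, "16 eGPRs overflow");
```

Under these assumptions the budget is exactly full with 8 eGPRs, which is consistent with the decision to stop at 8 for now.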

Testing

  • Ran superpmi with/without APX enabled

With APX disabled

for TP/asmdiff : link

With APX enabled

ASMDIFF
image

Code size increases due to Rex2, but PerfScore improves. Note: this is with only a subset of x64 instructions having access to eGPRs (those requiring eEVEX will get access as part of upcoming changes) and with just 8 eGPRs enabled.
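
As a rough illustration of why Rex2 grows code size: a REX prefix is one byte, while a REX2 prefix is two bytes (0xD5 followed by a payload byte). The sketch below is a simplified model that ignores every other prefix and special case; the function name and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: number of REX/REX2 prefix bytes an instruction needs,
// given the highest GPR number it references (0..31) and whether it uses a
// 64-bit operand size. Other prefixes (VEX/EVEX, legacy) are ignored.
constexpr unsigned gprPrefixBytes(unsigned highestRegNum, bool is64BitOperand)
{
    return (highestRegNum >= 16) ? 2u                     // REX2: 0xD5 + payload
         : (highestRegNum >= 8 || is64BitOperand) ? 1u    // REX
         : 0u;                                            // no prefix needed
}
```

In this model, an instruction that moves from r8..r15 to r16..r23 pays one extra byte, and a 32-bit instruction that previously used only rax..rdi pays two.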

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 11, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 11, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BruceForstall BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Oct 15, 2024
@DeepakRajendrakumaran DeepakRajendrakumaran marked this pull request as ready for review October 21, 2024 22:08
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Nov 4, 2024
@JulieLeeMSFT
Member

CC @jakobbotsch and @tannergooding for code review.

@jakobbotsch
Member

@DeepakRajendrakumaran What is the status of this PR? It's marked as ready but the description says it's built on top of #108796 that is not marked as ready.

@DeepakRajendrakumaran
Contributor Author

@DeepakRajendrakumaran What is the status of this PR? It's marked as ready but the description says it's built on top of #108796 that is not marked as ready.

Thanks for pointing that out. It has some dependencies on other PRs - specifically the Rex2 encoding PR. Considering that, do you have a suggestion on how to mark this for now?

@DeepakRajendrakumaran
Contributor Author

DeepakRajendrakumaran commented Nov 20, 2024

@kunalspathak

Now that the CPUID changes have merged, I ran superpmi TP, and I have a problem.

image

I ran the scripts Kunal shared a while back to debug why this is happening.

The following is for libraries

Base: 798636572986, Diff: 837269651550, +4.8374%

?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z                                                                                            : 7483341082 : +105.48%  : 15.71% : +0.9370%
?allocateRegistersMinimal@LinearScan@@QEAAXXZ                                                                                                              : 5166096591 : +51.73%   : 10.84% : +0.6469%
?allocateRegisters@LinearScan@@QEAAXXZ                                                                                                                     : 3501980510 : +32.45%   : 7.35%  : +0.4385%
?processKills@LinearScan@@AEAAXPEAVRefPosition@@@Z                                                                                                         : 2761837171 : +53.97%   : 5.80%  : +0.3458%
?genConsumeReg@CodeGen@@IEAA?AW4_regNumber_enum@@PEAUGenTree@@@Z                                                                                           : 2114364155 : +56.59%   : 4.44%  : +0.2647%
?TakesRex2Prefix@emitter@@QEBA_NPEBUinstrDesc@1@@Z                                                                                                         : 1652787168 : NA        : 3.47%  : +0.2070%
?freeRegisters@LinearScan@@AEAAXUregMaskTP@@@Z                                                                                                             : 1645251557 : +62.83%   : 3.45%  : +0.2060%
?mergeRegisterPreferences@Interval@@QEAAX_K@Z                                                                                                              : 1424229795 : +2637.42% : 2.99%  : +0.1783%
?AddX86PrefixIfNeeded@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                                      : 1332532027 : NA        : 2.80%  : +0.1669%
?AddX86PrefixIfNeededAndNotPresent@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                         : 1247317388 : NA        : 2.62%  : +0.1562%
?gcMarkRegPtrVal@GCInfo@@QEAAXW4_regNumber_enum@@W4var_types@@@Z                                                                                           : 1236233831 : +174.95%  : 2.59%  : +0.1548%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z                                                                       : 1044477092 : +10.11%   : 2.19%  : +0.1308%
?assignPhysReg@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z                                                                                            : 749700826  : +42.11%   : 1.57%  : +0.0939%
?genCodeForBBlist@CodeGen@@IEAAXXZ                                                                                                                         : 707125092  : +11.03%   : 1.48%  : +0.0885%
?allocateRegMinimal@LinearScan@@AEAA?AW4_regNumber_enum@@PEAVInterval@@PEAVRefPosition@@@Z                                                                 : 704654429  : +15.88%   : 1.48%  : +0.0882%
?buildKillPositionsForNode@LinearScan@@AEAA_NPEAUGenTree@@IUregMaskTP@@@Z                                                                                  : 658845785  : +64.48%   : 1.38%  : +0.0825%
?emitOutputInstr@emitter@@IEAA_KPEAUinsGroup@@PEAUinstrDesc@1@PEAPEAE@Z                                                                                    : 658192653  : +9.65%    : 1.38%  : +0.0824%
?emitGCregDeadUpd@emitter@@QEAAXW4_regNumber_enum@@PEAE@Z                                                                                                  : 629879757  : +107.83%  : 1.32%  : +0.0789%
?updateAssignedInterval@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z                                                                                   : 546122060  : +24.24%   : 1.15%  : +0.0684%
?emitStackPopLargeStk@emitter@@QEAAXPEAE_NEI@Z                                                                                                             : 525848563  : +104.66%  : 1.10%  : +0.0658%
?emitGetAdjustedSize@emitter@@QEBAIPEAUinstrDesc@1@_K@Z                                                                                                    : 487696755  : +31.37%   : 1.02%  : +0.0611%
?emitGCregLiveUpd@emitter@@QEAAXW4GCtype@@W4_regNumber_enum@@PEAE@Z                                                                                        : 451135285  : +59.41%   : 0.95%  : +0.0565%
?buildPhysRegRecords@LinearScan@@AEAAXXZ                                                                                                                   : 417375644  : +52.32%   : 0.88%  : +0.0523%
?AddRexWPrefix@emitter@@QEAA_KPEBUinstrDesc@1@_K@Z                                                                                                         : 337122934  : +62.86%   : 0.71%  : +0.0422%
?TakesEvexPrefix@emitter@@QEBA_NPEBUinstrDesc@1@@Z                                                                                                         : 326871135  : +13.69%   : 0.69%  : +0.0409%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@_KI@Z                                                              : 289859613  : +3.27%    : 0.61%  : +0.0363%
??0LinearScan@@QEAA@PEAVCompiler@@@Z                                                                                                                       : 287558884  : +56.87%   : 0.60%  : +0.0360%
?emitOutputRexOrSimdPrefixIfNeeded@emitter@@QEAAIW4instruction@@PEAEAEA_K@Z                                                                                : 276256843  : +10.64%   : 0.58%  : +0.0346%
?emitIns_Call@emitter@@QEAAXW4EmitCallType@1@PEAUCORINFO_METHOD_STRUCT_@@PEAX_JW4emitAttr@@AEBQEA_KUregMaskTP@@6AEBVDebugInfo@@W4_regNumber_enum@@8I3_N9@Z : 251568991  : +17.79%   : 0.53%  : +0.0315%
?resetAllRegistersState@LinearScan@@AEAAXXZ                                                                                                                : 250671960  : +48.42%   : 0.53%  : +0.0314%
?emitUpdateLiveGCregs@emitter@@QEAAXW4GCtype@@UregMaskTP@@PEAE@Z                                                                                           : 236180536  : +61.03%   : 0.50%  : +0.0296%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z                                                                                                                : 211945171  : +3.63%    : 0.44%  : +0.0265%
?genUpdateRegLife@CodeGenInterface@@QEAAXPEBVLclVarDsc@@_N1@Z                                                                                              : 208334297  : +146.29%  : 0.44%  : +0.0261%
?unassignPhysReg@LinearScan@@AEAAXPEAVRegRecord@@PEAVRefPosition@@@Z                                                                                       : 204285611  : +8.70%    : 0.43%  : +0.0256%
?BuildCall@LinearScan@@AEAAHPEAUGenTreeCall@@@Z                                                                                                            : 201972715  : +19.34%   : 0.42%  : +0.0253%
?genProduceReg@CodeGen@@IEAAXPEAUGenTree@@@Z                                                                                                               : 156386903  : +5.43%    : 0.33%  : +0.0196%
?emitGetGCRegsSavedOrModified@emitter@@QEAA?AUregMaskTP@@PEAUCORINFO_METHOD_STRUCT_@@@Z                                                                    : 155613852  : NA        : 0.33%  : +0.0195%
??$resolveRegisters@$00@LinearScan@@QEAAXXZ                                                                                                                : 154302371  : +4.84%    : 0.32%  : +0.0193%
??$compChangeLife@$00@Compiler@@QEAAXAEBQEA_K@Z                                                                                                            : 150051997  : +21.15%   : 0.31%  : +0.0188%
?genPushCalleeSavedRegisters@CodeGen@@IEAAXXZ                                                                                                              : 136488370  : +268.86%  : 0.29%  : +0.0171%
?BuildRMWUses@LinearScan@@AEAAHPEAUGenTree@@00_K1@Z                                                                                                        : 119460162  : NA        : 0.25%  : +0.0150%
?emitInsSize@emitter@@QEAAIPEAUinstrDesc@1@_K_N@Z                                                                                                          : 113904280  : +11.97%   : 0.24%  : +0.0143%
??$resolveRegisters@$0A@@LinearScan@@QEAAXXZ                                                                                                               : 99510485   : +3.12%    : 0.21%  : +0.0125%
?BuildIndir@LinearScan@@AEAAHPEAUGenTreeIndir@@@Z                                                                                                          : 96091030   : +48.43%   : 0.20%  : +0.0120%
?compInitOptions@Compiler@@IEAAXPEAVJitFlags@@@Z                                                                                                           : 89279923   : +9.31%    : 0.19%  : +0.0112%
?instGen_Set_Reg_To_Imm@CodeGen@@QEAAXW4emitAttr@@W4_regNumber_enum@@_JW4insFlags@@@Z                                                                      : 78050532   : +26.93%   : 0.16%  : +0.0098%
?resolveLocalRef@LinearScan@@AEAAXPEAUBasicBlock@@PEAUGenTreeLclVar@@PEAVRefPosition@@@Z                                                                   : 74859133   : +3.74%    : 0.16%  : +0.0094%
??$allocateReg@$0A@@LinearScan@@AEAA?AW4_regNumber_enum@@PEAVInterval@@PEAVRefPosition@@@Z                                                                 : 74540254   : +6.75%    : 0.16%  : +0.0093%
memset                                                                                                                                                     : 73679442   : +1.13%    : 0.15%  : +0.0092%
?emitOutputRI@emitter@@QEAAPEAEPEAEPEAUinstrDesc@1@@Z                                                                                                      : 67994420   : +6.26%    : 0.14%  : +0.0085%
?insEncodeReg012@emitter@@QEAAIPEBUinstrDesc@1@W4_regNumber_enum@@W4emitAttr@@PEA_K@Z                                                                      : 65961952   : +6.54%    : 0.14%  : +0.0083%
?genSetRegToConst@CodeGen@@IEAAXW4_regNumber_enum@@W4var_types@@PEAUGenTree@@@Z                                                                            : 63572129   : +16.91%   : 0.13%  : +0.0080%
?emitInsSizeSV@emitter@@QEAAIPEAUinstrDesc@1@_KHH@Z                                                                                                        : 58355992   : +5.91%    : 0.12%  : +0.0073%
?BuildDefWithKills@LinearScan@@AEAAXPEAUGenTree@@H_KUregMaskTP@@@Z                                                                                         : 56553707   : +40.78%   : 0.12%  : +0.0071%
?BuildCast@LinearScan@@AEAAHPEAUGenTreeCast@@@Z                                                                                                            : 56461244   : NA        : 0.12%  : +0.0071%
?BuildStoreLocDef@LinearScan@@AEAAXPEAUGenTreeLclVarCommon@@PEAVLclVarDsc@@PEAVRefPosition@@H@Z                                                            : 53688042   : +14.79%   : 0.11%  : +0.0067%
?emitOutputRR@emitter@@QEAAPEAEPEAEPEAUinstrDesc@1@@Z                                                                                                      : 53279956   : +3.55%    : 0.11%  : +0.0067%
?genCallInstruction@CodeGen@@IEAAXPEAUGenTreeCall@@@Z                                                                                                      : 50084108   : +5.82%    : 0.11%  : +0.0063%
?emitHandleMemOp@emitter@@AEAAXPEAUGenTreeIndir@@PEAUinstrDesc@1@W4insFormat@1@W4instruction@@@Z                                                           : -58626864  : -10.34%   : 0.12%  : -0.0073%
?getMatchingConstants@LinearScan@@AEAA_K_KPEAVInterval@@PEAVRefPosition@@@Z                                                                                : -79107557  : -100.00%  : 0.17%  : -0.0099%
?emitSizeOfInsDsc_CNS@emitter@@AEBA_KPEAUinstrDesc@1@@Z                                                                                                    : -90499395  : -98.48%   : 0.19%  : -0.0113%
?BuildRMWUses@LinearScan@@AEAAHPEAUGenTree@@00_K@Z                                                                                                         : -120949346 : -100.00%  : 0.25%  : -0.0151%
?BuildGCWriteBarrier@LinearScan@@AEAAHPEAUGenTree@@@Z                                                                                                      : -146449406 : -100.00%  : 0.31%  : -0.0183%
?associateRefPosWithInterval@LinearScan@@AEAAXPEAVRefPosition@@@Z                                                                                          : -188074386 : -3.81%    : 0.39%  : -0.0235%
?addKillForRegs@LinearScan@@AEAAXUregMaskTP@@I@Z                                                                                                           : -213435792 : -100.00%  : 0.45%  : -0.0267%
?BuildSimple@LinearScan@@AEAAHPEAUGenTree@@@Z                                                                                                              : -345016623 : -99.92%   : 0.72%  : -0.0432%
?genCodeForTreeNode@CodeGen@@IEAAXPEAUGenTree@@@Z                                                                                                          : -414160388 : -6.66%    : 0.87%  : -0.0519%
?updateRegisterPreferences@Interval@@QEAAX_K@Z                                                                                                             : -580317174 : -100.00%  : 1.22%  : -0.0727%
?AddSimdPrefixIfNeededAndNotPresent@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                        : -885312893 : -100.00%  : 1.86%  : -0.1109%
?AddSimdPrefixIfNeeded@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                                     : -984986225 : -100.00%  : 2.07%  : -0.1233%
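
The headline percentage can be reproduced from the Base and Diff instruction counts. This is a small sketch of that arithmetic; reading the last column as "per-function delta divided by Base" is an assumption that happens to match the rows checked here, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative change of the total instruction count, in percent.
inline double percentChange(uint64_t base, uint64_t diff)
{
    return (static_cast<double>(diff) - static_cast<double>(base)) /
           static_cast<double>(base) * 100.0;
}

// Contribution of one function's instruction delta to the total, in percent
// of the base instruction count (assumed meaning of the last column).
inline double contributionPercent(int64_t functionDelta, uint64_t base)
{
    return static_cast<double>(functionDelta) / static_cast<double>(base) * 100.0;
}
```

For example, `percentChange(798636572986, 837269651550)` gives roughly 4.8374, and `contributionPercent(7483341082, 798636572986)` gives roughly 0.9370 for `processBlockStartLocations`.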

@DeepakRajendrakumaran
Contributor Author

DeepakRajendrakumaran commented Nov 26, 2024

I tried to further verify that the Rex2 changes are not causing the TP regression. We can safely conclude that the TP regression comes from eGPR enablement.

The following is with/without the Rex2 changes (without the reg alloc changes).

Overall (+0.08% to +0.23%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.16%
coreclr_tests.run.windows.x64.checked.mch +0.23%
libraries.crossgen2.windows.x64.checked.mch +0.14%
libraries.pmi.windows.x64.checked.mch +0.11%
libraries_tests.run.windows.x64.Release.mch +0.18%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.12%
smoke_tests.nativeaot.windows.x64.checked.mch +0.08%
MinOpts (+0.28% to +0.48%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.48%
coreclr_tests.run.windows.x64.checked.mch +0.36%
libraries.crossgen2.windows.x64.checked.mch +0.38%
libraries.pmi.windows.x64.checked.mch +0.37%
libraries_tests.run.windows.x64.Release.mch +0.47%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.40%
smoke_tests.nativeaot.windows.x64.checked.mch +0.28%
FullOpts (+0.08% to +0.14%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.10%
coreclr_tests.run.windows.x64.checked.mch +0.14%
libraries.crossgen2.windows.x64.checked.mch +0.14%
libraries.pmi.windows.x64.checked.mch +0.11%
libraries_tests.run.windows.x64.Release.mch +0.10%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.12%
smoke_tests.nativeaot.windows.x64.checked.mch +0.08%

With Rex2 as base and eGPR changes as diff

Overall (+3.60% to +4.65%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +4.33%
coreclr_tests.run.windows.x64.checked.mch +4.65%
libraries.crossgen2.windows.x64.checked.mch +4.29%
libraries.pmi.windows.x64.checked.mch +3.76%
libraries_tests.run.windows.x64.Release.mch +4.65%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +3.60%
smoke_tests.nativeaot.windows.x64.checked.mch +3.66%
MinOpts (+6.09% to +8.79%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +8.27%
coreclr_tests.run.windows.x64.checked.mch +6.09%
libraries.crossgen2.windows.x64.checked.mch +7.27%
libraries.pmi.windows.x64.checked.mch +6.86%
libraries_tests.run.windows.x64.Release.mch +8.42%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +7.16%
smoke_tests.nativeaot.windows.x64.checked.mch +8.79%
FullOpts (+3.47% to +4.29%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +3.59%
coreclr_tests.run.windows.x64.checked.mch +3.60%
libraries.crossgen2.windows.x64.checked.mch +4.29%
libraries.pmi.windows.x64.checked.mch +3.76%
libraries_tests.run.windows.x64.Release.mch +3.47%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +3.52%
smoke_tests.nativeaot.windows.x64.checked.mch +3.66%

Member

@kunalspathak kunalspathak left a comment

Went through the first pass, need to evaluate where TP regression is coming from. However, I still see some asmdiffs...can you please fix it?

@@ -12534,6 +12555,9 @@ void LinearScan::verifyResolutionMove(GenTree* resolutionMove, LsraLocation curr
LinearScan::RegisterSelection::RegisterSelection(LinearScan* linearScan)
{
this->linearScan = linearScan;
#if defined(TARGET_AMD64)
rbmAllInt = linearScan->compiler->get_RBM_ALLINT();
Member

any reason why we need it here instead of LinearScan ctor (which you are already doing)?

Contributor Author

Removed

@@ -742,6 +743,7 @@ class emitter
// The instrDescCGCA struct's member keeping the GC-ness of the first return register is _idcSecondRetRegGCType.
GCtype _idGCref : 2; // GCref operand? (value is a "GCtype")

#if !defined(TARGET_AMD64)
Member

any reason for this change?

Contributor Author

Alignment - having _idReg1/_idReg2 here with increased size caused padding and increased size even more
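
The padding effect described here can be shown with a generic example. This is not the actual instrDesc layout, only a sketch with hypothetical structs; the exact sizes are implementation-defined, though mainstream x64 ABIs behave as the comments describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A wide field placed between two narrow ones forces padding on both sides.
struct Interleaved
{
    uint8_t  small1; // typically followed by 3 bytes of padding
    uint32_t wide;
    uint8_t  small2; // typically followed by 3 bytes of tail padding
};

// Grouping the narrow fields together lets them share one aligned slot.
struct Grouped
{
    uint32_t wide;
    uint8_t  small1;
    uint8_t  small2; // typically followed by 2 bytes of tail padding
};
```

On common x64 compilers `sizeof(Interleaved)` is 12 and `sizeof(Grouped)` is 8, which is the kind of saving that moving a widened field to a different position can recover.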

@@ -62,7 +62,12 @@ bool regMaskTP::IsRegNumInMask(regNumber reg, var_types type) const
//
void regMaskTP::AddGprRegs(SingleTypeRegSet gprRegs)
{
// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support.
#if defined(TARGET_AMD64)
assert((gprRegs == RBM_NONE) || ((gprRegs & RBM_ALLINT_STATIC_ALL) != RBM_NONE));
Member

for non-APX machines, gpr will still be 0-15 and with this assert, we will allow float register to get set, right?

Contributor Author

for non-APX machines, gpr will still be 0-15 and with this assert, we will allow float register to get set, right?

Not really. On both APX and non-APX machines, bits 0-23 are GPRs (including eGPRs) and bits 24-55 are SIMD registers. We just make sure that bits 16-23 are not used on non-APX machines.

Member

We just make sure that 16-23 are not used for non APX machines

how are we making sure? worth adding some asserts.
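
One possible shape for such an assert is sketched below. It assumes the bit layout described in the reply above (bits 0-15 for legacy GPRs, 16-23 for eGPRs, 24-55 for SIMD); the type alias and helper names are hypothetical and are not the JIT's API.

```cpp
#include <cassert>
#include <cstdint>

using RegSet = uint64_t;

// Mask with bits first..last (inclusive) set; intended for widths below 64.
constexpr RegSet bitRange(unsigned first, unsigned last)
{
    return ((RegSet{1} << (last - first + 1)) - 1) << first;
}

constexpr RegSet kLowGprBits = bitRange(0, 15);  // rax..r15
constexpr RegSet kExtGprBits = bitRange(16, 23); // r16..r23
constexpr RegSet kSimdBits   = bitRange(24, 55); // SIMD registers

// A GPR set is valid if it never touches SIMD bits and only touches the
// eGPR bits when APX is available on the target.
constexpr bool isValidGprSet(RegSet gprRegs, bool apxAvailable)
{
    return (gprRegs & ~(apxAvailable ? (kLowGprBits | kExtGprBits) : kLowGprBits)) == 0;
}
```

A caller could then write `assert(isValidGprSet(gprRegs, apxAvailable));`, which rejects both float registers and, on non-APX machines, bits 16-23.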


// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support. Deprecated????
#if defined(TARGET_AMD64)
sprintf_s(regmask, cchRegMask, REG_MASK_INT_FMT, (mask & RBM_ALLINT_STATIC_ALL).getLow());
Member

why we need RBM_ALLINT_STATIC_ALL here? it should just use RBM_ALLINT and it should return the right mask depending on if high int registers are available or not.

Contributor Author

"RBM_ALLINT and it should return the right mask depending on if high int registers are available or not" - I'm not sure we can do that here. RBM_ALLINT calls get_RBM_ALLINT(). One way to make it work would be to make this method part of the compiler class?

Member

thinking about it...RBM_ALLINT_STATIC_ALL should be the one we should be using and we can have it for both x86 and x64 for consistency.
Alternatively, if you decide to add rbmAllInt on x86, we can just use RBM_ALLINT here.

Contributor Author

// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support. These are used by GC
// exclusively
#if defined(TARGET_AMD64)
printf(REG_MASK_INT_FMT, (mask & RBM_ALLINT_STATIC_ALL).getLow());
Member

likewise here...can just use RBM_ALLINT?

@@ -3136,4 +3347,51 @@ inline SingleTypeRegSet LinearScan::BuildEvexIncompatibleMask(GenTree* tree)
#endif
}

inline bool LinearScan::DoesThisUseGPR(GenTree* op)
Member

can you add method docs for this and below method?

Member

Method docs please.

Contributor Author

Added

return false;
}

inline SingleTypeRegSet LinearScan::BuildApxIncompatibleGPRMask(GenTree* tree, bool forceLowGpr)
Member

what is the goal of this method?

SingleTypeRegSet op1Candidates = candidates;
SingleTypeRegSet op2Candidates = candidates;
int srcCount = 0;
// SingleTypeRegSet op1Candidates = candidates;
Member

there are lot of such comments in this file. can you please delete them?


// We are dealing exclusively with HWIntrinsics here
return (op->AsHWIntrinsic()->OperIsBroadcastScalar() ||
(op->AsHWIntrinsic()->OperIsMemoryLoad() && DoesThisUseGPR(op->AsHWIntrinsic()->Op(1))));
Member

do we only care if Op(1) uses GPR, not any other operand?

Contributor Author

Yes. For xarch, Op(1) is the memory address for nodes satisfying GenTreeHWIntrinsic::OperIsMemoryLoad, with the exception of 4 intrinsics (those 4 will not use this). And a GPR is likely to be used only for memory addressing in these cases.

else
{
// ToDo-APX : imul currently doesn't have rex2 support. So, cannot use R16-R31.
dstCandidates = BuildApxIncompatibleGPRMask(tree, true);
Member

Calls to BuildApxIncompatibleGPRMask for many nodes seems expensive. Wondering if we can do something like:

  1. at the top just set SingleTypeRegSet incompatibleGprMask = compiler->canUseApxEncoding() ? lowGPRRegs() : RBM_NONE;
  2. Places where you are passing forceLowGpr= true can instead just use incompatibleGprMask.
  3. Places where you are not forcing lowGPr, can just use DoesThisUseGPR(tree) ? incompatibleGprMask : RBM_NONE

Also, it might be worth caching the value of lowGPRRegs(), because currently it is evaluated every time as (availableIntRegs & RBM_LOWINT.GetIntRegSet()), and I see lowGprRegs() is used in a lot of places.
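
The three steps and the caching idea above could look roughly like the following standalone model. It is only a sketch: `AllocatorSketch`, its members, and the 16-bit low-GPR constant are hypothetical stand-ins for the real LinearScan state, not code from the PR.

```cpp
#include <cassert>
#include <cstdint>

using RegSet = uint64_t;

constexpr RegSet kNoRegs     = 0;
constexpr RegSet kLowIntBits = 0xFFFF; // assumption: bits 0-15 are rax..r15

struct AllocatorSketch
{
    bool   apxEnabled;       // stands in for compiler->canUseApxEncoding()
    RegSet availableIntRegs; // all integer registers the allocator may use
    RegSet cachedLowGprRegs; // computed once instead of on every query

    AllocatorSketch(bool apx, RegSet available)
        : apxEnabled(apx)
        , availableIntRegs(available)
        , cachedLowGprRegs(available & kLowIntBits)
    {
    }

    // Steps 1 and 2: one mask, used directly wherever low GPRs are forced.
    RegSet incompatibleGprMask() const
    {
        return apxEnabled ? cachedLowGprRegs : kNoRegs;
    }

    // Step 3: only restrict the candidates when the node actually uses a GPR.
    RegSet candidatesFor(bool nodeUsesGpr) const
    {
        return nodeUsesGpr ? incompatibleGprMask() : kNoRegs;
    }
};
```

The point of the design is that the per-node cost drops to a flag test and a load, rather than recomputing the low-GPR set for every node.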

@kunalspathak
Member

It seems from your latest change, there are still asmdiffs coming up. I think there are places in emitxarch.cpp that still rely on REG_R31 instead of get_REG_INT_LAST. Also, I am little surprised with the tpdiff numbers. Locally when I run tpdiff for asp.net collection, I get these numbers:

image

vs. what CI is showing

image

what does it show for you locally?

@kunalspathak
Member

Also, I am little surprised with the tpdiff number

That was happening because my environment was not setup correctly. I can now see the same TP diffs that is shown in CI.

@DeepakRajendrakumaran
Contributor Author

Also, I am little surprised with the tpdiff number

That was happening because my environment was not setup correctly. I can now see the same TP diffs that is shown in CI.

This reduced it by somewhere around 0.8%. Without that change for comparison - https://github.com/dotnet/runtime/pull/111004/checks?check_run_id=34998081643

@kunalspathak
Member

Just a note that the TP regression we see here will impact not only non-APX machines but also AMD machines, which do not have the APX feature. While working on this, we should also consider how to reduce or eliminate the impact on AMD.

@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the enableeGPR branch 3 times, most recently from 99438b6 to dc3a1e8 Compare January 30, 2025 19:20
@BruceForstall
Member

Looks like no asm diffs now. Still some small x64 TP regression but I assume that is inevitable and expected.

fyi @DeepakRajendrakumaran there are merge conflicts

Member

@kunalspathak kunalspathak left a comment

Added some minor comments and questions. I think it is in good shape now, might just need one more round of updates.

@@ -771,7 +771,8 @@ class LinearScan : public LinearScanInterface
LSRA_LIMIT_SMALL_SET = 0x3,
#if defined(TARGET_AMD64)
LSRA_LIMIT_UPPER_SIMD_SET = 0x2000,
-LSRA_LIMIT_MASK = 0x2003
+LSRA_LIMIT_EXT_GPR_SET = 0x4000,
+LSRA_LIMIT_MASK = 0x6003
Member

Did you verify whether this gets picked up automatically with the JitStressRegs values we pass in CI? One place to see that list is:

"JitStressRegs=0",
"JitStressRegs=1",
"JitStressRegs=2",
"JitStressRegs=3",
"JitStressRegs=4",
"JitStressRegs=8",
"JitStressRegs=0x10",
"JitStressRegs=0x80",
"JitStressRegs=0x1000",

Contributor Author

I have checked this setting locally. But didn't know it needs to be enabled on CI. Will add and check
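
To make the question concrete, here is a small sketch of how the new limit bit relates to the JitStressRegs values listed above. The enum values are copied from the diff; the assumption that the stress value is simply tested against the 0x4000 bit, and the helper name, are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the diff above; the enum name itself is hypothetical.
enum LsraLimitSketch : uint32_t
{
    LIMIT_SMALL_SET      = 0x3,
    LIMIT_UPPER_SIMD_SET = 0x2000,
    LIMIT_EXT_GPR_SET    = 0x4000,
    LIMIT_MASK           = 0x6003,
};

// The new mask is the old one (0x2003) plus the new eGPR stress bit.
static_assert(LIMIT_MASK == (LIMIT_SMALL_SET | LIMIT_UPPER_SIMD_SET | LIMIT_EXT_GPR_SET),
              "mask covers all limit modes");

// Assumption: a stress value requests the eGPR mode when it has the 0x4000 bit.
constexpr bool requestsExtGprStress(uint32_t jitStressRegs)
{
    return (jitStressRegs & LIMIT_EXT_GPR_SET) != 0;
}
```

None of the listed CI values (0, 1, 2, 3, 4, 8, 0x10, 0x80, 0x1000) has the 0x4000 bit set, so under this assumption the new mode would need an additional entry such as 0x4000 to be exercised.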

Contributor Author

@@ -37,6 +37,7 @@
DOTNET_EnableSSE41;
DOTNET_EnableSSE42;
DOTNET_EnableSSSE3;
DOTNET_EnableAPX;
Member

yes, can you please remove it unless we want to add pipeline for it?

src/coreclr/jit/targetamd64.h

#define RBM_ALLINT_INIT (RBM_INT_CALLEE_SAVED | RBM_INT_CALLEE_TRASH_INIT)
#define RBM_ALLINT get_RBM_ALLINT()
#define RBM_INT_CALLEE_TRASH_STATIC_ALL (RBM_INT_CALLEE_TRASH_INIT | RBM_HIGHINT)
Member

RBM_INT_CALLEE_TRASH_INIT serves similar purpose as other #define with name *_STATIC_* in it. Can you have consistent nomenclature, either use *_INIT_* or *_STATIC_* in their names?

Contributor Author

Currently I use them as follows:
*_INIT is the list of registers that are always available on targetamd64.
For example, RBM_INT_CALLEE_TRASH_INIT = (RBM_EAX|RBM_ECX|RBM_EDX|RBM_R8|RBM_R9|RBM_R10|RBM_R11) (no eGPRs). This allows us to add available registers dynamically at runtime, based on APX availability.

*_STATIC_ALL is a static list of all possible registers.
For example, RBM_INT_CALLEE_TRASH_STATIC_ALL = (RBM_EAX|RBM_ECX|RBM_EDX|RBM_R8|RBM_R9|RBM_R10|RBM_R11) | (RBM_R16|RBM_R17|RBM_R18|RBM_R19|RBM_R20|RBM_R21|RBM_R22|RBM_R23) (includes eGPRs). Its only purpose is to enable compile-time static asserts.

So they have different uses. Are you asking me to rename them to RBM_INT_CALLEE_TRASH_INIT_ALL and RBM_INT_CALLEE_TRASH_STATIC_ALL?
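The split described above can be sketched as follows. This is a minimal illustration, not the JIT's code: the type `RegMaskSketch`, the `SKETCH_*` names, the function `calleeTrashAtRuntime`, and the assumption that register N occupies bit N are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for the JIT's register mask type.
using RegMaskSketch = uint64_t;

// Assumption for illustration only: register N occupies bit N of the mask.
constexpr RegMaskSketch regBit(unsigned n)
{
    return RegMaskSketch(1) << n;
}

// Registers that are always available, following the list in the comment above
// (EAX, ECX, EDX, R8-R11). The bit positions are assumed.
constexpr RegMaskSketch SKETCH_CALLEE_TRASH_INIT =
    regBit(0) | regBit(1) | regBit(2) | regBit(8) | regBit(9) | regBit(10) | regBit(11);

// The eight eGPRs (R16-R23), present only when APX is available.
constexpr RegMaskSketch SKETCH_HIGHINT = RegMaskSketch(0xFF) << 16;

// Static superset of every register that could ever be in the set.
constexpr RegMaskSketch SKETCH_CALLEE_TRASH_ALL = SKETCH_CALLEE_TRASH_INIT | SKETCH_HIGHINT;

// Compile-time checks are only possible against the static superset.
static_assert((SKETCH_CALLEE_TRASH_INIT & SKETCH_HIGHINT) == 0, "base set must not contain eGPRs");
static_assert((SKETCH_CALLEE_TRASH_ALL >> 24) == 0, "no bits above R23 in this sketch");

// The mask actually used is chosen at runtime from APX availability.
RegMaskSketch calleeTrashAtRuntime(bool apxAvailable)
{
    return apxAvailable ? SKETCH_CALLEE_TRASH_ALL : SKETCH_CALLEE_TRASH_INIT;
}
```

The point of the design is that the runtime value cannot appear in a `static_assert`, while the static superset can.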

Member

That makes sense, and *_INIT seems fine to me. I am a little uncomfortable with the word _STATIC_ in the other #define, so if you can think of a better name, that would be great.

Contributor Author

Updated the names by removing STATIC. I think RBM_INT_CALLEE_ALL and RBM_ALLINT_ALL describe these.

@@ -43,7 +43,7 @@ inline static bool isHighGPReg(regNumber reg)
 #ifdef TARGET_AMD64
 // TODO-apx: the definition here is incorrect, we will need to revisit this after we extend the register definition.
 // for now, we can simply use REX2 as REX.
-return ((reg >= REG_R8) && (reg <= REG_R15));
+return ((reg >= REG_R16) && (reg <= REG_R23));
Member

Shouldn't the range for non-APX cases still be R8~R15? I mean, we could extend it to R8~R23, because we know we will never see any registers in R16~R23 for non-APX, but it should at least start from R8 rather than R16, right?
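The three ranges under discussion can be compared side by side in a small sketch. The enum `RegSketch` and its numbering are hypothetical (the JIT's real register enum may be laid out differently), and the function names are made up for this illustration; the sketch does not claim which range the final code should use.

```cpp
// Hypothetical register numbering for illustration; the JIT's real enum may differ.
enum RegSketch
{
    SK_RAX = 0,
    SK_R8  = 8,
    SK_R15 = 15,
    SK_R16 = 16,
    SK_R23 = 23,
};

constexpr bool inRange(RegSketch reg, RegSketch lo, RegSketch hi)
{
    return (reg >= lo) && (reg <= hi);
}

// Range in the line removed by the diff: R8..R15.
constexpr bool isR8ToR15(RegSketch reg)
{
    return inRange(reg, SK_R8, SK_R15);
}

// Range in the line added by the diff: R16..R23.
constexpr bool isR16ToR23(RegSketch reg)
{
    return inRange(reg, SK_R16, SK_R23);
}

// Wider range floated in the review comment: R8..R23.
constexpr bool isR8ToR23(RegSketch reg)
{
    return inRange(reg, SK_R8, SK_R23);
}

// R8 is covered by the removed range and the wider range, but not by the added one.
static_assert(isR8ToR15(SK_R8) && isR8ToR23(SK_R8) && !isR16ToR23(SK_R8), "R8 coverage");
// R16 is covered by the added range and the wider range, but not by the removed one.
static_assert(!isR8ToR15(SK_R16) && isR8ToR23(SK_R16) && isR16ToR23(SK_R16), "R16 coverage");
```

With this numbering, a predicate that starts at R16 no longer returns true for R8~R15, which is the behavior change the comment is asking about.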

@DeepakRajendrakumaran
Contributor Author

@kunalspathak Responding to this comment: I believe the consensus in #108799 (comment) was to keep the setting.

@kunalspathak
Member

> @kunalspathak Responding to this comment: I believe the consensus in #108799 (comment) was to keep the setting.

Can you verify whether setting DOTNET_AltJit=MethodName and DOTNET_AltJitName=clrjit_win_x64_x64.dll generates APX code (without having to set DOTNET_EnableAPX=1)? If that is the case, I am guessing we will still need a pipeline that sets DOTNET_AltJit=* (to compile all methods for APX) and that is triggered when a handful of files in the jit folders are touched.

@DeepakRajendrakumaran
Contributor Author

> @kunalspathak Responding to this comment: I believe the consensus in #108799 (comment) was to keep the setting.

> Can you verify whether setting DOTNET_AltJit=MethodName and DOTNET_AltJitName=clrjit_win_x64_x64.dll generates APX code (without having to set DOTNET_EnableAPX=1)? If that is the case, I am guessing we will still need a pipeline that sets DOTNET_AltJit=* (to compile all methods for APX) and that is triggered when a handful of files in the jit folders are touched.

I'm able to verify that APX code can be generated using the following steps.

Build the APX branch with the following:
build.cmd clr+libs -rc Checked -lc Release
src\tests\build.cmd x64 checked generatelayoutonly

This is the tricky bit: we need a custom coredistools.dll, because the current coredistools.dll cannot decode APX encodings. This is probably what makes enabling CI for this tricky. I believe @BruceForstall is aware of this.

Go to jitutils and run bootstrap.cmd.

Set the following environment variables:
set DOTNET_MaxVectorTBitwidth=128
set Dotnet_EnableAPX=1

set Dotnet_JitStressRegs=0x4000 - this is optional. It just forces eGPR usage where possible.

Run the following from the runtime repo root:
jit-diff diff --output C:\Dotnet\APX\Output --pmi --diff --corelib --altjit clrjit_unix_x64_x64.dll --diff C:\Dotnet\runtime\artifacts\bin\coreclr\windows.x64.Checked --core_root C:\Dotnet\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root --arch x64

[image]

@kunalspathak
Member

> This is the tricky bit. We need a custom coredistools.dll. This is the part that probably makes enabling CI for this tricky.

You don't need coredistools to just compile the method with altjit to make sure it doesn't hit any assert. coredistools is needed for pipelines like asmdiffs, which we don't want to add right now. But we can have a pipeline that runs coreclr Pri0/Pri1 tests with APX=On.

@BruceForstall
Member

> But we can have a pipeline that runs coreclr Pri0/Pri1 tests with APX=On.

Can this be done as a follow-up PR? It doesn't seem necessary to gate this PR on that addition.

@kunalspathak
Member

> But we can have a pipeline that runs coreclr Pri0/Pri1 tests with APX=On.

> Can this be done as a follow-up PR? It doesn't seem necessary to gate this PR on that addition.

I am ok with that as long as we have a tracking issue to add that coverage sooner rather than later.

@kunalspathak
Copy link
Member

Failures in jitstress-isas-avx512 seem to be #112163. Can you confirm whether the JIT/Regression/JitBlue/Runtime_74635/Runtime_74635_1/Runtime_74635_1.dll failure is related to your change?

@DeepakRajendrakumaran
Contributor Author

74635

I did the following:

Build the repo and tests:
build clr+libs -rc Checked -lc release
src\tests\build.cmd x64 checked tree JIT\Regression\ -priority=0

Run the tests normally:
src\tests\run.cmd x64 Checked

Run the tests with tiered compilation off:

set Dotnet_Tieredcompilation=0
src\tests\run.cmd x64 Checked

I am not seeing any failures.
[image]

From the logs:
[image]

Member

@kunalspathak kunalspathak left a comment

LGTM

@kunalspathak
Member

/ba-g failures seems unrelated

@kunalspathak kunalspathak merged commit c153833 into dotnet:main Feb 7, 2025
120 of 125 checks passed
Labels
apx: Related to the Intel Advanced Performance Extensions (APX)
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member
7 participants