Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up alias analysis #8695

Merged
merged 25 commits into from
Oct 4, 2024
Merged

Speed up alias analysis #8695

merged 25 commits into from
Oct 4, 2024

Conversation

frej
Copy link
Contributor

@frej frej commented Jul 30, 2024

This series of patches improves the performance of the alias analysis pass to roughly halve the time required for the analysis when measured by scripts/diffable.

The improvement comes from a combination of an improved strategy for deciding on which functions requiring repeated analysis when alias information changes, changes to the aliasing state database to reduce its size but increase its expressiveness, and an adaptive strategy for pruning aliasing state database states.

Outside of the alias analysis pass this PR extends the beam_digraph module with new functions for use by the pass, otherwise its changes are limited to the beam_ssa_alias and beam_ssa_ss modules. Where, apart from the improvement in performance, it also improves the debuggability by dumping the internal structures in Graphviz-format when trace printouts are enabled.

The commit f480318 is already under review in in #8686 respectively, I will refresh this PR when it is merged. The prerequisites are now merged.

Copy link
Contributor

github-actions bot commented Jul 30, 2024

CT Test Results

    2 files    321 suites   10m 31s ⏱️
  818 tests   816 ✅ 2 💤 0 ❌
5 381 runs  5 379 ✅ 2 💤 0 ❌

Results for commit b83561a.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@bjorng bjorng self-assigned this Jul 31, 2024
@bjorng bjorng added team:VM Assigned to OTP team VM enhancement labels Jul 31, 2024
@frej
Copy link
Contributor Author

frej commented Jul 31, 2024

@github-actions disable-cache

@frej
Copy link
Contributor Author

frej commented Jul 31, 2024

@bjorng: I'm unable to reproduce the failures of compilation_SUITE:self_compile_old_inliner and compilation_SUITE:self_compile locally (both with and without a bootstrap update). Previously that kind of failure has been avoided by a @github-actions disable-cache comment, but it appears that it doesn't help (does it have to be done by someone with special permissions?), any tips?

@bjorng
Copy link
Contributor

bjorng commented Jul 31, 2024

@github-actions disable-cache

@bjorng
Copy link
Contributor

bjorng commented Jul 31, 2024

@frej I have issued the disable-cache command myself; I don't know whether that will work better.

@bjorng
Copy link
Contributor

bjorng commented Jul 31, 2024

In 76c0f32 ("compiler alias analysis: Change work limiting strategy"), there seems to be an extra word in the second paragraph of the commit message:

Compared to the old limit strategy, this allows all more modules...

@bjorng
Copy link
Contributor

bjorng commented Jul 31, 2024

Added to our daily builds, along with updated primary bootstrap and preloaded modules.

@frej
Copy link
Contributor Author

frej commented Jul 31, 2024

In 76c0f32 ("compiler alias analysis: Change work limiting strategy"), there seems to be an extra word in the second paragraph of the commit message:

Fixed.

@frej frej force-pushed the frej/speedup branch 4 times, most recently from 7b2b9db to ed7bfc1 Compare July 31, 2024 12:38
@bjorng
Copy link
Contributor

bjorng commented Jul 31, 2024

@bjorng: I'm unable to reproduce the failures of compilation_SUITE:self_compile_old_inliner and compilation_SUITE:self_compile locally (both with and without a bootstrap update). Previously that kind of failure has been avoided by a @github-actions disable-cache comment, but it appears that it doesn't help (does it have to be done by someone with special permissions?), any tips?

The failures can be reproduced locally by running the tests with an installed Erlang system (make release or make install). The test cases fail because of a missing .hrl file.

Here is how to fix it:

diff --git a/lib/compiler/src/Makefile b/lib/compiler/src/Makefile
index 569f057589..9b50016bb5 100644
--- a/lib/compiler/src/Makefile
+++ b/lib/compiler/src/Makefile
@@ -112,6 +112,7 @@ BEAM_H = $(wildcard ../priv/beam_h/*.h)
 HRL_FILES= \
        beam_asm.hrl \
        beam_disasm.hrl \
+       beam_ssa_alias_debug.hrl \
        beam_ssa_opt.hrl \
        beam_ssa.hrl \
        beam_types.hrl \

@frej
Copy link
Contributor Author

frej commented Jul 31, 2024

The test cases fail because of a missing .hrl file.

Good catch! Time to update the script I use for running the test suite, it set PATH="$ERL_TOP/bin" instead of the $MY_INSTALL_DIR/bin.

Something that's a little strange was that one of my added ct:pal/3 calls failed with a bad argument to io:format and what it complained about was the filename of the module being compiled couldn't be formatted with ~s. The filename was [47,98,117,105,108,100,114,111,111,116,47,111,116,112,47,69,114,108,97,110,103,32,8709,8868,8478,47,108,105,98,47,99,111,109,112,105,108,101,114,45,56,46,53,46,49,47,115,114,99,47,98,101,97,109,95,115,115,97,95,97,108,105,97,115,46,101,114,108] which is "/buildroot/otp/Erlang " some-unicode-codepoints "/lib/compiler-8.5.1/src/beam_ssa_alias.erl" . So I will keep that one around for one more turn through the CI.

@bjorng
Copy link
Contributor

bjorng commented Aug 1, 2024

Something that's a little strange was that one of my added ct:pal/3 calls failed with a bad argument to io:format and what it complained about was the filename of the module being compiled couldn't be formatted with ~s. The filename was [47,98,117,105,108,100,114,111,111,116,47,111,116,112,47,69,114,108,97,110,103,32,8709,8868,8478,47,108,105,98,47,99,111,109,112,105,108,101,114,45,56,46,53,46,49,47,115,114,99,47,98,101,97,109,95,115,115,97,95,97,108,105,97,115,46,101,114,108] which is "/buildroot/otp/Erlang " some-unicode-codepoints "/lib/compiler-8.5.1/src/beam_ssa_alias.erl" . So I will keep that one around for one more turn through the CI.

You will need to use ~ts to support printing of code points beyond 255.

@frej frej force-pushed the frej/speedup branch 3 times, most recently from f330d3f to 902574c Compare August 1, 2024 05:52
@frej
Copy link
Contributor Author

frej commented Aug 1, 2024

You will need to use ~ts to support printing of code points beyond 255.

Thanks, and that fixes the problem. The strange code-points appears to be intentional too, the CI sets RELEASE_ROOT=/buildroot/otp/Erlang ∅⊤℞' so I guess it is for ensuring that OTP can handle an installation directory name with space and unusual code-points.

@bjorng
Copy link
Contributor

bjorng commented Aug 1, 2024

The strange code-points appears to be intentional too, the CI sets RELEASE_ROOT=/buildroot/otp/Erlang ∅⊤℞' so I guess it is for ensuring that OTP can handle an installation directory name with space and unusual code-points.

Yes.

@frej
Copy link
Contributor Author

frej commented Aug 5, 2024

Refreshed patch to squash fixups and #8687 has been merged. 6c95e36 is a new commit fixing an omission in the ssa-checker.

@bjorng
Copy link
Contributor

bjorng commented Aug 5, 2024

Thanks, added to our daily builds.

frej added 24 commits August 20, 2024 14:49
There was previously no way to properly match succeed instructions,
this patch fixes this omission.
Simplify the logic of beam_ssa_destructive_update:patch_is/5 to not
special-case the extraction of force_copy patches as a part of
processing a series of opargs patches. Instead filter out the relevant
patches and let the rest be handled by a recursive call.
Implement del_vertex/2 analogous to digraph:del_vertex/2.
Add beam_digraph:vertex/3 which allows for a default label if the
vertex does not exist.
Implement edges/1 analogous to digraph:edges/1.
Improve debug dumps of the sharing state database by dumping the graph
structure in Graphviz format.
beam_ssa_ss:variables/1 does not show up as a critical function during
profiling, so remove the TODO. Pushing beam_ssa_ss-specific
functionality down into beam_digraph is not motivated.
Export beam_ssa_ss:dump/1 to make it accessible by the beam_ssa_alias
module for printing the internal state of the pass. Break out the
various settings for debugging the alias pass into a shared header.
Use beam_ssa_ss:dump/1 for all debug printouts of sharing states.
Unify printing of function names in debug printouts.
Speed up beam_ssa_ss:prune() by instead of recreating the state by
finding all variables reachable from the live variables, instead
remove variables that have been killed.

The main speedup comes from the observation that in most cases more
than half of the variables in the state survives the pruning.
This change relaxes the requirement that every SSA variable has to be
present in the sharing state database by allowing plain values to be
omitted. For now, only omit creating a variable for values created by
instructions known to produce plain values, i.e. no type annotations
are used.

As variables are no longer, by default, inserted in the database for
each instruction by aa_is/3, avoid two database changes when a
variable is created and immediately aliased by inserting and aliasing
it in one operation.

This change reduces the number of vertices processed by
beam_ssa_ss:prune/3 by 7% and the number of pruned vertices by 20%.
This is a micro optimization which avoids creating a variable in the
sharing database when, during the processing of the instruction, we
will immediately update its status. This reduces the number of sharing
database operations as we can insert and set the status in a single
operation. It also allows us to completely avoid creating a variable
for extracted plain values.
Consider embed-extract chains when incorporating information about the
aliasing status of function arguments.
Previously the status of a variable in the sharing state database was
either `unique` or `aliased`. This patch extends the set of statuses
with a third: `no_info`, meaning that nothing is known about the alias
status of the variable, nor of its structure. This new status will be
used in future changes in order to allow for faster and more efficient
calculation of the alias status of a module.
Switch from a dumb fixpoint iteration where we just iterated until the
sharing state for a all functions stopped changing, to a more
intelligent strategy where we only consider changes that are visible
outside the function. Concretely that means that we only re-analyze
callers if the callee's result's status changed. And callees if
argument aliasing status has changed.

The new strategy leads to a test failure (marked xfail), but that
limitiation will be removed in a later patch.
A zero-arity function, without internal calls, is likely to only have
to be analyzed once before producing a stable alias status for its
result. This means that callers of the zero-arity function will
benefit from, at their first analysis pass, having a not `no_info`
call result, thus making the the fixpoint iteration converging faster.

This patch therefore changes the traversal order to explicitly analyze
zero-arity functions first.
Always record call argument status regardless of whether the callee
has been analyzed. This change temporarily leads to test-suite
failures, but also fixes a test which was broken by the commit which
improved the fixpoint iteration strategy.
Return a `no_info` sharing status for the result of calling a function
which has not yet been analyzed.

This change restores the test cases broken by "compiler alias
analysis: Always record call argument status".
Stop treating Phi instructions as an extraction, instead make use of
the infrastructure used to derive the structure and alias status of
function arguments.

The sharing state database representation for the result of a Phi
instruction is the same as for a function argument of a function with
multiple call sites, where the concrete argument values at the call
sites are the value at the respective predecessor block.
Instead of limiting the maximum number of allowed traversals of the
functions in the module, limit the number of times a function is
analyzed before aborting the alias analysis pass.

Compared to the old limit strategy, this allows all modules compiled
by scripts/diffable to successfully complete, without a noticeable
change in runtime.
Sharing state information flows from callers to callees for function
arguments, and from callees to callers for results. Previously callers
were always analyzed before callees, but it turns out that the alias
analysis converges faster if the traversal order is alternated between
caller-callee and callee-caller.
In around 80% of the cases when prune is called, more than half of the
nodes in the sharing state database survive. Therefore a pruning
strategy which removes nodes from the database has been used. This
patch resurrects the previously used pruning algorithm which instead
recreated the pruned state from scratch instead of removing nodes
which is faster when only a few nodes survive the pruning. The alias
analysis implementation is then extended to, for each pruned basic
block, by looking at the number of nodes in the database before and
after the prune, record which strategy is the most efficient. If the
basic block is revisited, the fastest prune implementation is
selected. On the modules compiled by the `scripts/diffable` tool, this
change reduces the time spent on alias analysis by close to six
percent.
Take care to not produce a reuse hint when more than one update
exists. There is no point in attempting the reuse optimization when
more than one element is updated, as checking more than one element at
runtime is known to be slower than just copying the tuple in most
cases. Additionally, using a copy hint occasionally allows the alias
analysis pass to do a better job.
@frej
Copy link
Contributor Author

frej commented Aug 20, 2024

Refreshed to fix conflicts against the master branch.

@bjorng
Copy link
Contributor

bjorng commented Aug 21, 2024

Added to our daily builds.

@bjorng bjorng merged commit cc9f43e into erlang:master Oct 4, 2024
16 checks passed
@bjorng
Copy link
Contributor

bjorng commented Oct 4, 2024

Thanks for your pull request.

@frej frej deleted the frej/speedup branch October 4, 2024 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants