-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up alias analysis #8695
Speed up alias analysis #8695
Conversation
CT Test Results 2 files 321 suites 10m 31s ⏱️ Results for commit b83561a. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
@github-actions disable-cache |
@bjorng: I'm unable to reproduce the failures of |
@github-actions disable-cache |
@frej I have issued the disable-cache command myself; I don't know whether that will work better. |
In 76c0f32 ("compiler alias analysis: Change work limiting strategy"), there seems to be an extra word in the second paragraph of the commit message:
|
Added to our daily builds, along with updated primary bootstrap and preloaded modules. |
Fixed. |
7b2b9db
to
ed7bfc1
Compare
The failures can be reproduced locally by running the tests with an installed Erlang system ( Here is how to fix it:
|
Good catch! Time to update the script I use for running the test suite, it set Something that's a little strange was that one of my added |
You will need to use |
f330d3f
to
902574c
Compare
Thanks, and that fixes the problem. The strange code-points appears to be intentional too, the CI sets |
Yes. |
Thanks, added to our daily builds. |
There was previously no way to properly match succeed instructions, this patch fixes this omission.
Simplify the logic of beam_ssa_destructive_update:patch_is/5 to not special-case the extraction of force_copy patches as a part of processing a series of opargs patches. Instead filter out the relevant patches and let the rest be handled by a recursive call.
Implement del_vertex/2 analogous to digraph:del_vertex/2.
Add beam_digraph:vertex/3 which allows for a default label if the vertex does not exist.
Implement edges/1 analogous to digraph:edges/1.
Improve debug dumps of the sharing state database by dumping the graph structure in Graphviz format.
beam_ssa_ss:variables/1 does not show up as a critical function during profiling, so remove the TODO. Pushing beam_ssa_ss-specific functionality down into beam_digraph is not motivated.
Export beam_ssa_ss:dump/1 to make it accessible by the beam_ssa_alias module for printing the internal state of the pass. Break out the various settings for debugging the alias pass into a shared header.
Use beam_ssa_ss:dump/1 for all debug printouts of sharing states.
Unify printing of function names in debug printouts.
Speed up beam_ssa_ss:prune() by instead of recreating the state by finding all variables reachable from the live variables, instead remove variables that have been killed. The main speedup comes from the observation that in most cases more than half of the variables in the state survives the pruning.
This change relaxes the requirement that every SSA variable has to be present in the sharing state database by allowing plain values to be omitted. For now, only omit creating a variable for values created by instructions known to produce plain values, i.e. no type annotations are used. As variables are no longer, by default, inserted in the database for each instruction by aa_is/3, avoid two database changes when a variable is created and immediately aliased by inserting and aliasing it in one operation. This change reduces the number of vertices processed by beam_ssa_ss:prune/3 by 7% and the number of pruned vertices by 20%.
This is a micro optimization which avoids creating a variable in the sharing database when, during the processing of the instruction, we will immediately update its status. This reduces the number of sharing database operations as we can insert and set the status in a single operation. It also allows us to completely avoid creating a variable for extracted plain values.
Consider embed-extract chains when incorporating information about the aliasing status of function arguments.
Previously the status of a variable in the sharing state database was either `unique` or `aliased`. This patch extends the set of statuses with a third: `no_info`, meaning that nothing is known about the alias status of the variable, nor of its structure. This new status will be used in future changes in order to allow for faster and more efficient calculation of the alias status of a module.
Switch from a dumb fixpoint iteration where we just iterated until the sharing state for a all functions stopped changing, to a more intelligent strategy where we only consider changes that are visible outside the function. Concretely that means that we only re-analyze callers if the callee's result's status changed. And callees if argument aliasing status has changed. The new strategy leads to a test failure (marked xfail), but that limitiation will be removed in a later patch.
A zero-arity function, without internal calls, is likely to only have to be analyzed once before producing a stable alias status for its result. This means that callers of the zero-arity function will benefit from, at their first analysis pass, having a not `no_info` call result, thus making the the fixpoint iteration converging faster. This patch therefore changes the traversal order to explicitly analyze zero-arity functions first.
Always record call argument status regardless of whether the callee has been analyzed. This change temporarily leads to test-suite failures, but also fixes a test which was broken by the commit which improved the fixpoint iteration strategy.
Return a `no_info` sharing status for the result of calling a function which has not yet been analyzed. This change restores the test cases broken by "compiler alias analysis: Always record call argument status".
Stop treating Phi instructions as an extraction, instead make use of the infrastructure used to derive the structure and alias status of function arguments. The sharing state database representation for the result of a Phi instruction is the same as for a function argument of a function with multiple call sites, where the concrete argument values at the call sites are the value at the respective predecessor block.
Instead of limiting the maximum number of allowed traversals of the functions in the module, limit the number of times a function is analyzed before aborting the alias analysis pass. Compared to the old limit strategy, this allows all modules compiled by scripts/diffable to successfully complete, without a noticeable change in runtime.
Sharing state information flows from callers to callees for function arguments, and from callees to callers for results. Previously callers were always analyzed before callees, but it turns out that the alias analysis converges faster if the traversal order is alternated between caller-callee and callee-caller.
In around 80% of the cases when prune is called, more than half of the nodes in the sharing state database survive. Therefore a pruning strategy which removes nodes from the database has been used. This patch resurrects the previously used pruning algorithm which instead recreated the pruned state from scratch instead of removing nodes which is faster when only a few nodes survive the pruning. The alias analysis implementation is then extended to, for each pruned basic block, by looking at the number of nodes in the database before and after the prune, record which strategy is the most efficient. If the basic block is revisited, the fastest prune implementation is selected. On the modules compiled by the `scripts/diffable` tool, this change reduces the time spent on alias analysis by close to six percent.
Take care to not produce a reuse hint when more than one update exists. There is no point in attempting the reuse optimization when more than one element is updated, as checking more than one element at runtime is known to be slower than just copying the tuple in most cases. Additionally, using a copy hint occasionally allows the alias analysis pass to do a better job.
Refreshed to fix conflicts against the master branch. |
Added to our daily builds. |
Thanks for your pull request. |
This series of patches improves the performance of the alias analysis pass to roughly halve the time required for the analysis when measured by
scripts/diffable
.The improvement comes from a combination of an improved strategy for deciding on which functions requiring repeated analysis when alias information changes, changes to the aliasing state database to reduce its size but increase its expressiveness, and an adaptive strategy for pruning aliasing state database states.
Outside of the alias analysis pass this PR extends the beam_digraph module with new functions for use by the pass, otherwise its changes are limited to the
beam_ssa_alias
andbeam_ssa_ss
modules. Where, apart from the improvement in performance, it also improves the debuggability by dumping the internal structures in Graphviz-format when trace printouts are enabled.The commit f480318 is already under review in in #8686 respectively, I will refresh this PR when it is merged.The prerequisites are now merged.