New and Improved MapFusion #1629
Now using the 3.9 type hints.
But it is too restrictive.
When the function was fixing the interior of the second map, it did not remove the reading.
It passes almost all functions. However, the ones that need renaming are not yet done.
…t in the input and output set. However, it is very simple.
Before it was going to look for the memlet of the consumer or producer. However, one should actually only look at the memlets that are adjacent to the scope node. At least this is how the original worked. I noticed this because of the `buffer_tiling_test.py::test_basic()` test. I was not yet focused on maps that were nested and not multidimensional. It seems that the transformation has some problems there.
When it now checks for covering (i.e. if the information to exchange is enough), it no longer descends into the maps, but only inspects the first outgoing/incoming edges of the map entry and exit. I noticed that the other way was too restrictive, especially for map tiling.
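As a rough illustration of the covering check mentioned above (not the actual DaCe implementation), the sketch below models a subset as one inclusive `(start, stop)` pair per dimension and evaluates it for one fixed iteration. The helper name `covers` and the tuple representation are assumptions made only for this example.

```python
from typing import List, Tuple

# Hypothetical stand-in for a memlet subset:
# one inclusive (start, stop) pair per dimension.
Subset = List[Tuple[int, int]]


def covers(written: Subset, read: Subset) -> bool:
    """Return True if every element of `read` is also contained in `written`."""
    if len(written) != len(read):
        return False
    return all(
        w_lo <= r_lo and r_hi <= w_hi
        for (w_lo, w_hi), (r_lo, r_hi) in zip(written, read)
    )


# Subsets for one fixed iteration (i = 3), taken only from the edges that are
# adjacent to the scope nodes: the edge entering the first map's exit and the
# edge leaving the second map's entry.
producer_write = [(3, 3), (0, 19)]
consumer_read_ok = [(3, 3), (5, 5)]
consumer_read_bad = [(4, 4), (5, 5)]

print(covers(producer_write, consumer_read_ok))   # True
print(covers(producer_write, consumer_read_bad))  # False
```

Only the two subsets next to the scope nodes are compared here; nothing inside the map bodies is inspected, which mirrors the less restrictive behaviour described in the commit message.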
Otherwise we can end up in recursion.
Before, it was replacing the eliminated variables by zero, which actually worked pretty well, but I have now changed that such that `offset()` is used. I am not sure why I used `replace` in the first place, but I think that there was an issue. However, I am not sure.
Added a new special case.
…ured. Before, the output edges were set to dynamic. However, this was not true as it was always set, thus the new map fusion did not fuse them. My first attempt was to just disable the `dynamic` property, but now the SDFG is generated manually. It is almost the same, but uses fewer symbols, as it was simpler to implement it this way, and we are now using float.
For such edges we are sure that the data exists, so it is just a conditional read, which is fine.
Using `nodes()` on an SDFG will only give us the control flow regions, but using `state` will also give us the nested states. I looked through my code and these seem to be the only places where they appear. This fixes the correlation test, but the heat test still fails.
The issue was similar to the one before. When I computed the name of the intermediate transient, I used `sdfg.node_id(state)` to get the state ID. However, if the state is part of these recursive control flow regions then this may not work, because the state is not a direct node of the SDFG. If I use `self.state_id` instead then it works; this is what the old MapFusion was doing.
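The difference between direct nodes and nested states can be pictured with a small toy model. This is not DaCe code: the classes `State` and `Region` and the method `all_states()` are hypothetical names that only mimic the nesting described in the two commit messages above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class State:
    label: str


@dataclass
class Region:
    """Toy control flow region that may contain states or further regions."""
    label: str
    children: List[Union["Region", State]] = field(default_factory=list)

    def nodes(self) -> List[Union["Region", State]]:
        # Direct children only; nested states are not visible here.
        return self.children

    def all_states(self) -> Iterator[State]:
        # Recursive walk that also yields states inside inner regions.
        for child in self.children:
            if isinstance(child, State):
                yield child
            else:
                yield from child.all_states()


body = State("body")
loop = Region("loop", [body])
sdfg = Region("sdfg", [State("init"), loop])

direct = [n.label for n in sdfg.nodes()]
nested = [s.label for s in sdfg.all_states()]
print(direct)                # ['init', 'loop']
print(nested)                # ['init', 'body']
print(body in sdfg.nodes())  # False
```

Because `body` is not a direct child of the top-level object, any lookup that searches only the direct nodes cannot find it, which is the kind of failure the commit message describes for the state ID.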
This tests dynamic Memlets inside producers; the original transformation fails on it.
Thanks for reviewing and the wall of text. To give you some context. The main issues I found (not limited to ICON4Py) were:
I want to point out that this PR adds a lot of tests for MapFusion (approximately 40% of the edits) and the previous version is not able to pass them; roughly 1/3 of them fail. Regarding the description, I agree the doc string of the class is not that good. However, the code itself is in my view better documented than before, and I have updated the description of the transformation to give a better high level overview, which points to the functions that perform the tests. I do not know OTFMapFusion and SubgraphFusion very well, however, I have seen that SubgraphFusion is much more general, for example, instead of reducing the intermediate it will move the intermediate data access inside the Map. I think the best way to see my MapFusion is not as something new but just as a new iteration of what was already there; it just performs more analysis, which allows it to handle more cases than before. However, there are still some todo's that are open. I have to admit that I have not performed any testing of the runtime, but I do not have the impression that it takes much more time than before. The reason is that MapFusion is, besides two exceptions, a very local operation.
Thank you for addressing the questions and concerns. This LGTM now in general. Please update the PR description to be in line with the actual changes after revisions (ideally also with some of the details from the docstring of the transformation itself). After that I am happy to approve the PR.
The transformation checks if the first map satisfies the data dependencies of the second map. For this it looks at the writes and reads of the intermediate. It also checks, if a data container is used as input of the first and as output of the second map, whether the access is pointwise and can be fused. Furthermore, it was allowed that the intermediate is also used as input to the first map. However, in that particular case, it was not checked whether the reads and writes of the first map alone to the intermediate are valid. I.e. it could read `A[i]` but write `A[i+1]`, which would cause problems (note that this usage is borderline legal anyway). This commit adds a check to make sure that this is not the case by enforcing that, if a data container is used as input and output of the first map and also as intermediate node, then its read must be pointwise. Note that if it is not an intermediate node, i.e. not also read by the second map, then this rule does not apply. NOTE: It is forbidden that the intermediate is used as intermediate and output of the second map.
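To see why the read must be pointwise in this situation, the following toy simulation compares running two maps one after the other against a naively fused version. It is a sketch using plain Python lists; the function names and the `shift` parameter are hypothetical, and the parallel semantics of a map are approximated by reading only from the original copy of `a`.

```python
def run_unfused(a, shift):
    """First map reads a[i] and writes a[i + shift]; second map reads a[i]."""
    n = len(a)
    a_new = list(a)
    # First map: all reads see the original values of `a`.
    for i in range(n - shift):
        a_new[i + shift] = a[i] + 1
    # Second map: consumes the intermediate pointwise.
    return [2 * a_new[i] for i in range(n - shift)]


def run_fused(a, shift):
    """Naive fusion: the value produced in iteration i is consumed in iteration i."""
    out = []
    for i in range(len(a) - shift):
        t = a[i] + 1       # value the first map would store at a[i + shift]
        out.append(2 * t)  # the second map expects the value at a[i]
    return out


a = [10, 20, 30, 40]

# Pointwise access (read a[i], write a[i]): fusion preserves the result.
print(run_unfused(a, 0) == run_fused(a, 0))  # True

# Shifted access (read a[i], write a[i + 1]): fusion changes the result.
print(run_unfused(a, 1))  # [20, 22, 42]
print(run_fused(a, 1))    # [22, 42, 62]
```

With `shift = 0` both versions agree, while with `shift = 1` the fused loop hands the consumer a value that belongs to a different element, which is the case the new check rejects.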
This PR introduces a new and improved version of `MapFusion`. The PR fixes several bugs and several limitations of the previous versions.

This is a summary of all the changes:
- … `.subset` member of the Memlet; I mean the concept) of the new intermediate data descriptor were not computed correctly in some cases, especially in the presence of offsets. See the `test_offset_correction_range_read()`, `test_offset_correction_scalar_read()` and the `test_offset_correction_empty()` tests.
- … `.subset` and ignored `.other_subset`. This is correct in most cases but not always. See `test_fusion_intrinsic_memlet_direction()` for more.
- … `auto_optimizer()` is such that it takes advantage of it. See also the comment about the `assume_always_shared` flag.
- … `.dynamic` property of the Memlets was fully ignored, leading to wrong code.
- … `A[i, j]` and the other map was accessing `B[i + 1, j]`. Now this is possible as long as every access is pointwise. See the `test_fusion_different_global_accesses()` test for an example.
- … `T`, had shape `(10, 1, 20)` and inside the map was accessed as `T[__i, 0, __j]`, then the old transformation would have created a reduced intermediate of shape `(1, 1, 1)`; now its shape is `(1)`. Note that if the intermediate had shape `(10, 20)` instead and were accessed as `T[__i, __j]`, then a `Scalar` would have been created. See also the `strict_dataflow` flag below.

In addition some new flags were introduced:
- `only_toplevel_maps`: If `True` the transformation will only fuse maps that are located at the top level, i.e. maps inside maps will not be merged.
- `only_inner_maps`: If `True` then the transformation will only fuse maps that are inside other maps.
- …: If `True` then the transformation will assume that every intermediate is shared, i.e. the referenced data is used somewhere else in the SDFG and has to become an output of the fused maps. This will create dead data flow, but avoids a scan of the full SDFG.
- `strict_dataflow`: This flag is enabled by default. It has two effects. First, it will disable the cleaning of reduced intermediate storage. The second effect is more important, as it will preserve a much stricter data flow. Most importantly, if the intermediate array is used downstream (this is not limited to the case that the array is the output of the second map) then the maps will not be fused together. This is mostly to work around some other bugs in DaCe, where other transformations failed to pick up the dependency. Note that the fused map would be correct; the problem lies in other transformations.

Collection of known issues in other transformations:
- `RefineNestedAccess` and `SDFGState._read_and_write_sets()`