Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Matrix][E2E] Organize matrix tests #16563

Merged
merged 14 commits into from
Jan 23, 2025

Conversation

ayylol
Copy link
Contributor

@ayylol ayylol commented Jan 8, 2025

Move Matrix E2E test header files into the Matrix/Inputs folder. Also adds that folder to the included directories by adding to the %clangxx expansion in the Matrix/lit.local.cfg file to simplify the include statements.

@ayylol ayylol changed the title [SYCL][E2E][NFC] Organize matrix tests [SYCL][Matrix][E2E] Organize matrix tests Jan 9, 2025
@ayylol ayylol marked this pull request as ready for review January 9, 2025 17:22
@ayylol ayylol requested review from a team as code owners January 9, 2025 17:22
@ayylol ayylol requested a review from uditagarwal97 January 9, 2025 17:22
Copy link
Contributor

@uditagarwal97 uditagarwal97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @intel/sycl-matrix-reviewers agrees to this change, I think it would be better to break down this PR into two:

  1. NFC PR to relocate Matrix E2E tests into new folders (Inputs, and intel_matrix)
  2. Follow-up PR to remove REQUIRES: aspect-ext_intel_matrix from individual tests and add it to LIT config file.

That should make it easier to review the changes.

@ayylol
Copy link
Contributor Author

ayylol commented Jan 9, 2025

If @intel/sycl-matrix-reviewers agrees to this change, I think it would be better to break down this PR into two:

1. NFC PR to relocate Matrix E2E tests into new folders (`Inputs`, and `intel_matrix`)

2. Follow-up PR to remove `REQUIRES: aspect-ext_intel_matrix` from individual tests and add it to LIT config file.

That should make it easier to review the changes.

To be clear my motivation for this PR was to move the REQUIRES: aspect-ext_intel_matrix tests into one folder to add the lit.local.cfg file. I was doing this because I was experimenting with some explicit markup for setting the triple in build-only testing mode, and these tests are all only able to build on spir64. I made the PR however, since I figured these would be positive changes regardless of what direction my experimentation goes.

Let me know if these changes are agreeable, if so I'll go ahead and do what Udit suggested, and split this PR. Or also if there may be some other way to organize these tests that would be preferred.

@dkhaldi
Copy link
Contributor

dkhaldi commented Jan 9, 2025

I don't see a value of moving out the Intel aspect. It may be desirable for users to see these aspects specified in each individual test.
Moreover, moving these tests to an Intel folder gives the impression that they are only Intel-only hardware specific tests which they are not. We just did not get around making them also work on Nvidia and AMD hardware. The other tests are the ones that are Nvidia and AMD specific.
Having said that, if this is really required, since there are only a few tests left in test-e2e/Matrix, I would argue to move the few tests to a separate folder. Then you can move the Intel aspect from individual tests and add it to LIT config file.

As for moving the .hpp files to a new folder seems to be fine. I don't have a strong opinion about it.

@YuriPlyakhin
Copy link
Contributor

YuriPlyakhin commented Jan 9, 2025

In general, I think this is a positive change. I will not object to it.

Some thoughts on top:

  • folder name: intel_matrix -> Intel. I don't think we need to duplicate the word matrix. Also it seems starting from Matrix level, folders have first big letter.
  • #include "../../Inputs/common.hpp" can be moved from .cpp files to .hpp files, so it will be always #include common.hpp", which is simpler
  • For consistency, hip tests can be also moved to "HIP" directory, and tensorcores to "Tensorcores" directory (BTW all tensorecores require // REQUIRES: cuda so I guess also can use LIT config file)

@ayylol
Copy link
Contributor Author

ayylol commented Jan 20, 2025

@dkhaldi @YuriPlyakhin I removed the reorganization of the .cpp files from this pr, now this pr only includes moving the .hpp files to the Inputs folder. I also included the Inputs folder in the %clangxx expansion so that the relative path for the header files doesnt need to be used.

  • #include "../../Inputs/common.hpp" can be moved from .cpp files to .hpp files, so it will be always #include common.hpp", which is simpler

I tried this but it did not work for all tests, since some tests had to set variables between including common.hpp, and the other .hpp file that they include. For example:

#include "../common.hpp"
#define SG_SZ 32
constexpr size_t TN = 16;
#include "../element_wise_all_ops_tf32_impl.hpp"

Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please cherry pick this to xmain as there are other "inner" tests that need to follow these changes.

@ayylol
Copy link
Contributor Author

ayylol commented Jan 23, 2025

@intel/llvm-gatekeepers This should me ready to merge.

The one failure is in an AddressSanitizer test which this pr doesnt touch. Possibly related to #16405, not entirely sure.

@dm-vodopyanov
Copy link
Contributor

@intel/llvm-gatekeepers This should me ready to merge.

The one failure is in an AddressSanitizer test which this pr doesnt touch. Possibly related to #16405, not entirely sure.

Failures are different.

From this PR:

Segmentation fault from GPU at 0xff00080000200000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 3 (PML4), access: 0 (Read), banned: 0, aborting. 

From #16405:

<SANITIZER>[ERROR]: Shadow memory reserved failed with size 0x100000000000: UR_RESULT_ERROR_UNSUPPORTED_FEATURE 

Can you please submit a separate issue?

@ayylol
Copy link
Contributor Author

ayylol commented Jan 23, 2025

Can you please submit a separate issue?

Ok, opened up #16751

@dm-vodopyanov dm-vodopyanov merged commit 251eb90 into intel:sycl Jan 23, 2025
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants