Fix for CR-1219310 : Spatial sharing overhead test failure on linux #8647

Closed

aktondak
Collaborator

@aktondak aktondak commented Dec 9, 2024

Problem solved by the commit

This PR fixes CR-1219310

  • The problem was identified as a missing init_buffer call, which left the instruction buffer all zeros and caused the device to behave nondeterministically.
  • This PR also cleans up some utility code in the TestRunner class and moves it to ValidateUtilities, where it belongs.

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

CR-1219310
Discovered through testing on Linux by the Linux folks.

How problem was solved, alternative solutions (if any) and why they were rejected

The problem was solved by using the df-bw control code correctly and adjusting the buffer sizes to match.

Risks (if any) associated with the changes in the commit

None

What has been tested and how, request additional testing if necessary

Tested on Windows and Linux platforms. Updated metric numbers after testing:
Windows:

Z:\Repos\XRT-MCDM-FORK\XRT-MCDM\build\WRelease\xilinx\xrt>xrt-smi validate --run spatial-sharing-overhead
Validate Device           : [00c5:00:01.1]
    Platform              : NPU
    Power Mode            : Performance
-------------------------------------------------------------------------------
Test 1 [00c5:00:01.1]     : spatial-sharing-overhead
    Details               : **Overhead: 654.9 ms**
    Test Status           : [PASSED]
Z:\Repos\XRT-MCDM-FORK\XRT-MCDM\build\WRelease\xilinx\xrt>xrt-smi validate --run temporal-sharing-overhead
Validate Device           : [00c5:00:01.1]
    Platform              : NPU
    Power Mode            : Performance
-------------------------------------------------------------------------------
Test 1 [00c5:00:01.1]     : temporal-sharing-overhead
    Details               : **Overhead: '682.7' ms**
    Test Status           : [PASSED]

Documentation impact (if any)

N/A

@gbuildx
Collaborator

gbuildx commented Dec 9, 2024

Can one of the admins verify this patch?

@mamin506
Collaborator

The description is not accurate:
"The problem was identified as a missing init_buffer call, which left the instruction buffer all zeros and caused the device to behave nondeterministically."

Just to clarify:
On Linux, a missing init_buffer() call does not make the instruction buffer all zeros. The accurate statement is that the instruction buffer contains garbage.
An all-zero instruction buffer is the Windows behavior. It makes the test on Windows run as a no-op kernel, which is why the issue was not exposed on Windows.
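
The difference described above can be illustrated with a minimal, self-contained sketch. It does not use XRT and is not the code from this PR: `init_instr_buffer` and `alloc_with_garbage` are hypothetical names, and the `0xDEADBEEF` fill is only an assumed stand-in for whatever stale data a fresh allocation may hold on Linux. The point is that the instruction buffer should be filled explicitly rather than relying on the platform to hand back zeroed memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the test's init_buffer step (not XRT's actual code).
// Copies the instruction words into the buffer and zero-fills the remainder,
// so the result does not depend on what the allocator left behind.
inline void init_instr_buffer(std::uint32_t* dst, std::size_t dst_words,
                              const std::vector<std::uint32_t>& instr)
{
  assert(dst != nullptr);
  const std::size_t n = std::min(dst_words, instr.size());
  std::copy_n(instr.begin(), n, dst);       // instruction payload
  std::fill(dst + n, dst + dst_words, 0u);  // deterministic tail
}

// Assumption for illustration only: simulates an allocation that comes back
// holding stale, non-zero data.
inline std::vector<std::uint32_t> alloc_with_garbage(std::size_t words)
{
  return std::vector<std::uint32_t>(words, 0xDEADBEEFu);
}
```

With this pattern the buffer contents are identical whether the underlying allocation was zeroed or not, so a test that skips the initialization step cannot pass on one platform by accident.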

@aktondak aktondak requested a review from mamin506 December 10, 2024 18:48