EDGEML-8777: Support firmware log buffer #328

Open
wants to merge 1 commit into
base: main

@VinitAmd VinitAmd commented Dec 16, 2024

[Why]
Log buffer support is required to improve debugging on the NPU stack.

[How]

  1. Allocate a log buffer and share it with firmware via the 'start event trace' message: configure the start_event_trace request parameters, send the request over the mgmt channel, and handle the response.
  2. Receive notifications about log buffer fullness via interrupt from firmware: handle the channel interrupt, read the FW log buffer, and update the log metadata head/tail.
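The head/tail update in step 2 can be sketched as a circular-buffer drain. This is a minimal userspace sketch, not the driver's code: `struct log_meta`, `drain_log`, and the byte-wise layout are assumptions made for illustration, and the real firmware metadata format may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata layout; the real firmware format may differ. */
struct log_meta {
	uint32_t head;	/* consumer offset, advanced by the driver */
	uint32_t tail;	/* producer offset, advanced by the firmware */
};

/*
 * Copy unread bytes from a circular log buffer into out and advance head.
 * Stops when head catches up with tail or out is full.
 * Returns the number of bytes copied.
 */
static size_t drain_log(const uint8_t *ring, uint32_t ring_size,
			struct log_meta *meta, uint8_t *out, size_t out_size)
{
	size_t copied = 0;

	assert(ring_size > 0);
	assert(meta->head < ring_size && meta->tail < ring_size);

	while (meta->head != meta->tail && copied < out_size) {
		out[copied++] = ring[meta->head];
		/* Wrap around at the end of the buffer. */
		meta->head = (meta->head + 1) % ring_size;
	}
	return copied;
}
```

Because only the consumer writes head and only the producer writes tail, the driver can publish the new head back to firmware after each drain without a lock on the data itself.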

Signed-off-by: Vinit [email protected]

@AMDGithubSCIMAdmin

Can one of the admins verify this patch?

Contributor

@mamin506 mamin506 left a comment

"__packed" is missing for the request and response structs.

src/driver/amdxdna/aie2_msg_priv.h (outdated, resolved)
src/driver/amdxdna/aie2_msg_priv.h (outdated, resolved)
src/driver/amdxdna/aie2_msg_priv.h (outdated, resolved)
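To illustrate why "__packed" matters for message structs shared with firmware, here is a small userspace sketch. The struct names and fields are hypothetical and are not the driver's actual message format; the `__packed` definition is a stand-in for the kernel macro and assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __packed macro (GCC/Clang only). */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Hypothetical request layout, not the driver's actual message format. */
struct demo_req_padded {
	uint8_t type;
	uint64_t addr;	/* the compiler may insert padding before this field */
	uint32_t size;
};

struct demo_req_packed {
	uint8_t type;
	uint64_t addr;
	uint32_t size;
} __packed;

/* With packing, the struct is exactly the sum of its fields: 1 + 8 + 4. */
static_assert(sizeof(struct demo_req_packed) == 13, "unexpected padding");
static_assert(offsetof(struct demo_req_packed, addr) == 1, "addr misplaced");
```

Without the attribute, the compiler is free to pad the struct for alignment, so the driver and the firmware could disagree about field offsets.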
@mamin506 mamin506 requested a review from vengutta18 December 16, 2024 19:51
@maxzhen maxzhen added the draft label Dec 19, 2024
@VinitAmd VinitAmd force-pushed the fw_log_buff_x branch 3 times, most recently from 5ac87bf to add9c42 Compare December 23, 2024 09:35
@VinitAmd VinitAmd force-pushed the fw_log_buff_x branch 3 times, most recently from a64f459 to bf466db Compare January 7, 2025 08:34
@VinitAmd VinitAmd marked this pull request as ready for review January 7, 2025 08:38
@vengutta18 vengutta18 removed the draft label Jan 7, 2025
@mamin506
Contributor

mamin506 commented Jan 9, 2025

ok to test

@VinitAmd
Author

VinitAmd commented Jan 9, 2025

=== 3 tests failed on Linux_npu_tests run.
ipu_suspend : aie2_state_write:NPU suspend \xf8,\x94\x84\xff\xff\xff\xff\xb8%n\x84\xff\xff\xff\xff\x08\x1c̅\xff\xff\xff\xff failed
ipu_resume : aie2_state_write: NPU resume \xff\xf8,\x94\x84\xff\xff\xff\xff(\xbcw\x8eW\xa8\xff\xff\x08\x1c̅\xff\xff\xff\xff failed
ipu_check_header_hash
A ticket has already been raised with the FW team: FWDEV-103771.

@mamin506
Contributor

@VinitAmd, please fix the pipeline failure. The driver is not able to load. Please try your driver on an NPU1 machine. Please contact @vengutta18 for the machine if you don't have one.

Contributor

@mamin506 mamin506 left a comment

I don't expect any changes to the amdxdna_mailbox* files for this feature.
Please clean up the amdxdna_mailbox* files.

After this is fixed, I will review the other code.
Please follow https://github.com/amd/xdna-driver?tab=readme-ov-file#checkpatch to set up your workspace.
The coding style needs to pass the checkpatch.pl test.

src/driver/amdxdna/amdxdna_mailbox_helper.h (outdated, resolved)
src/driver/amdxdna/amdxdna_mailbox.h (outdated, resolved)
src/driver/amdxdna/amdxdna_mailbox.c (outdated, resolved)
src/driver/amdxdna/amdxdna_mailbox.c (outdated, resolved)
@VinitAmd
Author

> I don't expect any changes to the amdxdna_mailbox* files for this feature. Please clean up the amdxdna_mailbox* files.
>
> After this is fixed, I will review the other code. Please follow https://github.com/amd/xdna-driver?tab=readme-ov-file#checkpatch to set up your workspace. The coding style needs to pass the checkpatch.pl test.

Thanks, I was looking for this info

@VinitAmd VinitAmd force-pushed the fw_log_buff_x branch 5 times, most recently from 7d9850d to 2d0d0b4 Compare January 13, 2025 11:02
@VinitAmd
Author

> > I don't expect any changes to the amdxdna_mailbox* files for this feature. Please clean up the amdxdna_mailbox* files.
> > After this is fixed, I will review the other code. Please follow https://github.com/amd/xdna-driver?tab=readme-ov-file#checkpatch to set up your workspace. The coding style needs to pass the checkpatch.pl test.
>
> Thanks, I was looking for this info

Code formatting done with checkpatch.pl

@VinitAmd VinitAmd force-pushed the fw_log_buff_x branch 2 times, most recently from e77d462 to 8d6b7a1 Compare January 15, 2025 17:45
@mamin506
Contributor

retest this please

@VinitAmd VinitAmd changed the title from "EDGEML-8777 - [Linux NPU Driver]: support firmware log buffer" to "EDGEML-8777: Support firmware log buffer" Jan 20, 2025
@VinitAmd
Author

trigger jenkins

@mamin506
Contributor

ok to test

@mamin506
Contributor

retest this please

@amd-akshatah

ok to test

@amd-akshatah

retest this please

@sreedharamd

ok to test

@amd-akshatah

retest this please

@@ -263,6 +263,45 @@ static int aie2_dpm_level_get(struct seq_file *m, void *unused)

AIE2_DBGFS_FOPS(dpm_level, aie2_dpm_level_get, aie2_dpm_level_set);

//write debufs for event trace
Contributor

Kernel style comments please.

@@ -250,11 +250,13 @@ struct amdxdna_dev_hdl {
u32 npuclk_freq;
u32 hclk_freq;
bool force_preempt_enabled;
uint32_t event_trace_enabled;
Contributor

can it be bool?

Author

There may be an option to enable event trace but disable log printing for debugging purposes, so I kept it as uint32.
If we want the encoded log to be added to dmesg all the time as well, it can be bool.

#include "amdxdna_trace.h"
#include "amdxdna_mailbox.h"

uint8_t g_fwLogBuf[TRACE_EVENT_BUFFER_SIZE];
Contributor

@vengutta18 vengutta18 Jan 28, 2025

Please allocate this using kzalloc and clean it up when it's no longer needed. Also, rename the variable to fw_log_buf.
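A sketch of the allocate-and-free pattern being requested, written as a userspace analog so it can be compiled on its own: `calloc()` stands in for `kzalloc(size, GFP_KERNEL)` and `free()` for `kfree()`. The `struct fw_log` wrapper and the function names are assumptions for illustration, not code from this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical owner of the log staging buffer. */
struct fw_log {
	uint8_t *buf;
	size_t size;
};

/* Allocate a zeroed buffer; calloc() stands in for kzalloc(). */
static int fw_log_alloc(struct fw_log *log, size_t size)
{
	assert(log && !log->buf);

	log->buf = calloc(1, size);
	if (!log->buf)
		return -1;	/* kernel code would return -ENOMEM */

	log->size = size;
	return 0;
}

/* Release the buffer; safe to call twice, like kfree(NULL). */
static void fw_log_free(struct fw_log *log)
{
	free(log->buf);
	log->buf = NULL;
	log->size = 0;
}
```

Tying the buffer to a per-device struct instead of a global array also keeps its lifetime aligned with the device's init and teardown paths.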


static void clear_event_trace_msix(struct amdxdna_dev_hdl *ndev)
{
// Clear the log buffer interrupt
Contributor

comments ditto, kernel style.

int aie2_start_event_trace(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size);
void aie2_set_trace_timestamp(struct amdxdna_dev_hdl *ndev, uint32_t *resp);
void aie2_assign_event_trace_state(struct amdxdna_dev_hdl *ndev, uint32_t state);
int aie2_is_event_trace_supported_on_dev(struct amdxdna_dev_hdl *ndev);
Contributor

Do we need all the functions here? Keep only the functions that are used by other files; make the rest static inside aie2_event_trace.c.


if (ret) {
XDNA_ERR(ndev->xdna, "Send start event trace failed, ret %d", ret);
// Currently this feature is supported on limited HW's,
Contributor

multi-line comment style

Author

Is there a guide available for kernel comment style? checkpatch.pl doesn't fix the problem; it only reports it.
If checkpatch.pl raises no warning or error, it is hard to fix the problem without a guide.

return;
}

ndev->event_trace_enabled = state;
Contributor

Could you assign the state only once the trace is enabled successfully? It could fail at line no. 55!
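The ordering being asked for can be sketched as follows: send the start request first and record the state only if it succeeded. This is a userspace sketch; `struct demo_dev`, `demo_start_event_trace`, and the `fail_start` knob are hypothetical stand-ins, not the driver's API.

```c
#include <assert.h>
#include <stdint.h>

struct demo_dev {
	uint32_t event_trace_enabled;
	int fail_start;	/* test knob: make the start request fail */
};

/* Hypothetical stand-in for sending the start message to firmware. */
static int demo_start_event_trace(struct demo_dev *dev)
{
	return dev->fail_start ? -1 : 0;
}

/* Record the new state only after the start request has succeeded. */
static int demo_enable_trace(struct demo_dev *dev, uint32_t state)
{
	int ret;

	assert(dev);

	ret = demo_start_event_trace(dev);
	if (ret)
		return ret;	/* state is left untouched on failure */

	dev->event_trace_enabled = state;
	return 0;
}
```

With this ordering, a failed start never leaves the device reporting that tracing is enabled.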

req.event_trace_categories = 0xFFFFFFFF;
req.event_trace_timestamp = EVENT_TRACE_TIMESTAMP_FW_CHRONO;

XDNA_INFO(ndev->xdna, "send start event trace msg");
Contributor

make it XDNA_DBG

{
DECLARE_AIE2_MSG(stop_event_trace, MSG_OP_STOP_EVENT_TRACE);

XDNA_INFO(ndev->xdna, "send stop event trace msg");
Contributor

ditto


aie2_hw_stop(xdna);
aie2_error_async_events_free(ndev);
if (is_event_trace_supported)
aie2_event_trace_free(ndev);
Contributor

A couple of things here: are trace_alloc and trace_free in sync?
What happens during driver unload or suspend/resume?

Author

  1. Are trace_alloc and trace_free in sync? ---> Yes, both are in sync.
  2. What happens during driver unload? ---> When the driver unloads, aie2_fini is called, which triggers stop event trace and frees the event trace resources.
  3. Suspend/resume ---> Currently there is a FW issue: when PG is applied, the FW loses the event trace buffer information, causing a crash on resume (FWDEV-103771).

[Why]
  Log buffer support is required to enhance debugging support on NPU stack

[How]
  1. Allocate log buffer and share with firmware via 'start_event_trace' message
     Config start_event_trace request param, send via mgmt channel.
     Handle the send buffer resp.
  2. Receive an interrupt from firmware when the log buffer is half full.
     Handle the interrupt, process the buffer data, and update the buffer head_offset for FW.
  3. Add the stop_event_trace_send API and its handler to stop logging when aie2 shuts down.
  4. Add event_trace debugfs for dynamic control of logging.

Signed-off-by: vinit shukla <[email protected]>
7 participants