Very rarely a slot is returned to the LB which was supposed to be in LB and not the RPUs. #2

Open
mkhazraee opened this issue Nov 15, 2022 · 1 comment
Labels
bug Something isn't working enhancement New feature or request

Comments

@mkhazraee
Collaborator

When the host resets the cores and loads instructions while packets are still going through the system, the core wrappers sometimes report a slot error: they see an incoming packet with a slot number that they are already operating on. This may call for longer timeouts, so that the cores can flush their packets or the in-flight packets can go through the system and release their slots.

The current remedy is to reset the board with the pci_reset script and reload it.
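
To make the failure condition concrete, here is a minimal behavioral sketch in Python of the check described above. It is not the RTL: `CoreWrapperModel`, `receive_packet`, `release_slot`, and the sticky `slot_error` flag are hypothetical names chosen for illustration, and the only behavior taken from this issue is that a slot error occurs when an incoming packet carries a slot number the wrapper is already operating on.

```python
class CoreWrapperModel:
    """Behavioral model only; names and structure are assumptions, not the RTL."""

    def __init__(self):
        self.active_slots = set()  # slots this core is currently operating on
        self.slot_error = False    # sticky flag, set on the first slot error

    def receive_packet(self, slot):
        """Accept a packet tagged with `slot`; return False on a slot error."""
        if slot in self.active_slots:
            # Incoming packet reuses a slot that has not been released yet.
            self.slot_error = True
            return False
        self.active_slots.add(slot)
        return True

    def release_slot(self, slot):
        """The packet left the core, so its slot is no longer in use here."""
        self.active_slots.discard(slot)
```

In this model, a reset that clears the load balancer's bookkeeping while `active_slots` is still populated would be one way to reach the error path, which is consistent with the timeout hypothesis above but does not confirm it.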

@mkhazraee mkhazraee added bug Something isn't working enhancement New feature or request labels Nov 15, 2022
@mkhazraee
Collaborator Author

It did happen once when the board had just been programmed, so it is not limited to warm reloads. A few notes:

  • The number read by the perf binary is what the LB reports, or more specifically what slot_keeper reports.
  • In the current implementation, an error flag is raised when a duplicate slot is received, but the flag is not available for host readback. Making it readable was the initial plan for getting more insight.
  • Right now the slot_keeper remembers the number of previous duplicates and takes them into account when deciding whether it is out of slots, so it does not reintroduce the redundant slot.
  • The slot_keeper could simply ignore duplicate slots to be more reliable, but that would hide the problem. The current implementation is for debugging purposes and can be simplified after the issue is fixed.
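
The duplicate handling in the notes above can be sketched as a small Python model. This is one possible reading of the description, not the actual slot_keeper logic: the class name, the `raw_count` counter that is bumped on every returned slot, and the method names are all assumptions made for illustration. What comes from the issue is only that a duplicate raises an error flag, is counted, is not put back into the pool, and that the duplicate count is considered when deciding whether the LB is out of slots.

```python
class SlotKeeperModel:
    """Behavioral model only; the counter scheme here is an assumption."""

    def __init__(self, slot_count):
        self.free = set(range(slot_count))  # slots currently held by the LB
        self.raw_count = slot_count         # hypothetical: counts every return
        self.duplicates = 0                 # number of duplicate returns seen
        self.dup_error = False              # raised on the first duplicate

    def out_of_slots(self):
        # Subtract remembered duplicates so they never look like free slots.
        return self.raw_count - self.duplicates == 0

    def allocate(self):
        """Hand one free slot to an RPU, or return None when out of slots."""
        if self.out_of_slots():
            return None
        slot = min(self.free)
        self.free.remove(slot)
        self.raw_count -= 1
        return slot

    def release(self, slot):
        """An RPU returns `slot`; a slot the LB already holds is a duplicate."""
        self.raw_count += 1
        if slot in self.free:
            self.dup_error = True
            self.duplicates += 1
            return  # do not reintroduce the redundant slot
        self.free.add(slot)
```

With two slots, allocating one and then releasing the slot that is still in the LB sets `dup_error`, leaves exactly one slot free, and a second `allocate()` drains the pool instead of handing out the same slot twice.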

@mkhazraee mkhazraee changed the title Extra slots when reloading warm Extra slots received in LB, mostly when reloading warm Jan 28, 2023
@mkhazraee mkhazraee changed the title Extra slots received in LB, mostly when reloading warm Very rarely a slot is returned to the LB which was supposed to be in LB and not the RPUs. Jan 28, 2023