When the host resets the cores and loads new instructions while packets are still going through the system, core wrappers sometimes report a slot error: they see an incoming packet with a slot number they are already operating on. This might require longer timeouts, either for the cores to flush their packets or for the in-flight packets to pass through the system and release their slots.
The current remedy is to reset the board with the pci_reset script and reload it.
It did happen once when the board had just been programmed, so it is not limited to warm reloads. A few notes:
The number read by the perf binary is what the LB reports, or more specifically what slot_keeper reports.
In the current implementation, an error flag is raised when a duplicate slot is received, but the flag is not available for host readback. Exposing it was the initial plan for getting more insight.
Right now the slot_keeper remembers the number of previous duplicates and takes them into account when deciding whether we are out of slots, so it does not reintroduce the redundant slot.
The slot_keeper could simply ignore duplicate slots to be more reliable, but that would mask the problem. The current implementation is therefore for debugging purposes and can be simplified once the issue is fixed.
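To make the notes above concrete, here is a minimal behavioral sketch of how a slot keeper could flag and count duplicate returns without putting the slot back into the free pool. It is an illustration only, not the actual RTL: the class name `SlotKeeperModel`, its methods, and the 1-based slot numbering are all assumptions made for this example.

```python
class SlotKeeperModel:
    """Hypothetical software model of the described behavior, not the real slot_keeper."""

    def __init__(self, num_slots):
        # Assumption: slots are numbered 1..num_slots and all start out free.
        self.free = set(range(1, num_slots + 1))
        self.duplicates = 0      # number of duplicate returns seen so far
        self.error_flag = False  # sticky flag, raised on the first duplicate

    def allocate(self):
        """Hand out the lowest free slot, or None when out of slots."""
        if not self.free:
            return None
        slot = min(self.free)
        self.free.remove(slot)
        return slot

    def release(self, slot):
        """Return a slot to the pool; a slot that is already free is a duplicate."""
        if slot in self.free:
            # Duplicate: record it, but do not reintroduce the redundant slot.
            self.error_flag = True
            self.duplicates += 1
            return False
        self.free.add(slot)
        return True

    def free_count(self):
        """Free-slot count, unaffected by duplicate returns."""
        return len(self.free)


keeper = SlotKeeperModel(num_slots=4)
slot = keeper.allocate()
keeper.release(slot)  # normal return
keeper.release(slot)  # same slot returned again, counted as a duplicate
print(keeper.duplicates, keeper.error_flag, keeper.free_count())
```

In this model, silently ignoring duplicates would amount to dropping the `error_flag` and `duplicates` bookkeeping, which is the simplification mentioned above for after the issue is fixed.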
mkhazraee changed the title from "Extra slots when reloading warm" to "Extra slots received in LB, mostly when reloading warm" on Jan 28, 2023
mkhazraee changed the title from "Extra slots received in LB, mostly when reloading warm" to "Very rarely a slot is returned to the LB which was supposed to be in LB and not the RPUs." on Jan 28, 2023