-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Orchagent exiting due to unsupported SAI_*_ATTR_SELECTIVE_COUNTER_LIST attr #20725
Comments
Hi:
that attribute was recently added in 2024/10/7
this attribute is LIST of counters, and it returned some OID value 0x559ac8beda80 on which OT returned NULL, and it should be SAI_OBJECT_TYPE_COUNTER as specified in SAI headers saiport.h, if it's returning NULL, that's a vendor bug and currently syncd on such error crashes, since we got OID but we don't know what type it is, this is invalid so what should happened, when we have newer headers in SAI sairedis/syncd and we are using older vendor SAI, all unsupported attributes should return not implemented or not supported, instead of success, but it seams that vendor is returning some invalid value and success on this attribute error comes from here: https://github.com/sonic-net/sonic-sairedis/blob/master/syncd/SaiDiscovery.cpp#L237 error could be also caused by DASH extensions (if vendor support DASH) since extensions range changed, and that change is not backward compatible: opencomputeproject/SAI#2028 so what i expect is happening, vendor have some custom/private attribute after SAI_PORT_ATTR_END, on older version of SAI headers which have the same enum value as SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST, which causes syncd think that SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST is implemented when actually this is private internal attribute @bofish-arista can you confirm ? |
@JaiOCP - could you please confirm the analysis above? |
@rajkumar38 - fyi |
Hi @kcudnik - your assessment is correct. There are some extensions added by the vendor which are causing the conflict. The vendor implementation currently does not allow SAI to grow without introducting conflicts. We have a ticket opened with them to get this addressed. |
Nice, I have proposal to update syncd and SAI metadata to add version on each attribute so discovery process will not exceed SAI_PORT_ATTR_END and any attr end if vendor is lover version than supported syncd headers, this will solve this issue for the future, I will work on this to address this issue @lguohan for visibility |
We introduced custom attribute range for each attribute for vendors to prevent this behavior https://github.com/opencomputeproject/SAI/blob/master/inc/saiport.h#L2635 SAI_PORT_ATTR_CUSTOM_RANGE_START |
I created partial one step to partial solution opencomputeproject/SAI#2100, with this we will have attributes and SAI version per attribute, you can read explanation here: opencomputeproject/SAI#2099 i will work on this to support that on syncd with a cmd enable/flag |
By default disabled. If enabled, it will use SAI metadata and libsai version returned by sai_api_version_query to verify if given attribute during SAI discovery process was introduced in the current libsai version or below that version. This feature is partial solution to this problem: sonic-net/sonic-buildimage#20725, more here: opencomputeproject/SAI#2099
By default disabled. If enabled, it will use SAI metadata and libsai version returned by sai_api_version_query to verify if given attribute during SAI discovery process was introduced in the current libsai version or below that version. This feature is partial solution to this problem: sonic-net/sonic-buildimage#20725, more here: opencomputeproject/SAI#2099
Anyone tracking this issue, a simple workaround until a proper fix is created is here: bradh352/sonic-sairedis@9f2a0b9 |
on sairedis master this is already fixed partially: sonic-net/sonic-sairedis#1470, this is checked in, but needs to be enabled by "-a" parameter in syncd init script |
also entire problem is not that simple, since custom attributes on brcm will be moved from v1.13 and v1.14 to different range |
I saw that but I didn't see the commit to SAI that actually tagged each attribute with a version. That said, now that I'm looking harder at it, I see opencomputeproject/SAI#2100 which is actually auto-generating a version header based on git tags. Out of curiosity why make it opt-in? Are there expected incompatibilities? |
for backward compatibility |
This reverts commit a21cabf.
I can confirm that adding the "-a" does indeed work on my TD3 switches. |
As per sonic-net/sonic-buildimage#20725 attribute validation must be enabled otherwise failures will be observed. Not sure why this isn't default.
nice ! |
can we enable tahat ? |
I think it would make sense for it to be enabled by default, that said I don't fully understand the backwards compatibility implications, perhaps there may be specific issues with various closed vendor blobs. In my own branch (which I only use on Broadcom Trident 3) I always enable However, I don't see why it couldn't be moved into each ASIC-specific section that is validated to be ok: |
can you make PR for sonic-sairedis to enable this feature by default ? i think there are no implications for not enabling that in production, from other hand, if there are new attributes that are not in official release then this needs to be disabled, but currently this feature is only implemented in discovery, user can still make invalid create/set/get of attrbitue that is vendor extention after ATTR_END, but in normal operation this will not happen, since release versions are tested before production but maybe it would make sense to also implement check on those operations too, in case some old libsai would be used in some weird case |
I've tried enabling -a flag in syncd init script and it is working for me. |
As per sonic-net/sonic-buildimage#20725 attribute validation must be enabled otherwise failures will be observed. Not sure why this isn't default.
|
As per sonic-net/sonic-buildimage#20725 attribute validation must be enabled otherwise failures will be observed. Not sure why this isn't default.
As per sonic-net/sonic-buildimage#20725 attribute validation must be enabled otherwise failures will be observed. Not sure why this isn't default.
As per sonic-net/sonic-buildimage#20725 attribute validation must be enabled otherwise failures will be observed. Not sure why this isn't default.
By default disabled. If enabled, it will use SAI metadata and libsai version returned by sai_api_version_query to verify if given attribute during SAI discovery process was introduced in the current libsai version or below that version. This feature is partial solution to this problem: sonic-net/sonic-buildimage#20725, more here: opencomputeproject/SAI#2099
By default disabled. If enabled, it will use SAI metadata and libsai version returned by sai_api_version_query to verify if given attribute during SAI discovery process was introduced in the current libsai version or below that version. This feature is partial solution to this problem: sonic-net/sonic-buildimage#20725, more here: opencomputeproject/SAI#2099
As per sonic-net/sonic-buildimage#20725 attribute validation must be enabled otherwise failures will be observed. Not sure why this isn't default.
Why I did it Upgrade SAI xgs version to 11.2.16.2 to fix sonic-net#20725 Work item tracking Microsoft ADO (number only): 30445459 How I did it Update xgs SAI version to 11.2.16.2 in the sai.mk file. How to verify it load the built image to the DUT, swss starts fine. Signed-off-by: zitingguo-ms <[email protected]>
Description
Sonic-buildimage PR#20540 has incorporated SAI changes which includes latest sairedis, this in turn features changes to SAI, including SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST, which does not appear to be supported yet by Broadcom.
As a result of this change, orchagent is exiting early in startup process.
Relevant links:
#20540
sonic-net/sonic-sairedis#1431
opencomputeproject/SAI#1941
Steps to reproduce the issue:
Issue is seen during inialization of a standalone device.
Describe the results you received:
Orchagent exited with runtime error logged as shown below:
2024 Nov 4 17:44:07.612509 up500 ERR syncd#syncd: :- run: Runtime error: :- discover: when query SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST (on SAI_OBJECT_TYPE_PORT RID oid:0x100000001) got value oid:0x7ffc4a7b45f0 objectTypeQuery returned NULL object type
Describe the results you expected:
Output of
show version
:root@up322:~# show version
SONiC Software Version: SONiC.branch.master-ars.7cd2518e-buildimage.origin.master-nightly-slim-2024.10.31.20.12
SONiC OS Version: 12
Distribution: Debian 12.7
Kernel: 6.1.0-22-2-amd64
Build commit: 7f44814d7
Build date: Fri Nov 1 00:59:18 UTC 2024
Built by: jenkins@jenkins-arsonic-k8s-1-vfqtx
Platform: x86_64-arista_7060_cx32s
HwSKU: Arista-7060CX-32S-C32
ASIC: broadcom
ASIC Count: 1
Serial Number: SGD20254417
Model Number: DCS-7060CX-32S
Hardware Revision: 03.00
Uptime: 06:20:23 up 9 min, 2 users, load average: 0.26, 0.98, 0.75
Date: Sat 02 Nov 2024 06:20:23
Output of
show techsupport
:Additional information you deem important (e.g. issue happens only occasionally):
The text was updated successfully, but these errors were encountered: