
[BUG] Crash in ProcessCommandDataIB due to incorrect endpoint validation and VerifyOrDie assertion #37184

Open
agatah2333 opened this issue Jan 24, 2025 · 7 comments · May be fixed by #37207
Labels: bug (Something isn't working), needs triage


@agatah2333

agatah2333 commented Jan 24, 2025

Reproduction steps

Description

When processing an InvokeCommandRequest that targets a non-existent endpoint (0x34), the code fails to validate the endpoint's existence and crashes with a VerifyOrDie assertion.

The payload that causes the issue is as follows:

1528002801360215370024003424013424020018350120010018181824Ff0B18 

Decoded Log

The decoded log of the issue is as follows:

Handling via exchange: 7080r, Delegate: 0x55fedcdfa0c8
[1737687875.732275][829982:829982] CHIP:DMG: InvokeRequestMessage =
[1737687875.732284][829982:829982] CHIP:DMG: {
[1737687875.732290][829982:829982] CHIP:DMG:    suppressResponse = false, 
[1737687875.732298][829982:829982] CHIP:DMG:    timedRequest = false, 
[1737687875.732304][829982:829982] CHIP:DMG:    InvokeRequests =
[1737687875.732317][829982:829982] CHIP:DMG:    [
[1737687875.732323][829982:829982] CHIP:DMG:            CommandDataIB =
[1737687875.732331][829982:829982] CHIP:DMG:            {
[1737687875.732339][829982:829982] CHIP:DMG:                    CommandPathIB =
[1737687875.732352][829982:829982] CHIP:DMG:                    {
[1737687875.732363][829982:829982] CHIP:DMG:                            EndpointId = 0x34,
[1737687875.732373][829982:829982] CHIP:DMG:                            ClusterId = 0x34,
[1737687875.732382][829982:829982] CHIP:DMG:                            CommandId = 0x0,
[1737687875.732395][829982:829982] CHIP:DMG:                    },
[1737687875.732407][829982:829982] CHIP:DMG: 
[1737687875.732415][829982:829982] CHIP:DMG:                    CommandFields = 
[1737687875.732426][829982:829982] CHIP:DMG:                    {
[1737687875.732437][829982:829982] CHIP:DMG:                            0x0 = 0 (unsigned), 
[1737687875.732447][829982:829982] CHIP:DMG:                    },
[1737687875.732456][829982:829982] CHIP:DMG:            },
[1737687875.732469][829982:829982] CHIP:DMG: 
[1737687875.732477][829982:829982] CHIP:DMG:    ],
[1737687875.732493][829982:829982] CHIP:DMG: 
[1737687875.732502][829982:829982] CHIP:DMG:    InteractionModelRevision = 11
[1737687875.732510][829982:829982] CHIP:DMG: },

Expected Behavior

After sending the message, the system should return an "Endpoint Not Found" error. For instance, the command ./chip-tool descriptor read server-list 0x654324 0x34 correctly returns UNSUPPORTED_ENDPOINT.

Actual Behavior
The all-cluster-app instead returns: status = 0x00 (SUCCESS)

When fuzzing the all-cluster-app, it crashes with a SIGABRT signal and displays the following error:

CHIP:SPT: VerifyOrDie failure at src/app/InteractionModelEngine.cpp:1702: ServerClusterCommandExists(aCommandPath) == Protocols::InteractionModel::Status::Success


Reproduction (after 1.4):

./chip-tool any command-by-id 0x34 0x0 '{"0":0}' 0x654324 0x34

Environment
CHIP SDK Version: After 1.4
Application: all-cluster-app

Root Cause (not sure)

For a non-existent value like endpoint=0x34:

emberAfEndpointIndexIsEnabled only checks that the index is less than 3, treating any index < 3 as valid
then emberAfContainsServerFromIndex directly returns true, without validating whether the endpoint and cluster actually exist

Bug prevalence

each time

GitHub hash of the SDK that was being used

57ff3e6

Platform

core

Platform Version(s)

1.4

Anything else?

No response

@bzbarsky-apple
Contributor

bzbarsky-apple commented Jan 25, 2025

emberAfEndpointIndexIsEnabled only checks if index is less than 3, considering it valid if index<3

That's a test-only implementation, not the one used by real code...

then emberAfContainsServerFromIndex directly returns true, without any validation of whether the endpoint and cluster exist

That is also a test-only implementation.

On tip I can't reproduce the crash (because that verification code was removed), but what I can reproduce is that the server is incorrectly responding with SUCCESS for cluster 0x34 on a non-existent endpoint. If I use cluster 0x06, it correctly responds with UNSUPPORTED_ENDPOINT.

Looks like the issue is only there for clusters implemented via CommandHandlerInterface, perhaps?

Ah, yes, CodegenDataModelProvider::AcceptedCommands never checks that the endpoint is valid when there's a CHI registered for a wildcard endpoint. It needs to check for endpoint validity no matter what. It also needs to check that the cluster is actually present on the endpoint, not just that the endpoint exists.

As in, this bit:

    const EmberAfCluster * serverCluster = FindServerCluster(path);
    VerifyOrReturnError(serverCluster != nullptr, CHIP_ERROR_NOT_FOUND);

needs to be at the very top of the method. Same thing for CodegenDataModelProvider::GeneratedCommands, probably.

@agatah2333
Author

agatah2333 commented Jan 26, 2025

Thanks for your reply! The root cause analysis I provided may be insufficient or inaccurate, which is why I marked it "not sure"; I only roughly reviewed the code. However, I downloaded the v1.4 branch (commit 57ff3e6), compiled the all-cluster-app, and it incorrectly responded with SUCCESS for cluster 0x34 on a non-existent endpoint. I searched the v1.4 branch for the CodegenDataModelProvider-related files and found that the code discussed above does not exist there, yet the problem still occurs.


There might be a logic issue in how command messages are handled.
When a command is sent to a non-existent endpoint with a non-existent cluster ID, the system should first verify that the endpoint ID exists. In the newer version (v1.4 branch), neither the endpoint ID nor the cluster ID is verified.

@andy31415
Contributor

Looking at master with the script:

./chip-tool pairing onnetwork 1 20202021
./chip-tool any command-by-id 0x34 0x0 '{"0": 0}' 1 0x34

I do indeed see kSuccess for what it thinks is a SoftwareDiagnostics::ResetWatermarks. I am unsure about the fuzzing bit.

 	"payload" : 
 	{
 		"decoded" : 
 		{
 			"invoke_response" : 
 			{
 				"interaction_model_revison" : "11",
 				"invoke_responses" : 
 				{
 					"Anonymous<0>" : 
 					{
 						"status" : 
 						{
 							"path" : 
 							{
 								"cluster_id" : "52 == 'SoftwareDiagnostics'",
 								"command_id" : "0 == 'ResetWatermarks'",
 								"endpoint_id" : "52"
 							},
 							"status" : 
 							{
 								"status" : "0 == kSuccess"
 							}
 						}
 					}
 				},
 				"suppress_response" : "false"
 			}
 		},

@andy31415
Contributor

What is happening is that SoftwareDiagnostics is on ALL endpoints: https://github.com/project-chip/connectedhomeip/blob/master/src/app/clusters/software-diagnostics-server/software-diagnostics-server.cpp#L61

So when CodegenDataModelProvider::Invoke searches for the corresponding cluster, it always finds one regardless of the endpoint and just forwards the request. It behaves as if the endpoint does not matter for this command: the handler is a singleton, so the same method is called either way.

Will fix it so that we check that both the endpoint and the cluster are valid.

@andy31415
Contributor

Attribute reads and writes are fine because they attempt to find the attribute's metadata first.

@andy31415
Contributor

I created a fix; however, @agatah2333, I do not see any crash. The underlying cause was that a wildcard endpoint was accepted for the CommandHandlerInterface, but processing was otherwise still ok.

I imagine a CHI that relies on valid endpoints might somehow crash... but I'm unsure. Or maybe the latest (non-1.4) code is more resilient.

@andy31415
Contributor

Crash may be in 1.4 because it still had DataModelProvider as optional. The code looks like:

    auto provider = GetDataModelProvider();
    if (provider->GetAcceptedCommandInfo(aCommandPath).has_value())
    {
#if CHIP_CONFIG_USE_EMBER_DATA_MODEL
        VerifyOrDie(ServerClusterCommandExists(aCommandPath) == Protocols::InteractionModel::Status::Success);
#endif
        return Protocols::InteractionModel::Status::Success;
    }

That looks like an attempt to align the ember logic with the DataModel::Provider logic; however, the two do NOT align (the CHI finds the command, while ember knows this is not a valid path). The 1.4 fix probably has to be different.
