
[BUG] Crash in ProcessCommandDataIB due to incorrect endpoint validation and VerifyOrDie assertion #37184

Open
agatah2333 opened this issue Jan 24, 2025 · 7 comments · May be fixed by #37207
Labels: bug (Something isn't working), needs triage


@agatah2333

agatah2333 commented Jan 24, 2025

Reproduction steps

Description

When processing an InvokeCommandRequest that targets a non-existent endpoint (0x34), the code fails to validate the endpoint's existence and crashes with a VerifyOrDie assertion.

The payload that causes the issue is as follows:

1528002801360215370024003424013424020018350120010018181824Ff0B18 

Decoded Log

The decoded log of the issue is as follows:

Handling via exchange: 7080r, Delegate: 0x55fedcdfa0c8
[1737687875.732275][829982:829982] CHIP:DMG: InvokeRequestMessage =
[1737687875.732284][829982:829982] CHIP:DMG: {
[1737687875.732290][829982:829982] CHIP:DMG:    suppressResponse = false, 
[1737687875.732298][829982:829982] CHIP:DMG:    timedRequest = false, 
[1737687875.732304][829982:829982] CHIP:DMG:    InvokeRequests =
[1737687875.732317][829982:829982] CHIP:DMG:    [
[1737687875.732323][829982:829982] CHIP:DMG:            CommandDataIB =
[1737687875.732331][829982:829982] CHIP:DMG:            {
[1737687875.732339][829982:829982] CHIP:DMG:                    CommandPathIB =
[1737687875.732352][829982:829982] CHIP:DMG:                    {
[1737687875.732363][829982:829982] CHIP:DMG:                            EndpointId = 0x34,
[1737687875.732373][829982:829982] CHIP:DMG:                            ClusterId = 0x34,
[1737687875.732382][829982:829982] CHIP:DMG:                            CommandId = 0x0,
[1737687875.732395][829982:829982] CHIP:DMG:                    },
[1737687875.732407][829982:829982] CHIP:DMG: 
[1737687875.732415][829982:829982] CHIP:DMG:                    CommandFields = 
[1737687875.732426][829982:829982] CHIP:DMG:                    {
[1737687875.732437][829982:829982] CHIP:DMG:                            0x0 = 0 (unsigned), 
[1737687875.732447][829982:829982] CHIP:DMG:                    },
[1737687875.732456][829982:829982] CHIP:DMG:            },
[1737687875.732469][829982:829982] CHIP:DMG: 
[1737687875.732477][829982:829982] CHIP:DMG:    ],
[1737687875.732493][829982:829982] CHIP:DMG: 
[1737687875.732502][829982:829982] CHIP:DMG:    InteractionModelRevision = 11
[1737687875.732510][829982:829982] CHIP:DMG: },

Expected Behavior

After sending the message, the system should return an "Endpoint Not Found" error. For instance, the command ./chip-tool descriptor read server-list 0x654324 0x34 correctly returns UNSUPPORTED_ENDPOINT.

Actual Behavior
The all-cluster-app instead returns: status = 0x00 (SUCCESS)

When fuzzing the all-cluster-app, it crashes with a SIGABRT signal and displays the following error:

CHIP:SPT: VerifyOrDie failure at src/app/InteractionModelEngine.cpp:1702: ServerClusterCommandExists(aCommandPath) == Protocols::InteractionModel::Status::Success


Reproduction (after 1.4):

./chip-tool any command-by-id 0x34 0x0 '{"0":0}' 0x654324 0x34

Environment
CHIP SDK Version: After 1.4
Application: all-cluster-app

Root Cause (not sure)

For a non-existent value like endpoint=0x34:

emberAfEndpointIndexIsEnabled only checks that the index is less than 3, treating any index < 3 as valid
then emberAfContainsServerFromIndex directly returns true, without validating whether the endpoint and cluster actually exist

Bug prevalence

each time

GitHub hash of the SDK that was being used

57ff3e6

Platform

core

Platform Version(s)

1.4

Anything else?

No response

@bzbarsky-apple
Contributor

bzbarsky-apple commented Jan 25, 2025

emberAfEndpointIndexIsEnabled only checks if index is less than 3, considering it valid if index<3

That's a test-only implementation, not the one used by real code...

then emberAfContainsServerFromIndex directly returns true, without any validation of whether the endpoint and cluster exist

That is also a test-only implementation.

On tip I can't reproduce the crash (because that verification code was removed), but what I can reproduce is that the server is incorrectly responding with SUCCESS for cluster 0x34 on a non-existent endpoint. If I use cluster 0x06, it correctly responds with UNSUPPORTED_ENDPOINT.

Looks like the issue is only there for clusters implemented via CommandHandlerInterface, perhaps?

Ah, yes, CodegenDataModelProvider::AcceptedCommands never checks that the endpoint is valid when there's a CHI registered for a wildcard endpoint. It needs to check for endpoint validity no matter what. It also needs to check that the cluster is actually present on the endpoint, not just that the endpoint exists.

As in, this bit:

    const EmberAfCluster * serverCluster = FindServerCluster(path);
    VerifyOrReturnError(serverCluster != nullptr, CHIP_ERROR_NOT_FOUND);

needs to be at the very top of the method. Same thing for CodegenDataModelProvider::GeneratedCommands, probably.

@agatah2333
Author

agatah2333 commented Jan 26, 2025

Thanks for your reply! The root cause analysis I provided may be insufficient or inaccurate, which is why I marked it "not sure"; I only roughly reviewed the code. However, I downloaded the v1.4 branch (commit 57ff3e6), compiled the all-cluster-app, and it incorrectly responded with SUCCESS for cluster 0x34 on a non-existent endpoint. I searched the v1.4 branch for the CodegenDataModelProvider-related files and found that the code discussed above does not exist there, yet the problem still occurs.


There might be a logic issue in how command messages are handled.
When a command is sent to a non-existent endpoint with a non-existent cluster ID, the system should first verify that the endpoint ID exists. In the newer version (v1.4 branch), neither the endpoint ID nor the cluster ID is verified.

@andy31415
Contributor

Looking at master with the script:

./chip-tool pairing onnetwork 1 20202021
./chip-tool any command-by-id 0x34 0x0 '{"0": 0}' 1 0x34

I do indeed see kSuccess for what it thinks is a SoftwareDiagnostics::ResetWatermarks. I am unsure about the fuzzing bit.

 	"payload" : 
 	{
 		"decoded" : 
 		{
 			"invoke_response" : 
 			{
 				"interaction_model_revison" : "11",
 				"invoke_responses" : 
 				{
 					"Anonymous<0>" : 
 					{
 						"status" : 
 						{
 							"path" : 
 							{
 								"cluster_id" : "52 == 'SoftwareDiagnostics'",
 								"command_id" : "0 == 'ResetWatermarks'",
 								"endpoint_id" : "52"
 							},
 							"status" : 
 							{
 								"status" : "0 == kSuccess"
 							}
 						}
 					}
 				},
 				"suppress_response" : "false"
 			}
 		},

@andy31415
Contributor

What is happening is that SoftwareDiagnostics is on ALL endpoints: https://github.com/project-chip/connectedhomeip/blob/master/src/app/clusters/software-diagnostics-server/software-diagnostics-server.cpp#L61

So when CodegenDataModelProvider::Invoke searches for the corresponding cluster, it always finds one regardless of the endpoint and just forwards the request. It behaves as if the endpoint does not matter for this command: the handler is a singleton, so the same method is called either way.

Will fix it so that we check that both the endpoint and the cluster are valid.

@andy31415
Contributor

Attribute reads and writes are fine because they attempt to find the attribute's metadata first.

@andy31415
Contributor

I created a fix; however, @agatah2333, I do not see any crash. The underlying cause was that a wildcard endpoint was accepted for the CommandHandlerInterface, but processing was otherwise still ok.

I imagine a CHI that relies on valid endpoints might somehow crash... but I'm unsure. Or maybe the latest (non-1.4) code is more resilient.

@andy31415
Contributor

Crash may be in 1.4 because it still had DataModelProvider as optional. The code looks like:

    auto provider = GetDataModelProvider();
    if (provider->GetAcceptedCommandInfo(aCommandPath).has_value())
    {
#if CHIP_CONFIG_USE_EMBER_DATA_MODEL
        VerifyOrDie(ServerClusterCommandExists(aCommandPath) == Protocols::InteractionModel::Status::Success);
#endif
        return Protocols::InteractionModel::Status::Success;
    }

That looks like an attempt to align the ember logic with the DataModel::Provider logic; however, the two do NOT align (the CHI finds the command, while ember knows this is not a valid path). The 1.4 fix probably has to be different.
