Fix missing spdx license expression in license detection #4023

alexzurbonsen
Contributor

@alexzurbonsen alexzurbonsen commented Dec 16, 2024

Context

Occasionally, the SPDX license expression is missing from license detections even though the license expression itself is non-null and a matching SPDX expression would be available.

Should fix #4015

Summary

In the post-processing step, license clues are converted to license detections under certain conditions. When that happens, the detection's license expression may be set if it was previously None. In these cases, the SPDX license expression must be updated as well.

This fix appears to cover at least the instances of the bug that I know of.
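The change described in the summary can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `Detection`, `SPDX_KEYS`, `combine`, `to_spdx` and `promote_clue` are illustrative stand-ins, not ScanCode's actual classes or functions. The key-by-key translation only handles flat expressions without parentheses; a real implementation would need a proper expression parser and the full license key mapping.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for a ScanCode license key -> SPDX identifier mapping.
SPDX_KEYS = {
    "mpl-2.0": "MPL-2.0",
    "mit": "MIT",
    "apache-2.0": "Apache-2.0",
}


@dataclass
class Detection:
    """Illustrative stand-in for a license detection (not ScanCode's class)."""
    license_expression: Optional[str]
    license_expression_spdx: Optional[str]


def combine(match_expressions: List[str]) -> str:
    """Join the unique match expressions with AND, keeping first-seen order."""
    unique = list(dict.fromkeys(match_expressions))
    return " AND ".join(unique)


def to_spdx(expression: str) -> str:
    """Translate a flat expression key by key; operators pass through unchanged."""
    return " ".join(SPDX_KEYS.get(token, token) for token in expression.split())


def promote_clue(detection: Detection, match_expressions: List[str]) -> Detection:
    """Fill in a missing expression when a clue is converted to a detection."""
    if detection.license_expression is None:
        detection.license_expression = combine(match_expressions)
        # The bug: only the line above ran, leaving the SPDX field as None.
        # The fix: derive the SPDX expression from the new license expression.
        detection.license_expression_spdx = to_spdx(detection.license_expression)
    return detection
```

For example, `promote_clue(Detection(None, None), ["mpl-2.0"])` ends up with both fields set (`"mpl-2.0"` and `"MPL-2.0"`), while a detection whose expression was already set is left untouched.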

Test Plan

Run scancode on the example from #4015

scancode -l <path/to>/Amd.h  --json scancode_null_expression.json

The result:

"files": [
        {
            "path": "Amd.h",
            "type": "file",
            "detected_license_expression": "mpl-2.0",
            "detected_license_expression_spdx": "MPL-2.0",
            "license_detections": [
                {
                    "license_expression": "mpl-2.0",
                    "license_expression_spdx": "MPL-2.0",
                    "matches": [
                        {
                            "license_expression": "mpl-2.0",
                            "license_expression_spdx": "MPL-2.0",
                            "from_file": "Amd.h",
                            "start_line": 6,
                            "end_line": 8,
                            "matcher": "2-aho",
                            "score": 100.0,
                            "matched_length": 39,
                            "match_coverage": 100.0,
                            "rule_relevance": 100,
                            "rule_identifier": "mpl-2.0_3.RULE",
                            "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/mpl-2.0_3.RULE"
                        }
                    ],
                    "identifier": "mpl_2_0-d0113b18-ff50-a2fd-17f7-9227082156a7"
                },
                {
                    "license_expression": "mpl-2.0",
                    "license_expression_spdx": "MPL-2.0",
                    "matches": [
                        {
                            "license_expression": "mpl-2.0",
                            "license_expression_spdx": "MPL-2.0",
                            "from_file": "Amd.h",
                            "start_line": 17,
                            "end_line": 18,
                            "matcher": "3-seq",
                            "score": 50.0,
                            "matched_length": 8,
                            "match_coverage": 50.0,
                            "rule_relevance": 100,
                            "rule_identifier": "mpl-2.0_97.RULE",
                            "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/mpl-2.0_97.RULE"
                        }
                    ],
                    "identifier": "mpl_2_0-31c7f2f0-eb53-75b7-e843-a9ef4eb4ddf4"
                }
            ],
            "license_clues": [],
            "percentage_of_license_text": 2.27,
            "scan_errors": []
        }
    ]
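To check a result like this mechanically rather than by eye, one could walk the JSON output and flag every place where a license expression is set but its SPDX counterpart is empty. The helper below is a hypothetical sketch, not part of ScanCode; it only assumes the field names visible in the output above (`files`, `path`, `detected_license_expression`, `detected_license_expression_spdx`, `license_detections`, `license_expression`, `license_expression_spdx`, `identifier`).

```python
import json
from typing import Any, Dict, List


def find_missing_spdx(scan: Dict[str, Any]) -> List[str]:
    """List places where a license expression is set but its SPDX twin is not."""
    problems = []
    for entry in scan.get("files", []):
        path = entry.get("path")
        # File-level summary fields.
        if entry.get("detected_license_expression") and not entry.get(
            "detected_license_expression_spdx"
        ):
            problems.append(f"{path}:file")
        # Per-detection fields.
        for detection in entry.get("license_detections", []):
            if detection.get("license_expression") and not detection.get(
                "license_expression_spdx"
            ):
                problems.append(f"{path}:{detection.get('identifier')}")
    return problems


def check_file(json_path: str) -> List[str]:
    """Load a ScanCode JSON output file and run find_missing_spdx on it."""
    with open(json_path, encoding="utf-8") as handle:
        return find_missing_spdx(json.load(handle))
```

Since every expression in the output above has a matching SPDX expression, `check_file("scancode_null_expression.json")` would be expected to return an empty list for it.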

scancode_null_expression.json

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely-named feature branch that has no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Occasionally, the SPDX license expression is missing from license detections
even though the license expression itself is non-null and a matching
SPDX expression would be available. At least some instances of this
bug are caused by post-processing: when license clues are converted to
license detections, only the license expression is corrected, not the
SPDX license expression.

Signed-off-by: alexzurbonsen <[email protected]>
@alexzurbonsen alexzurbonsen marked this pull request as ready for review December 16, 2024 15:58
@alexzurbonsen
Contributor Author

@AyanSinhaMahapatra It seems like this could fix #4015. It works for the test file I was looking at.

Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

LGTM!
Thanks @alexzurbonsen for the fix.
This fixes the issue we were having in some cases of imperfect license detections.
I have also added a rule at dcabc07 to fix the detection issue. Thanks for reporting this and apologies for the very late reply.

@AyanSinhaMahapatra AyanSinhaMahapatra merged commit 13b47a7 into aboutcode-org:develop Jan 14, 2025
38 checks passed
@alexzurbonsen
Contributor Author

@AyanSinhaMahapatra No worries, thanks for merging :)

Development

Successfully merging this pull request may close these issues.

Unexpected missing spdx expression in license detection
2 participants