Python: Model additional flow steps for the lxml framework #18185

joefarebrother · 2024-12-02T21:15:03Z

Models additional flow steps for Element and ElementTree objects from the lxml.etree module, to track taint through parsed XML elements.

Copilot reviewed 2 out of 5 changed files in this pull request and generated no suggestions.

Files not reviewed (3)

python/ql/lib/semmle/python/frameworks/Lxml.qll: Language not supported
python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.expected: Language not supported
python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.ql: Language not supported

Comments skipped due to low confidence (16)

python/ql/test/library-tests/frameworks/lxml/taint_test.py:50

The taint tracking is missing through the call to list. Ensure the test case covers this scenario to verify taint propagation.

list(elem.findall("b"))[0].text,  # $ MISSING:tainted # Type tracking is not followed through call to `list`

python/ql/test/library-tests/frameworks/lxml/taint_test.py:54

The taint tracking is missing through the call to next on elem.getiterator(). Ensure the test case covers this scenario to verify taint propagation.

next(elem.getiterator()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:62

The taint tracking is missing through the call to next on elem.iter(). Ensure the test case covers this scenario to verify taint propagation.

next(elem.iter()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:64

The taint tracking is missing through the call to next on elem[0].iterancestors(). Ensure the test case covers this scenario to verify taint propagation.

next(elem[0].iterancestors()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:66

The taint tracking is missing through the call to next on elem.iterchildren(). Ensure the test case covers this scenario to verify taint propagation.

next(elem.iterchildren()).text, # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:68

The taint tracking is missing through the call to next on elem.iterdescendants(). Ensure the test case covers this scenario to verify taint propagation.

next(elem.iterdescendants()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:70

The taint tracking is missing through the call to next on elem.iterfind("b"). Ensure the test case covers this scenario to verify taint propagation.

next(elem.iterfind("b")).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:72

The taint tracking is missing through the call to next on elem[0].itersiblings(). Ensure the test case covers this scenario to verify taint propagation.

next(elem[0].itersiblings()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:82

The taint tracking is missing for ch.text in the for loop. Ensure the test case covers this scenario to verify taint propagation.

ch.text  # $ MISSING: tainted # API node getASubscript() appears to not consider things like for loops

python/ql/test/library-tests/frameworks/lxml/taint_test.py:93

The taint tracking is missing through the call to next on tree.getiterator(). Ensure the test case covers this scenario to verify taint propagation.

next(tree.getiterator()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:95

The taint tracking is missing through the call to next on tree.iter(). Ensure the test case covers this scenario to verify taint propagation.

next(tree.iter()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:97

The taint tracking is missing through the call to next on tree.iterfind("b"). Ensure the test case covers this scenario to verify taint propagation.

next(tree.iterfind("b")).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:109

The taint tracking is missing for elem2.text from the tuple return value. Ensure the test case covers this scenario to verify taint propagation.

elem2.text,  # $ MISSING:tainted # Type is not tracked to the tuple return value

python/ql/test/library-tests/frameworks/lxml/taint_test.py:110

The taint tracking is missing for elem3.text from the tuple return value. Ensure the test case covers this scenario to verify taint propagation.

elem3.text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:119

The taint tracking is missing for tree2.getroot(). Ensure the test case covers this scenario to verify taint propagation.

tree2.getroot() # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:126

The taint tracking is missing for el.text in the iterparse loop. Ensure the test case covers this scenario to verify taint propagation.

el.text, # $ MISSING:tainted

Tip: Copilot only keeps its highest confidence comments to reduce noise and keep you focused. Learn more

yoff

Nice modeling, and great use of the API graph. I like the tests also, especially the ones about what type tracking can and cannot do. I think we should document this more clearly. Currently, there are some comments to that effect in the type tracking tests.

python/ql/lib/semmle/python/frameworks/Lxml.qll

yoff · 2025-01-06T14:58:14Z

python/ql/lib/semmle/python/frameworks/Lxml.qll

+    override string getFormat() { result = "XML" }
+  }
+  // TODO: ElementTree.write can write to a file-like object; should that be a flow step?
+  // It also can accept a filepath which could be a path injection sink.


Perhaps we can add that as a sink directly by extending the right concept? (Probably out of scope for this PR)

python/ql/lib/semmle/python/frameworks/Lxml.qll

yoff · 2025-01-06T15:00:56Z

python/ql/test/library-tests/frameworks/lxml/taint_test.py

+        elem.get("at"),  # $ tainted
+        elem.getchildren()[0].text,  # $ tainted
+        elem.getiterator(),  # $ tainted
+        next(elem.getiterator()).text,  # $ MISSING:tainted


next is currently modeled as a flow summary, but not a "simple" one, so we cannot type track through it (it uses type tracking to identify the calls).

Is there anything we could expect to be able to type track through these cases of methods returning iterators?
(list(X)[0] doesn't work; and neither do for loops as one of the other tests here shows)

yoff · 2025-01-06T15:02:57Z

python/ql/test/library-tests/frameworks/lxml/taint_test.py

+
+    ensure_tainted(
+        func2(tree), # $ tainted
+        func2(tree).text, # $ MISSING:tainted - type tracking not tracked through flow preserving calls


This is surprising, should that be func2(tree).getroot().text?

It should; fixed. However the flow is still missing.

yoff · 2025-01-06T15:04:54Z

python/ql/test/library-tests/frameworks/lxml/taint_test.py

+        func2(tree).text, # $ MISSING:tainted - type tracking not tracked through flow preserving calls
+        func3(tree).text, # $ MISSING:tainted - this includes if there is a type hint annotation on the return
+        typing.cast(ET.ElementTree, tree), # $ tainted
+        typing.cast(ET.ElementTree, tree).text, # $ MISSING:tainted - this includes for flow summary models


It will only work for "simple" summaries; those that do not use type tracking to identify calls. (And, again, is .text the right test here? I do not see tree.text being tainted.)

Co-authored-by: yoff <[email protected]>

github-actions bot added the Python label Dec 2, 2024

joefarebrother added 5 commits December 10, 2024 19:01

Model flow steps for lxml

89167da

Add missing qldoc

d2b0d7a

Added tests

d2ed92d

Improve tests and use API graphs

29a9023

Update test output

bcb08bb

joefarebrother force-pushed the python-lxml branch 2 times, most recently from 652a9ca to b523be2 Compare December 11, 2024 12:03

Add missing qldoc and revert accidentilly commited threat model change

5c8ef28

joefarebrother force-pushed the python-lxml branch from b523be2 to 5c8ef28 Compare December 11, 2024 12:04

joefarebrother added 2 commits December 11, 2024 12:25

Qldoc improvements + add a few extra tests

2019ddf

Add change note

e6794a9

github-actions bot added the documentation label Dec 11, 2024

joefarebrother marked this pull request as ready for review December 11, 2024 14:29

Copilot bot review requested due to automatic review settings December 11, 2024 14:29

joefarebrother requested a review from a team as a code owner December 11, 2024 14:29

joefarebrother changed the title ~~[Draft] Python: Model additional flow steps for the lxml framework~~ Python: Model additional flow steps for the lxml framework Dec 11, 2024

Copilot AI reviewed Dec 11, 2024

View reviewed changes

Add additional tests demonstrating false negative flow

dcbcf7e

yoff previously approved these changes Jan 6, 2025

View reviewed changes

Apply suggestions from code review - update doc comments

8b174ea

Co-authored-by: yoff <[email protected]>

joefarebrother dismissed yoff’s stale review via 8b174ea January 7, 2025 15:21

Fix tests to check for the correct type

35961e4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Model additional flow steps for the lxml framework #18185

Python: Model additional flow steps for the lxml framework #18185

joefarebrother commented Dec 2, 2024

yoff left a comment

yoff Jan 6, 2025

yoff Jan 6, 2025

joefarebrother Jan 8, 2025

yoff Jan 6, 2025

joefarebrother Jan 7, 2025

yoff Jan 6, 2025

Python: Model additional flow steps for the lxml framework #18185

Are you sure you want to change the base?

Python: Model additional flow steps for the lxml framework #18185

Conversation

joefarebrother commented Dec 2, 2024

Choose a reason for hiding this comment

yoff left a comment

Choose a reason for hiding this comment

yoff Jan 6, 2025

Choose a reason for hiding this comment

yoff Jan 6, 2025

Choose a reason for hiding this comment

joefarebrother Jan 8, 2025

Choose a reason for hiding this comment

yoff Jan 6, 2025

Choose a reason for hiding this comment

joefarebrother Jan 7, 2025

Choose a reason for hiding this comment

yoff Jan 6, 2025

Choose a reason for hiding this comment