Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Model additional flow steps for the lxml framework #18185

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

joefarebrother
Copy link
Contributor

Models additional flow steps for Element and ElementTree objects from the lxml.etree module, to track taint through parsed XML elements.

@github-actions github-actions bot added the Python label Dec 2, 2024
@joefarebrother joefarebrother force-pushed the python-lxml branch 2 times, most recently from 652a9ca to b523be2 Compare December 11, 2024 12:03
@joefarebrother joefarebrother marked this pull request as ready for review December 11, 2024 14:29
@Copilot Copilot bot review requested due to automatic review settings December 11, 2024 14:29
@joefarebrother joefarebrother requested a review from a team as a code owner December 11, 2024 14:29
@joefarebrother joefarebrother changed the title [Draft] Python: Model additional flow steps for the lxml framework Python: Model additional flow steps for the lxml framework Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 5 changed files in this pull request and generated no suggestions.

Files not reviewed (3)
  • python/ql/lib/semmle/python/frameworks/Lxml.qll: Language not supported
  • python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.expected: Language not supported
  • python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.ql: Language not supported
Comments skipped due to low confidence (16)

python/ql/test/library-tests/frameworks/lxml/taint_test.py:50

  • The taint tracking is missing through the call to list. Ensure the test case covers this scenario to verify taint propagation.
list(elem.findall("b"))[0].text,  # $ MISSING:tainted # Type tracking is not followed through call to `list`

python/ql/test/library-tests/frameworks/lxml/taint_test.py:54

  • The taint tracking is missing through the call to next on elem.getiterator(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.getiterator()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:62

  • The taint tracking is missing through the call to next on elem.iter(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iter()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:64

  • The taint tracking is missing through the call to next on elem[0].iterancestors(). Ensure the test case covers this scenario to verify taint propagation.
next(elem[0].iterancestors()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:66

  • The taint tracking is missing through the call to next on elem.iterchildren(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iterchildren()).text, # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:68

  • The taint tracking is missing through the call to next on elem.iterdescendants(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iterdescendants()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:70

  • The taint tracking is missing through the call to next on elem.iterfind("b"). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iterfind("b")).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:72

  • The taint tracking is missing through the call to next on elem[0].itersiblings(). Ensure the test case covers this scenario to verify taint propagation.
next(elem[0].itersiblings()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:82

  • The taint tracking is missing for ch.text in the for loop. Ensure the test case covers this scenario to verify taint propagation.
ch.text  # $ MISSING: tainted # API node getASubscript() appears to not consider things like for loops

python/ql/test/library-tests/frameworks/lxml/taint_test.py:93

  • The taint tracking is missing through the call to next on tree.getiterator(). Ensure the test case covers this scenario to verify taint propagation.
next(tree.getiterator()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:95

  • The taint tracking is missing through the call to next on tree.iter(). Ensure the test case covers this scenario to verify taint propagation.
next(tree.iter()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:97

  • The taint tracking is missing through the call to next on tree.iterfind("b"). Ensure the test case covers this scenario to verify taint propagation.
next(tree.iterfind("b")).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:109

  • The taint tracking is missing for elem2.text from the tuple return value. Ensure the test case covers this scenario to verify taint propagation.
elem2.text,  # $ MISSING:tainted # Type is not tracked to the tuple return value

python/ql/test/library-tests/frameworks/lxml/taint_test.py:110

  • The taint tracking is missing for elem3.text from the tuple return value. Ensure the test case covers this scenario to verify taint propagation.
elem3.text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:119

  • The taint tracking is missing for tree2.getroot(). Ensure the test case covers this scenario to verify taint propagation.
tree2.getroot() # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:126

  • The taint tracking is missing for el.text in the iterparse loop. Ensure the test case covers this scenario to verify taint propagation.
el.text, # $ MISSING:tainted

Tip: Copilot only keeps its highest confidence comments to reduce noise and keep you focused. Learn more

yoff
yoff previously approved these changes Jan 6, 2025
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice modeling, and great use of the API graph. I like the tests also, especially the ones about what type tracking can and cannot do. I think we should document this more clearly. Currently, there are some comments to that effect in the type tracking tests.

python/ql/lib/semmle/python/frameworks/Lxml.qll Outdated Show resolved Hide resolved
python/ql/lib/semmle/python/frameworks/Lxml.qll Outdated Show resolved Hide resolved
python/ql/lib/semmle/python/frameworks/Lxml.qll Outdated Show resolved Hide resolved
python/ql/lib/semmle/python/frameworks/Lxml.qll Outdated Show resolved Hide resolved
python/ql/lib/semmle/python/frameworks/Lxml.qll Outdated Show resolved Hide resolved
override string getFormat() { result = "XML" }
}
// TODO: ElementTree.write can write to a file-like object; should that be a flow step?
// It also can accept a filepath which could be a path injection sink.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can add that as a sink directly by extending the right concept? (Probably out of scope for this PR)

python/ql/lib/semmle/python/frameworks/Lxml.qll Outdated Show resolved Hide resolved
elem.get("at"), # $ tainted
elem.getchildren()[0].text, # $ tainted
elem.getiterator(), # $ tainted
next(elem.getiterator()).text, # $ MISSING:tainted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next is currently modeled as a flow summary, but not a "simple" one, so we cannot type track through it (it uses type tracking to identify the calls).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything we could expect to be able to type track through these cases of methods returning iterators?
(list(X)[0] doesn't work; and neither do for loops as one of the other tests here shows)


ensure_tainted(
func2(tree), # $ tainted
func2(tree).text, # $ MISSING:tainted - type tracking not tracked through flow preserving calls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising, should that be func2(tree).getroot().text?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should; fixed. However the flow is still missing.

func2(tree).text, # $ MISSING:tainted - type tracking not tracked through flow preserving calls
func3(tree).text, # $ MISSING:tainted - this includes if there is a type hint annotation on the return
typing.cast(ET.ElementTree, tree), # $ tainted
typing.cast(ET.ElementTree, tree).text, # $ MISSING:tainted - this includes for flow summary models
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will only work for "simple" summaries; those that do not use type tracking to identify calls. (And, again, is .text the right test here? I do not see tree.text being tainted.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants