-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Model additional flow steps for the lxml framework #18185
Open
joefarebrother
wants to merge
11
commits into
github:main
Choose a base branch
from
joefarebrother:python-lxml
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+356
−13
Open
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
89167da
Model flow steps for lxml
joefarebrother d2b0d7a
Add missing qldoc
joefarebrother d2ed92d
Added tests
joefarebrother 29a9023
Improve tests and use API graphs
joefarebrother bcb08bb
Update test output
joefarebrother 5c8ef28
Add missing qldoc and revert accidentilly commited threat model change
joefarebrother 2019ddf
Qldoc improvements + add a few extra tests
joefarebrother e6794a9
Add change note
joefarebrother dcbcf7e
Add additional tests demonstrating false negative flow
joefarebrother 8b174ea
Apply suggestions from code review - update doc comments
joefarebrother 35961e4
Fix tests to check for the correct type
joefarebrother File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
--- | ||
category: minorAnalysis | ||
--- | ||
- Additional taint steps through methods of `lxml.etree.Element` and `lxml.etree.ElementTree` objects from the `lxml` PyPI package have been modeled. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,7 @@ | |
|
||
private import python | ||
private import semmle.python.dataflow.new.DataFlow | ||
private import semmle.python.dataflow.new.TaintTracking | ||
private import semmle.python.Concepts | ||
private import semmle.python.ApiGraphs | ||
private import semmle.python.frameworks.data.ModelsAsData | ||
|
@@ -68,16 +69,9 @@ module Lxml { | |
*/ | ||
class XPathCall extends XML::XPathExecution::Range, DataFlow::CallCfgNode { | ||
XPathCall() { | ||
exists(API::Node parseResult | | ||
parseResult = | ||
etreeRef().getMember(["parse", "fromstring", "fromstringlist", "HTML", "XML"]).getReturn() | ||
or | ||
// TODO: lxml.etree.parseid(<text>)[0] will contain the root element from parsing <text> | ||
// but we don't really have a way to model that nicely. | ||
parseResult = etreeRef().getMember("XMLParser").getReturn().getMember("close").getReturn() | ||
| | ||
this = parseResult.getMember("xpath").getACall() | ||
) | ||
// TODO: lxml.etree.parseid(<text>)[0] will contain the root element from parsing <text> | ||
// but we don't really have a way to model that nicely. | ||
this = [Element::instance(), ElementTree::instance()].getMember("xpath").getACall() | ||
} | ||
|
||
override DataFlow::Node getXPath() { result in [this.getArg(0), this.getArgByName("_path")] } | ||
|
@@ -209,31 +203,37 @@ module Lxml { | |
* A call to either of: | ||
* - `lxml.etree.fromstring` | ||
* - `lxml.etree.fromstringlist` | ||
* - `lxml.etree.HTML` | ||
* - `lxml.etree.XML` | ||
* - `lxml.etree.XMLID` | ||
* - `lxml.etree.XMLDTDID` | ||
* - `lxml.etree.parse` | ||
* - `lxml.etree.parseid` | ||
* | ||
* See | ||
* - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.fromstring | ||
* - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.fromstringlist | ||
* - https://lxml.de/apidoc/lxml.etree.html#lxml.etree.HTML | ||
* - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.XML | ||
* - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.XMLID | ||
* - https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XMLDTDID | ||
* - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.parse | ||
* - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.parseid | ||
*/ | ||
private class LxmlParsing extends DataFlow::CallCfgNode, XML::XmlParsing::Range { | ||
string functionName; | ||
|
||
LxmlParsing() { | ||
functionName in ["fromstring", "fromstringlist", "XML", "XMLID", "parse", "parseid"] and | ||
functionName in [ | ||
"fromstring", "fromstringlist", "HTML", "XML", "XMLID", "XMLDTDID", "parse", "parseid" | ||
] and | ||
this = etreeRef().getMember(functionName).getACall() | ||
} | ||
|
||
override DataFlow::Node getAnInput() { | ||
result in [ | ||
this.getArg(0), | ||
// fromstring / XML / XMLID | ||
// fromstring / HTML / XML / XMLID / XMLDTDID | ||
this.getArgByName("text"), | ||
// fromstringlist | ||
this.getArgByName("strings"), | ||
|
@@ -248,7 +248,8 @@ module Lxml { | |
this.getParserArg() = XmlParser::instanceVulnerableTo(kind) | ||
or | ||
kind.isXxe() and | ||
not exists(this.getParserArg()) | ||
not exists(this.getParserArg()) and | ||
not functionName = "HTML" | ||
} | ||
|
||
override predicate mayExecuteInput() { none() } | ||
|
@@ -318,4 +319,183 @@ module Lxml { | |
|
||
override DataFlow::Node getAPathArgument() { result = this.getAnInput() } | ||
} | ||
|
||
/** Provides models for the `lxml.etree.Element` class. */ | ||
module Element { | ||
/** Gets a reference to the `Element` class. */ | ||
API::Node classRef() { result = etreeRef().getMember(["Element", "_Element"]) } | ||
|
||
/** | ||
* A source of `lxml.etree.Element` instances, extend this class to model new instances. | ||
* | ||
* This can include instantiations of the class, return values from function | ||
* calls, or a special parameter that will be set when functions are called by an external | ||
* library. | ||
* | ||
* Use the predicate `Element::instance()` to get references to instances of `lxml.etree.Element` instances. | ||
*/ | ||
abstract class InstanceSource instanceof API::Node { | ||
/** Gets a textual representation of this element. */ | ||
string toString() { result = super.toString() } | ||
} | ||
|
||
/** Gets a reference to an `lxml.etree.Element` instance. */ | ||
API::Node instance() { result instanceof InstanceSource } | ||
|
||
/** An `Element` instantiated directly. */ | ||
private class ElementInstance extends InstanceSource { | ||
ElementInstance() { this = classRef().getAnInstance() } | ||
} | ||
|
||
/** The result of a parse operation that returns an `Element`. */ | ||
private class ParseResult extends InstanceSource { | ||
ParseResult() { | ||
// TODO: The XmlParser module does not currently use API graphs | ||
this = | ||
[ | ||
etreeRef().getMember("XMLParser").getAnInstance(), | ||
etreeRef().getMember("get_default_parser").getReturn() | ||
].getMember("close").getReturn() | ||
or | ||
// TODO: `XMLID`, `XMLDTDID`, `parseid` returns a tuple of which the first element is an `Element`. | ||
// `iterparse` returns an iterator of tuples, each of which has a second element that is an `Element`. | ||
this = etreeRef().getMember(["XML", "HTML", "fromstring", "fromstringlist"]).getReturn() | ||
} | ||
} | ||
|
||
/** A call to a method on an `Element` that returns another `Element`. */ | ||
private class ElementMethod extends InstanceSource { | ||
ElementMethod() { | ||
// an Element is an iterator of Elements | ||
this = instance().getASubscript() | ||
or | ||
// methods that return an Element | ||
this = instance().getMember(["find", "getnext", "getprevious", "getparent"]).getReturn() | ||
or | ||
// methods that return an iterator of Elements | ||
this = | ||
instance() | ||
.getMember([ | ||
"cssselect", "findall", "getchildren", "getiterator", "iter", "iterancestors", | ||
"iterdecendants", "iterchildren", "itersiblings", "iterfind", "xpath" | ||
]) | ||
.getReturn() | ||
.getASubscript() | ||
} | ||
} | ||
|
||
/** A call to a method on an `ElementTree` that returns an `Element`. */ | ||
private class ElementTreeMethod extends InstanceSource { | ||
ElementTreeMethod() { | ||
this = ElementTree::instance().getMember(["getroot", "find"]).getReturn() | ||
or | ||
this = | ||
ElementTree::instance() | ||
.getMember(["findall", "getiterator", "iter", "iterfind", "xpath"]) | ||
.getReturn() | ||
.getASubscript() | ||
} | ||
} | ||
|
||
/** | ||
* An additional taint step from an `Element` instance. | ||
* See https://lxml.de/apidoc/lxml.etree.html#lxml.etree.ElementBase. | ||
*/ | ||
private class ElementTaintStep extends TaintTracking::AdditionalTaintStep { | ||
override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { | ||
exists(DataFlow::MethodCallNode call | | ||
nodeTo = call and instance().asSource().flowsTo(nodeFrom) | ||
| | ||
call.calls(nodeFrom, | ||
// We consider a node to be tainted if there could be taint anywhere in the element tree; | ||
// so sibling nodes (e.g. `getnext`) are also tainted. | ||
// This ensures nodes like `elem[0].getnext()` are tracked. | ||
[ | ||
"cssselect", "find", "findall", "findtext", "get", "getchildren", "getiterator", | ||
"getnext", "getparent", "getprevious", "getroottree", "items", "iter", | ||
"iterancestors", "iterchildren", "iterdescendants", "itersiblings", "iterfind", | ||
"itertext", "keys", "values", "xpath" | ||
]) | ||
) | ||
or | ||
exists(DataFlow::AttrRead attr | nodeTo = attr and instance().asSource().flowsTo(nodeFrom) | | ||
attr.accesses(nodeFrom, ["attrib", "base", "nsmap", "prefix", "tag", "tail", "text"]) | ||
) | ||
} | ||
} | ||
} | ||
|
||
/** Provides models for the `lxml.etree.ElementTree` class. */ | ||
module ElementTree { | ||
/** Gets a reference to the `ElementTree` class. */ | ||
API::Node classRef() { result = etreeRef().getMember(["ElementTree", "_ElementTree"]) } | ||
|
||
/** | ||
* A source of `lxml.etree.ElementTree` instances; extend this class to model new instances. | ||
* | ||
* This can include instantiations of the class, return values from function | ||
* calls, or a special parameter that will be set when functions are called by an external | ||
* library. | ||
* | ||
* Use the predicate `ElementTree::instance()` to get references to instances of `lxml.etree.ElementTree` instances. | ||
*/ | ||
abstract class InstanceSource instanceof API::Node { | ||
/** Gets a textual representation of this element. */ | ||
string toString() { result = super.toString() } | ||
} | ||
|
||
/** Gets a reference to an `lxml.etree.ElementTree` instance. */ | ||
API::Node instance() { result instanceof InstanceSource } | ||
|
||
/** An `ElementTree` instantiated directly. */ | ||
private class ElementTreeInstance extends InstanceSource { | ||
ElementTreeInstance() { this = classRef().getAnInstance() } | ||
} | ||
|
||
/** The result of a parse operation that returns an `ElementTree`. */ | ||
private class ParseResult extends InstanceSource { | ||
ParseResult() { this = etreeRef().getMember("parse").getReturn() } | ||
} | ||
|
||
/** A call to a method on an `Element` that returns another `Element`. */ | ||
private class ElementMethod extends InstanceSource { | ||
ElementMethod() { this = Element::instance().getMember("getroottree").getReturn() } | ||
} | ||
|
||
/** An additional taint step from an `ElementTree` instance. See https://lxml.de/apidoc/lxml.etree.html#lxml.etree._ElementTree */ | ||
private class ElementTaintStep extends TaintTracking::AdditionalTaintStep { | ||
override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { | ||
exists(DataFlow::MethodCallNode call | | ||
nodeTo = call and instance().asSource().flowsTo(nodeFrom) | ||
| | ||
call.calls(nodeFrom, | ||
[ | ||
"find", "findall", "findtext", "get", "getiterator", "getroot", "iter", "iterfind", | ||
"xpath" | ||
]) | ||
) | ||
or | ||
exists(DataFlow::AttrRead attr | nodeTo = attr and instance().asSource().flowsTo(nodeFrom) | | ||
attr.accesses(nodeFrom, "docinfo") | ||
) | ||
} | ||
} | ||
} | ||
|
||
/** A call to serialise xml to a string */ | ||
private class XmlEncoding extends Encoding::Range, DataFlow::CallCfgNode { | ||
XmlEncoding() { | ||
this = etreeRef().getMember(["tostring", "tostringlist", "tounicode"]).getACall() | ||
} | ||
|
||
override DataFlow::Node getAnInput() { | ||
result = [this.getArg(0), this.getArgByName("element_or_tree")] | ||
} | ||
|
||
override DataFlow::Node getOutput() { result = this } | ||
|
||
override string getFormat() { result = "XML" } | ||
} | ||
// TODO: ElementTree.write can write to a file-like object; should that be a flow step? | ||
// It also can accept a filepath which could be a path injection sink. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps we can add that as a sink directly by extending the right concept? (Probably out of scope for this PR) |
||
} |
3 changes: 3 additions & 0 deletions
3
python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.expected
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
argumentToEnsureNotTaintedNotMarkedAsSpurious | ||
untaintedArgumentToEnsureTaintedNotMarkedAsMissing | ||
testFailures |
2 changes: 2 additions & 0 deletions
2
python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.ql
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
import experimental.meta.InlineTaintTest | ||
import MakeInlineTaintTest<TestTaintTrackingConfig> |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice with a shared flow state tracking file-like objects. Javascript has constructions of that sort.