-
-
Notifications
You must be signed in to change notification settings - Fork 542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schema extension for estimating query complexity #3721
base: main
Are you sure you want to change the base?
Add schema extension for estimating query complexity #3721
Conversation
This commit introduces a new `QueryComplexityEstimator` class. It's a `SchemaExtension` which traverses the query tree during validation time and estimates the complexity of executing of each node of the tree. Its intended use is primarily token-bucket rate-limiting based on query-complexity.
Reviewer's Guide by SourceryThis PR introduces a new schema extension for estimating GraphQL query complexity. The extension traverses the query tree during validation and calculates complexity scores for each operation, which can be used for rate limiting or other purposes. The implementation includes a base estimator class and two concrete implementations for common use cases. Class diagram for QueryComplexityEstimator and related classesclassDiagram
class SchemaExtension
class FieldExtension
class FieldComplexityEstimator {
+estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
+resolve(next_: Callable[..., Any], source: Any, info: Info, **kwargs: Any) Any
}
FieldComplexityEstimator --|> FieldExtension
class SimpleFieldComplexityEstimator {
+scalar_complexity: int
+estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
}
SimpleFieldComplexityEstimator --|> FieldComplexityEstimator
class ConstantFieldComplexityEstimator {
+complexity: int
+estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
}
ConstantFieldComplexityEstimator --|> FieldComplexityEstimator
class QueryComplexityEstimator {
+default_estimator: FieldComplexityEstimator
+callback: Optional[Callable[[Dict[str, int], ExecutionContext], None]]
+response_key: Optional[str]
+get_results() Dict[str, Any]
+on_validate() Iterator[None]
}
QueryComplexityEstimator --|> SchemaExtension
class ExecutionContext
class Info
class FieldArgumentsType
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @serramatutu - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 4 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Hi, thanks for contributing to Strawberry 🍓! We noticed that this PR is missing a So as soon as this PR is merged, a release will be made 🚀. Here's an example of Release type: patch
Description of the changes, ideally with some examples, if adding a new feature. Release type can be one of patch, minor or major. We use semver, so make sure to pick the appropriate type. If in doubt feel free to ask :) Here's the tweet text:
|
81ca2fb
to
2e7a806
Compare
I'll wait for review on the main functionality before adding a proper |
CodSpeed Performance ReportMerging #3721 will not alter performanceComparing Summary
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3721 +/- ##
========================================
Coverage 97.00% 97.01%
========================================
Files 501 503 +2
Lines 33490 33714 +224
Branches 5592 5621 +29
========================================
+ Hits 32487 32706 +219
- Misses 791 795 +4
- Partials 212 213 +1 |
variables = self.execution_context.variables or {} | ||
node_body = node | ||
args = { | ||
to_snake_case(arg.name.value): variables.get(arg.value.name.value, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an inverse function to self.execution_context.schema.config.name_converter.apply_naming_config
that will turn camel into snake case without hardcoding it here? What we need is to get the python_name
of a field from its graphql_name
|
||
from strawberry.extensions.base_extension import SchemaExtension | ||
from strawberry.extensions.field_extension import FieldExtension | ||
from strawberry.extensions.query_depth_limiter import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: I reused some of the pre-existing machinery from QueryDepthLimiter
. I didn't move it out of there to some sort of shared utils file since that code seems to have a special license.
I'm noticing |
@jacobmoshipco Yea I think that's a good point. I can't really see any practical use cases right now that would require the field itself, but just having it there would make the API a lot more flexible in case users wanna do introspection etc I will add that :) |
Summary
This commit introduces a new
QueryComplexityEstimator
class. It's aSchemaExtension
which traverses the query tree during validation time and estimates the complexity of executing of each node of the tree. Its intended use is primarily token-bucket rate-limiting based on query complexity, but the API is flexible enough to allow users to do whatever they want with the results.Types of Changes
Issues Fixed or Closed by This PR
Public API
I introduced the following types users can interact with:
QueryComplexityEstimator
This is the
SchemaExtension
that should get added to the global StrawberrySchema
. It has 3 possible arguments:default_estimator: FieldComplexityEstimator | int
: The defaultFieldComplexityEstimator
that should be used for fields which don't specify an override. This provides a nice default so users don't need to bother adding complexity estimation logic for every single field. Just throwing in a good default should solve 90% of your problems. If this is anint
, we'll automatically use aConstantFieldComplexityEstimator
.callback: Callable[[Dict[str, int], ExecutionContext], None] | None
: In this callback, users can implement their own logic that uses the complexity value, such as raising exceptions for rate-limiting, logging this complexity to Datadog or whatever.response_key: str | None
: If it's provided, the extension will return the complexity map to the client underresult.extensions[response_key]
. If it's not provided, complexities will not be returned to clients.FieldComplexityEstimator
This is the base class that estimates the complexity of running a single field. Users can inherit from this class to implement their own complexity calculation logic. It has a
estimate_complexity()
method which accepts the following args:child_complexities: Iterator[int]
: an iterator over the complexities of child nodes (i.e nodes in the selection set). If this is empty, it means the node we're currently evaluating is a leaf. It's worth noting that this iterator is lazy, meaning that if an estimator doesn't use the child complexities, they won't get calculated. This should help a bit with performance and prune some branches of the tree if the complexity can be resolved at the parent.arguments: FieldArgumentsType
: This is a dictionary of all the arguments (parameters) for a field. This can be used to calculate the complexity based on a user-defined parameter likepage_size
,document_length
or whatever.The following 2 types are concrete implementations of
FieldComplexityEstimator
that users can use out of the box without bothering to implement them. I added these since I expect these patterns to be quite common.ConstantFieldComplexityEstimator
An implementation of
FieldComplexityEstimator
that takes in a singlecomplexity: int
parameter and always returns that, no matter the cost of its children or parameter values.SimpleFieldComplexityEstimator
An implementation of
FieldComplexityEstimator
that takes in a singlescalar_complexity: int
and returns that for scalar nodes. Object nodes return the sum of their children.Admittedly, this name kinda sucks. I couldn't think of anything better though, so I'm happy to change if anyone has a better idea.
Example
Here's an example that combines all of that to make it easier to digest:
Caveats, possible issues and other solutions I considered
There are some caveats to this implementation, and I wanted to bring them to your attention. I'm happy to discuss and/or change the implementation if any of these are dealbreakers :)
Traversing the tree twice
QueryComplexityEstimator
runs onon_validate
and it traverses the GraphQL document AST using its own logic. This means that if it gets used with another extension that does the same such asQueryDepthLimiter
, it is possible that a significant chunk of the tree will get traversed twice since each extension traverses the tree on its own. This could introduce performance issues, especially for large queries.The idea I had to optimize this was implement a generic
Visitor
class that gets called by the Strawberry core only once, and then all the "visiting" extensions would register themselves to this visitor. While it's probably faster since it traverses the tree only once, this other approach isn't as flexible since extensions wouldn't get to pick visiting order, and the state management in each extension probably gets a little more annoying. It would also require some significant changes to the Strawberry core, which I thought was out of scope for this PR.The estimator is not eager
Differently from
MaxTokensLimiter
andQueryDepthLimiter
, theQueryComplexityEstimator
is not eager. In other words, there is no predefined "limit" that will stop tree traversal and simply return "you're over the limit". Instead, it always calculates the complexity of the entire query, and it is on the developer to use thecallback
and do whatever they want with the results.The main reasoning behind this is that by the nature of rate-limiting, the limit fluctuates, and needs to be dynamically assessed, so we can't create a global
limit
threshold. I thought of solving this by making apre
hook that would return the current limit given theExecutionContext
. IMO this would make the API more convoluted and annoying than it needs to be, so in the end I discarded that idea and just went withcallback
.This decision has one important consideration: users should be strongly encouraged to use
MaxTokensLimiter
andQueryDepthLimiter
to limit the size of the GraphQL documents before passing them intoQueryComplexityEstimator
, otherwise huge queries could cause unbounded recursion and become an attack vector.Would love to hear your thoughts on this, though!j
Does not calculate "actual" cost
In the original issue, some people suggested this API should be able to calculate the "actual" cost of a query after resolving it. I decided not to do that since it would add another layer of complexity and overhead, and calculating an "actual" cost can be tricky and result in unpredictable results for clients.
For example, if your database had a failover that made a specific request take longer because the connection was cold, a naive implementation of "actual" query cost might blame the user's query for that and assume it took longer because the query was complex, resulting in them being rate-limited earlier due to a system fault. This would make rate-limiting non-deterministic and could add unpredictability to your APIs.
I'm not saying it isn't possible to calculate "actual" query cost deterministically, but it's a much harder effort than estimating query costs based solely on requested fields and on their input parameters. So I thought the juice was not worth the squeeze.
This is why I named the classes "estimator" and not "calculator".
Checklist
Thank you for creating and maintaing Strawberry :D
Summary by Sourcery
Add a new QueryComplexityEstimator class to estimate query complexity for rate-limiting and other purposes, along with supporting classes and documentation.
New Features:
Documentation:
Tests: