Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url matching order changed versus 2.2 #2924

Open
jfly opened this issue Jul 16, 2024 · 3 comments · May be fixed by #3018
Open

url matching order changed versus 2.2 #2924

jfly opened this issue Jul 16, 2024 · 3 comments · May be fixed by #3018

Comments

@jfly
Copy link

jfly commented Jul 16, 2024

(Sorry for the convoluted title: I don't know the right vocabulary for this.)

I noticed when upgrading to Werkzeug 2.2+, two of my routing rules have swapped priority.

Sample application

Here's a simple werkzeug application. The important part is the 2 rules:

Rule('/<path:filename>', endpoint=self.on_static, subdomain="static"),
Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="<subdomain>"),
cat server.py
from werkzeug.wrappers import Request, Response
from werkzeug.routing import Map, Rule
from werkzeug.exceptions import HTTPException


class SimpleServer:
    def __init__(self, server_name: str, url_map: Map):
        self._server_name = server_name
        self._url_map = url_map

    def dispatch_request(self, request):
        adapter = self._url_map.bind_to_environ(request.environ, server_name=self._server_name)
        try:
            endpoint, values = adapter.match()
            return endpoint(request, **values)
        except HTTPException as e:
            return e

    def wsgi_app(self, environ, start_response):
        request = Request(environ)
        response = self.dispatch_request(request)
        return response(environ, start_response)

    def __call__(self, environ, start_response):
        return self.wsgi_app(environ, start_response)


class Demo(SimpleServer):
    def __init__(self, server_name: str):
        url_map = Map([
            Rule('/<path:filename>', endpoint=self.on_static, subdomain="static"),
            Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="<subdomain>"),
        ])
        super().__init__(server_name, url_map)

    def on_static(self, _request, filename):
        return Response(f'on_static: {filename=}')

    def on_healthcheck(self, _request, subdomain):
        return Response(f'on_healthcheck {subdomain=}')


if __name__ == '__main__':
    from werkzeug.serving import run_simple

    port = 8080
    app = Demo(server_name=f"example.com:{port}")
    run_simple('127.0.0.1', port, app, use_debugger=True, use_reloader=True)

Before (on werkzeug <2.2)

Run the server:

pip install werkzeug==2.1.2 && python server.py

Note how /healthcheck behaves the same on both the "static" subdomain, and a different subdomain "foo":

"static" subdomain:

$ curl http://static.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_healthcheck: subdomain='static'

"foo" subdomain:

$ curl http://foo.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_healthcheck: subdomain='foo'

After (werkzeug 2.2+)

pip install werkzeug==2.2.0 && python server.py

Note how /healthcheck now behaves differently on the two subdomains. On "static", we now get back the on_static endpoint, and on "foo" we still get back the on_healthcheck endpoint.

"static" subdomain:

$ curl http://static.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_static: filename='healthcheck'

"foo" subdomain:

$ curl http://foo.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_healthcheck: subdomain='foo'

I see the same behavior with the latest version of werkzeug (3.0.3 at time of writing).

Summary

Is this change in behavior intentional? The PR #2433 just describes this as a faster matcher, it doesn't say anything about a change in behavior.

Is there some way of configuring a route across all subdomains that takes precedence over the subdomain specific /<path:filename> rule in my example?

Workaround

I don't have a great workaround for this. I can get close to the pre-werkzeug 2.2 behavior by adding a 3rd rule specifically for the "static" subdomain:

Rule('/<path:filename>', endpoint=self.on_static, subdomain="static"),
Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="<subdomain>"),
+Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="static"),

But this behaves a bit differently: there's no subdomain argument passed to my endpoint handler.
Environment:

  • Python version: 3.12.4
  • Werkzeug version: multiple, see description above

@pgjones, since you wrote the new matcher

@davidism davidism changed the title Regression in Werkzeug 2.2 (State Machine Matcher): priority between complex rule with subdomain and simple rule for all subdomains changes url matching order changed since 2.2 Jul 16, 2024
@davidism davidism changed the title url matching order changed since 2.2 url matching order changed versus 2.2 Jul 16, 2024
@davidism
Copy link
Member

@pgjones can you check this when you get a chance?

@tgbugs
Copy link

tgbugs commented Jan 16, 2025

99% This is a bug in the new matcher.

target = "/".join(parts)

target = "/".join(parts) should be target = "/" + "/".join(parts) based on what the rules look like that should match.

Making the change fixes a similar issue on my end and all tests pass, which indicates that there is a missing test.

edit: this is not the issue, false positive related to the structure of the routes I was testing

@tgbugs
Copy link

tgbugs commented Jan 16, 2025

Having dug deeper into the issue, I think it comes from a conflict in how RulePart weights are calculated. Currently RulePart Weighting uses -len(argument_weights) I think probably as an attempt at a quick fix for greedy matching of paths. This was likely to satisfy test_greedy in test_routing.py.

However, test_greedy is fundamentally wrong in its current form because it tries to force matching of two path variables back to back /<path:bar>/<path:blub>. There is no way to specify where one starts and the other ends. If the desire on the part of the user is that there be a single element at the end that is not part of <path/bar> then the correct way to express that is as /<path:bar>/<blub>, this is also one of the only two reasonable interpretations of back to back paths, the other being /<bar>/<path:blub>.

The simplest fix is to switch to use len(argument_weights), inverting the ordering. However there are two other tests that fail when that change is made, so I am investigating what is going on.

Here is an example of the missing test.

def test_static_priority():
    map = r.Map(
        [
            r.Rule("/<path:dyn2>/<dyn1>", endpoint="file"),
            r.Rule("/<dyn1>/statn", endpoint="stat"),
        ],
    adapter = map.bind("example.org", "/")

    assert adapter.match("/d2/d1", method="GET") ==  ('file', {'dyn2': 'd2', 'dyn1': 'd1'})
    assert adapter.match("/d1/statn", method="GET") ==  ('stat', {'dyn1': 'd1'})

tgbugs added a commit to tgbugs/werkzeug that referenced this issue Jan 16, 2025
Fix issue pallets#2924 where Rules with dynamic elements would take priority
over rules with static elements of the same length.

The underlying issue was that the ordering for rules was backwards
with regard to number of argument weights. In order to ensure that
static elements match first the shortest rule must be tested first.

To ensure that a RulePart that contains a path converter does not
incorrectly take priority over other RuleParts that are part of other
Rules that should have priority, RuleParts containing a path converter
are assigned an infinite number of argument weights because they can
consume an arbitrary number of url path elements when matching.

With this change, two consecutive path converters give priority to the
first converter. In general two consecutive path converters with
different names cannot have consistent or predicatable behavior (where
would you cut?). Tests are updated accordingly. Might consider making
back to back path converters an error.

closes pallets#2924
tgbugs added a commit to tgbugs/werkzeug that referenced this issue Jan 16, 2025
Fix issue pallets#2924 where Rules with dynamic elements would take priority
over rules with static elements of the same length.

The underlying issue was that the ordering for rules was backwards
with regard to number of argument weights. In order to ensure that
static elements match first the shortest rule must be tested first.

To ensure that a RulePart that contains a path converter does not
incorrectly take priority over other RuleParts that are part of other
Rules that should have priority, RuleParts containing a path converter
are assigned an infinite number of argument weights because they can
consume an arbitrary number of url path elements when matching.

With this change, two consecutive path converters give priority to the
first converter. In general two consecutive path converters with
different names cannot have consistent or predicatable behavior (where
would you cut?). Tests are updated accordingly. Might consider making
back to back path converters an error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants