Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing regex pattern tokens breaks semantics #85

Open
olfek opened this issue Dec 1, 2024 · 17 comments
Open

Replacing regex pattern tokens breaks semantics #85

olfek opened this issue Dec 1, 2024 · 17 comments

Comments

@olfek
Copy link
Contributor

olfek commented Dec 1, 2024

.map((path) => path.replace(/[.+]/g, "\\$&")) // escape regex characters

replace here means I cannot use regex meta sequence . - for example .{3,}

.replace(/\?/g, ".") // user can type "?" to match any one character

replace here means I cannot use regex quantifier ? - for example y?

.replace(/\*/g, ".*") // user can type "*" to match any zero or more characters

replace here means I cannot use * with any token other than . - for example y*

@olfek
Copy link
Contributor Author

olfek commented Dec 1, 2024

To correct this:

  • Remove usages of replace and apologize for breaking changes.

OR

  • Add an option to enable TRUE regex mode (which skips usages of replace) which defaults to false and therefore does not break existing rulesets.

@olfek
Copy link
Contributor Author

olfek commented Dec 1, 2024

@penge What do you think?

@penge
Copy link
Owner

penge commented Dec 2, 2024

As it wasn't designed for full regular expressions, this change isn't a breaking change and doesn't require an apology.

Supporting ? and * (as described in README) offers a balance between rule flexibility and user-friendliness.


TRUE regex mode

Yes, this would be a great addition. How do you think we could implement it? Perhaps as lines formatted like regular expression literals?

/.../i

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

As it wasn't designed for full regular expressions, this change isn't a breaking change and doesn't require an apology.

I meant IF you removed the replace calls, it would be a breaking change.

Regex literals as input is a great idea, but remember to add a regex mode toggle which defaults to false. 🙂

@penge
Copy link
Owner

penge commented Dec 2, 2024

meant IF you removed

Got it. That'd be the case, then.

remember to add a regex mode toggle

How about we keep ? and * as are for most users, and treat any new lines starting with / (or basically look like as regular expression literals) as regular expressions literals?
No toggle would be needed.

Example:

/.*/watch\?v=.*/i     # and we could even add comments feature at the end of the lines
                      # that I could use to describe regular expressions
                      # or even group rules into sections

Would turn into:

const regex = new RegExp(String.raw`.*/watch\?v=.*`, "i");

And that would:

regex.test("https://www.youtube.com/watch?v=123");   // true
regex.test("https://www.youtube.com/WATCH?v=123");   // true

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

@penge 👌

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

@penge

export default (url: string, blocked: string[]): Rule | undefined => {
const normalizedUrl = removeProtocol(url);
const rules = makeRules(blocked);
const foundRule = rules.find(({ path }) => {
const patterns = expandPath(path)
.map((path) => path.replace(/[.+]/g, "\\$&")) // escape regex characters
.map((path) => (
"^"
+ path
.replace(/\?/g, ".") // user can type "?" to match any one character
.replace(/\*/g, ".*") // user can type "*" to match any zero or more characters
+ "$"
));
const found = patterns.some((pattern) => {
const matches = normalizedUrl.match(new RegExp(pattern));
return matches;
});
return found;
});
return foundRule;
};

export default (blocked: string[]): Rule[] => {
const allowList = blocked
.filter((item) => item.startsWith("!"))
.map((item) => removeProtocol(item.substring(1)));
const blockList = blocked
.filter((item) => !item.startsWith("!"))
.map(removeProtocol);
return [
...allowList.map((path) => ({ type: "allow", path } as Rule)),
...blockList.map((path) => ({ type: "block", path } as Rule)),
];
};

  • The input regex should not be modified in anyway (no replace or removeProtocol).
  • Regex should be matched against the original url (including protocol) and not normalizedUrl.

If a regex to block starts with /, how do we indicate a regex to allow?

https://developer.mozilla.org/docs/Web/JavaScript/Guide/Regular_expressions#advanced_searching_with_flags

Maybe a custom flag such as + to indicate an allow rule?

@penge
Copy link
Owner

penge commented Dec 2, 2024

  • The input regex should not be modified in anyway (no replace or removeProtocol).
  • Regex should be matched against the original url (including protocol) and not normalizedUrl.

I agree. Basically, two approaches there would be:

A) user-friendly, some modification, ? and * support
B) regex, no modification

how do we indicate a regex to allow?

We could use the semantics we already use, that would be putting ! in front:

!/.../        # exception, will be allowed

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

This feature will certainly make this application very powerful.

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

I'm thinking if a rule tester should go out with this feature, regex is sometimes quite tricky to get right.

@penge
Copy link
Owner

penge commented Dec 2, 2024

rule tester ... quite tricky to get right.

True. Let's have a tester then, to ensure it works as expected before visiting the site(s). Could be an input that accepts one or more URL(s) and it lists the rules that match them (if any; one URL can be matched by more than one rule; excluded from blocking should be also possible to test this way).

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

@penge

I was thinking something simple like this

image

RED = will block
GREEN = will not block

@penge
Copy link
Owner

penge commented Dec 2, 2024

I considered this. It can be confusing a bit because "RED" also means "doesn't work", and "GREEN" means "works."

What about excluded rules? The color meanings there would be opposite, "GREEN" as "allows", or "RED" as "does not apply."


Maybe the colors meaning could be more broader:
RED = No match (either doesn't block or doesn't exclude)
GREEN = Match (either blocks or excludes)

Note: It may be difficult to work around the new inputs and align them. The large input area may need to be split into separate lines.

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

@penge

Maybe the colors meaning could be more broader:
RED = No match (either doesn't block or doesn't exclude)
GREEN = Match (either blocks or excludes)

This is confusing for me because:
GREEN = Match, blocks which is a negative action, which we associate with the color RED.
RED = No match, does not block is a positive action, which we associate with the color GREEN.

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

@penge

Whatever color you decide to go with, I think you should have a key that states what each color means 🙂

@olfek
Copy link
Contributor Author

olfek commented Dec 2, 2024

@penge
Copy link
Owner

penge commented Dec 2, 2024

or, wherever icons or colors come short, just use TEXT 😄

"blocks"
"does not block"
"excludes"
"does not exclude"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants