Feature/add inline flags #34

Yehuda-blip · 2023-04-07T08:29:05Z

Three flags are mutually incompatible in regex: [ASCII, UNICODE, LOCALE]. Roughly, this means the inline flags [a, u, L] cannot be combined at the same nesting level. When one of these flags is nested deeper than another, it overrides the outer flag for the substring it affects.
There are two problems with this implementation:
Python regex allows an empty flag group - (?i:) - while this PR does not allow [ignore_case] or [ignore_case ''] (the latter is easy to fix with half a line in compiler.py line:255). However, I don't hate this and think it's probably the better behavior.
The bigger problem is that (?u:(?a:somestring)) is allowed in regex, with the ASCII flag overriding the UNICODE flag (the same holds for every nested combination of the incompatible flags [ASCII, UNICODE, LOCALE]).
Here, the equivalent - [unicode [ascii_only 'somestring']] - is not a valid expression, because the nesting depth of an expression is not kept in the parsing flow. Note, however, that this is a kind of generalization of the first problem: the concat operator makes it possible to write [unicode [ascii_only 'somestring']'anything'], so the expression is invalid only when the outer flag is not operating on anything of its own. Still, in the case of [unicode [ascii_only 'somestring']] it is less obvious what the problem is when translating from regex to kleenexp.
The 4 solutions I can think of for this are:
a. Add a nesting depth value to nodes when parsing - I don't want to do this, because I don't understand the parsing well enough and it seems like a major change.
b. Add this to a collection of 'problematic behaviors' somewhere.
c. Remove the unicode and locale_dependent flags completely from kleenexp - kind of an overreaction in my opinion.
d. Ignore, which is what I'm doing now and is probably an inferior solution to b.
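For reference, the Python `re` behavior described above can be checked with a short script. This is only a sketch of the standard-library side (it does not touch kleenexp), and the sample character "é" is chosen here just because `\w` matches it under UNICODE but not under ASCII.

```python
import re

# Same nesting level: combining 'a' and 'u' in one group is rejected by re.
try:
    re.compile(r"(?au:\w)")
    same_level_ok = True
except re.error:
    same_level_ok = False

# Different nesting levels: the innermost flag wins for the text it wraps.
inner_ascii = re.fullmatch(r"(?u:(?a:\w))", "é")    # ASCII overrides UNICODE
inner_unicode = re.fullmatch(r"(?a:(?u:\w))", "é")  # UNICODE overrides ASCII

# An empty flag group is legal in re and matches the empty string.
empty_group = re.fullmatch(r"(?i:)", "")

print(same_level_ok, inner_ascii, inner_unicode, empty_group)
```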

Yehuda-blip · 2023-04-07T08:31:14Z

ke/tests/test_asm.py

+        == r"(?L:test)"
+    )
+
+    InlineFlag.PATTERN_IS_BYTES_LIKE = False


pre-commit did not like my formatting of these tests and turned them into really line-heavy trees. Originally each call was on its own line.

Yehuda-blip · 2023-04-07T08:32:21Z

ke/__init__.py

@@ -54,7 +56,9 @@
 def re(pattern: AnyStr, flavor: Optional[Flavor] = None) -> AnyStr:
    # TODO: LRU cache
    if _is_bytes_like(pattern):
+        asm.InlineFlag.PATTERN_IS_BYTES_LIKE = True


This might be better placed in some global values-per-compilation dictionary somewhere.
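One possible shape for such per-compilation state, as a rough sketch only: a `contextvars.ContextVar` that is set for the duration of a compilation and restored afterwards, instead of a class attribute that tests have to reset by hand. The names `compiling` and `pattern_is_bytes_like` are hypothetical and do not exist in kleenexp.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical per-compilation state; defaults to "str pattern".
_pattern_is_bytes_like = contextvars.ContextVar(
    "pattern_is_bytes_like", default=False
)


@contextmanager
def compiling(pattern):
    """Record whether the pattern is bytes-like while it is being compiled."""
    token = _pattern_is_bytes_like.set(isinstance(pattern, (bytes, bytearray)))
    try:
        yield
    finally:
        # Restore the previous value even if compilation raises.
        _pattern_is_bytes_like.reset(token)


def pattern_is_bytes_like():
    """What a flag node could query instead of a mutable class attribute."""
    return _pattern_is_bytes_like.get()


with compiling(b"[ignore_case 'x']"):
    inside = pattern_is_bytes_like()
outside = pattern_is_bytes_like()
```

With this layout the value is scoped to the `with` block, so a failed compilation cannot leak the bytes-like setting into the next call.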

Yehuda-blip · 2023-04-07T12:30:48Z

ke/asm.py

+        # reaches the to_regex() method is the first one in the
+        # sequence (there may be other sequences). This is necessary
+        # because the parenthesis wrapping and regex legality are
+        # dependent on the whole flagging expression.


If we drop all the validations, there is always the option to simply compile [ignore_case multiline 'string'] to (?i:(?m:string)) instead of (?im:string), which should make the whole thing a lot simpler (and I very much doubt it has much effect on the performance of the result, if that is even a concern).
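A minimal sketch of that simpler scheme, assuming each flag is emitted as its own nested group. The helper `nest_inline_flags` is hypothetical and not part of ke/asm.py; the check at the end only confirms that the nested and combined forms match the same text in Python's `re`.

```python
import re


def nest_inline_flags(flags, body):
    """Wrap body in one scoped group per flag, first flag outermost."""
    for flag in reversed(flags):
        body = f"(?{flag}:{body})"
    return body


nested = nest_inline_flags("im", "^string$")  # one group per flag
combined = "(?im:^string$)"                   # single combined group

text = "STRING\nstring"
nested_matches = re.findall(nested, text)
combined_matches = re.findall(combined, text)
print(nested, nested_matches == combined_matches)
```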

YehudaKLEIN added 10 commits April 1, 2023 19:20

Initial: add flag types and regexing methods. Global flags still not …

797f157

…implemented, and no testing was done

Unite all flags to single class, where unset is a parameter (name). S…

507971f

…till untested.

add asm tests (and fix a mistake they showed)

83e573e

add more asm tests

30f0342

add validation on input to flag

58594eb

add validation tests

6c2e804

add validation to insure validity of locale_dependent and unicode aga…

d78992c

…inst bytes-like patterns

Add api tests. Fix asm tests. Fix some logic that was broken.

ce46ec1

Reformat comment

43c3443

add test

fb2ff5b

YehudaKLEIN added 11 commits April 7, 2023 13:13

Initial: add flag types and regexing methods. Global flags still not …

9236c84

…implemented, and no testing was done

Unite all flags to single class, where unset is a parameter (name). S…

92d9c6e

…till untested.

add asm tests (and fix a mistake they showed)

fa5858c

add more asm tests

12f24ab

add validation on input to flag

e44b940

add validation tests

a690981

add validation to insure validity of locale_dependent and unicode aga…

b326621

…inst bytes-like patterns

Add api tests. Fix asm tests. Fix some logic that was broken.

0534cd3

Reformat comment

48a529f

add test

38dee25

Merge remote-tracking branch 'origin/feature/add_inline_flags' into f…

09af9a5

…eature/add_inline_flags # Conflicts: # ke/asm.py

Yehuda-blip marked this pull request as draft May 19, 2023 15:50
