Test coverage for generated lexer #269

klondikedragon · 2022-09-24T18:28:51Z

Before adding new test cases that expose additional gaps in the generated lexer, I wanted the current reflection lexer test suite to cover the generated lexer as well, to the extent possible (backreferences and match-longest are not yet supported by the generated lexer). This PR is an attempt at that, for discussion. It puts the rules for the test suite into the lexer/internal package (testutils), adds an internal utility to generate the JSON files, and adds a Makefile that uses these JSON files to regenerate the test lexers used by the test suite.

* Created Makefile that regenerates JSON for test lexers and lexer code
* Refactored lexer stateful_test to test reflection and generated lexers
* Lexer benchmarks also test both where possible
* TODOs added to lexer stateful_test noting unsupported generated lexer features

The current tests expose that the generated lexer doesn't support eliding matched tokens when the rule name starts with a lowercase letter (e.g., a `whitespace` rule vs. a `Whitespace` rule), so a few of the test cases for the generated sub-case are failing. For this reason, if you don't want failing test cases on the master branch, you may want to merge this onto a different branch for now.

The PR appears large, but most of it is generated code needed for the test suite to run (which now depends on generated versions of more lexer definitions used in the lexer test suite).

My intent is to add onto this test suite with additional test cases for the lexer that highlight scenarios I've encountered where the reflection lexer is accurate but the generated lexer is not.


klondikedragon · 2022-09-28T02:54:18Z

Closing in favor of #270. Once #270 stabilizes, I will open PRs to enhance the lexer conformance tests with additional cases highlighting gaps in the current generated lexer, etc.

klondikedragon added 2 commits September 24, 2022 12:20

fix unused utf8 import for simple generated lexers

ccb68f2

klondikedragon mentioned this pull request Sep 24, 2022

Streamline long term maintenance of lexer/parser code generators #264

Open

6 tasks

klondikedragon closed this Sep 28, 2022
