
parser rewrite #47

Merged: 25 commits into main, Dec 5, 2023
Conversation

@psteinroe (Collaborator) commented Nov 14, 2023

What kind of change does this PR introduce?

A complete rewrite of the parser for increased

  • resilience
  • performance
  • maintainability
  • extensibility

Highlights

  • The lexer is now a separate module and emits all tokens, including whitespace, by merging the output of libpg_query's scan() method with a very simple custom lexer that extracts whitespace tokens (a sketch of this merge follows the list).
  • Statement parsing was rewritten to be resilient. Instead of regular expressions, we now use a simple LL parser: we check whether a new statement is starting by comparing the first few tokens, and once a statement has started, we walk all tokens until either a new statement starts, EOF is reached, or a ";" is hit. Tokens within sub-statements (enclosed by "(...)") are not tested against these conditions.
  • While valid statements are parsed with libpg_query, we can easily implement a custom resilient parser statement by statement for invalid ones.
  • Invalid statements are parsed "flat", meaning that we just open the node, apply all tokens, and close the node.
  • The parser for valid statements is now very performant. We turn the AST into an untyped tree structure where each node holds its list of properties. The parser then walks the tokens once, efficiently finds the next valid node, and opens and closes nodes accordingly.
  • The parser for valid statements is also "stable", meaning that it will only ever produce a valid CST or panic; no manual comparison is required anymore.
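To make the lexer merge concrete, here is a minimal sketch of the idea, assuming simplified token shapes; the names (whitespace_tokens, merge_tokens) and the field layout are illustrative, not the crate's actual API:

```rust
// Minimal sketch of the lexer merge, assuming simplified token shapes.
// The real lexer keeps libpg_query's token kinds; this stand-in only
// tracks text and byte offset.
#[derive(Debug, Clone)]
struct Token {
    text: String,
    start: usize, // byte offset into the source SQL
}

/// Extract the whitespace runs that libpg_query's scan() does not report.
/// Consecutive whitespace characters are grouped into a single token.
fn whitespace_tokens(sql: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut run_start: Option<usize> = None;
    for (i, c) in sql.char_indices() {
        match (c.is_whitespace(), run_start) {
            (true, None) => run_start = Some(i),
            (false, Some(s)) => {
                tokens.push(Token { text: sql[s..i].to_string(), start: s });
                run_start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = run_start {
        tokens.push(Token { text: sql[s..].to_string(), start: s });
    }
    tokens
}

/// Merge the scanner's tokens with the whitespace tokens into one stream,
/// ordered by source position.
fn merge_tokens(mut scanned: Vec<Token>, mut whitespace: Vec<Token>) -> Vec<Token> {
    scanned.append(&mut whitespace);
    scanned.sort_by_key(|t| t.start);
    scanned
}
```

Because the custom lexer only needs to recover what scan() drops, it can stay trivial: a single pass that groups consecutive whitespace characters into one token, which also matches the later commit making the whitespace lexer consume consecutive characters as one.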

@psteinroe psteinroe marked this pull request as ready for review November 21, 2023 13:33
while !parser.eof() {
    match is_at_stmt_start(parser) {
        Some(stmt) => {
            statement(parser, stmt);
psteinroe (Collaborator, Author) commented:
Custom parsers can be added here later, and statement would just be the fallback.
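For readers following along, a hypothetical sketch of what a prefix-based is_at_stmt_start check can look like; the Parser stub, the nth helper, and the prefix table below are invented for illustration and do not mirror the actual implementation:

```rust
// Hypothetical sketch: decide whether a new statement starts at the
// current position by comparing the next few tokens against known
// statement prefixes. Names and the prefix table are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StmtKind {
    Select,
    Insert,
    CreateTable,
}

struct Parser {
    tokens: Vec<String>, // upcoming non-whitespace token texts
    pos: usize,
}

impl Parser {
    /// Peek at the n-th upcoming token, or "" past the end.
    fn nth(&self, n: usize) -> &str {
        self.tokens.get(self.pos + n).map(String::as_str).unwrap_or("")
    }
}

fn is_at_stmt_start(parser: &Parser) -> Option<StmtKind> {
    // Longer prefixes must come before shorter ones sharing a first token.
    const PREFIXES: &[(&[&str], StmtKind)] = &[
        (&["CREATE", "TABLE"], StmtKind::CreateTable),
        (&["INSERT", "INTO"], StmtKind::Insert),
        (&["SELECT"], StmtKind::Select),
    ];

    PREFIXES.iter().find_map(|(prefix, kind)| {
        prefix
            .iter()
            .enumerate()
            .all(|(i, kw)| parser.nth(i).eq_ignore_ascii_case(kw))
            .then_some(*kind)
    })
}
```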

        .collect()
}

fn custom_handlers(node: &Node) -> TokenStream {
psteinroe (Collaborator, Author) commented:

This is the only manual node-by-node implementation required for the parser.
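As a rough illustration of such a codegen function (assuming the proc-macro2 and quote crates; the node names, Token variants, and TokenProperty identifier in the quote! bodies are placeholders, not necessarily the repo's real identifiers):

```rust
// Illustrative only (not the repo's real identifiers): a codegen helper
// that emits manual token-handling code for the node kinds that need it,
// and nothing for everything else.
use proc_macro2::TokenStream;
use quote::quote;

struct Node {
    name: String, // e.g. "SelectStmt"
}

fn custom_handlers(node: &Node) -> TokenStream {
    match node.name.as_str() {
        // Hypothetical: SELECT statements always start with the SELECT token.
        "SelectStmt" => quote! {
            tokens.push(TokenProperty::from(Token::Select));
        },
        // Hypothetical: CREATE TABLE contributes two leading keyword tokens.
        "CreateStmt" => quote! {
            tokens.push(TokenProperty::from(Token::Create));
            tokens.push(TokenProperty::from(Token::Table));
        },
        // All other nodes fall through to the generated default handling.
        _ => quote! {},
    }
}
```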

- make whitespace lexer consume consecutive tokens as one
- merge String, Integer etc. nodes into their parent
- due to the aforementioned change, we won't have never-visited leaf
  nodes anymore and can skip the child check when closing leaf nodes
- instead of searching for a token in the entire token range, we now
  only search for it in the next n+m non-whitespace tokens, where n is
  the number of properties and m the number of tokens with just a single
  character (e.g. "(")
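A small sketch of that bounded lookahead, paraphrased from the commit message above; the function name and token representation are illustrative:

```rust
// Sketch of the bounded lookahead: search for `target` only within the
// next n + m non-whitespace tokens, where n is the node's property count
// and m the number of single-character tokens (e.g. "(").
fn find_token_within_window(
    tokens: &[String],     // upcoming tokens, in source order
    target: &str,          // token text we are looking for
    property_count: usize, // n: number of properties on the current node
) -> Option<usize> {
    // m: single-character tokens that may sit between here and the target.
    let single_char = tokens.iter().filter(|t| t.chars().count() == 1).count();
    let window = property_count + single_char;

    tokens
        .iter()
        .enumerate()
        .filter(|(_, t)| !t.chars().all(char::is_whitespace))
        .take(window)
        .find(|(_, t)| t.as_str() == target)
        .map(|(i, _)| i) // index into `tokens`, counting whitespace
}
```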
@psteinroe psteinroe merged commit c8f1708 into main Dec 5, 2023
1 check passed