-
-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Error tolerant parsing #102
Comments
As per further conversation in Slack, it is probably a better idea not to preserve rejected nodes as multiple rejections could pollute the AST. Unclear until implementation. |
A more complex example: AST type Decls struct {
Funcs []*Func `@@*`
}
type Func struct {
Name string `"func" @Ident "(" ")" "{"`
Body []*Stmt `@@* "}"`
}
type Stmt struct {
Asgn *Asgn ` @@`
If *If # `| @@`
}
type Asgn struct {
Name string `@Ident "="`
Value int `@Int`
}
type If struct {
Condition string `@Ident "{"`
Body []*Stmt `@@* "}"`
} Source
Parsed
In this case there are no additional tokens. On the contrary, there are missing tokens. Need a way to convey this. Maybe an |
The proposed error mechanism here sounds like what's called "synchronisation", where a rule is marked as a synchronisation point. Whenever a token that doesn't fit is encountered, the parser unwinds until it hits a synchronisation point, at which point it starts discarding tokens until it finds one that fits in the current context, and resumes parsing from there. IMO ideally the synchronisation points would be user specifiable with the ability to implement an interface for even more control over how the parser synchronises back to a sane state. |
Participle currently supports returning a partial AST when parsing fails, but only up to the point of failure. This issue proposes that Participle reject failed tokens but continue to parse, if possible.
(transcribed from Slack)
For participle to produce a full AST even when errored, my initial thought is that it would need to combinatorially match tokens to all branches of the parse tree, back to the root
eg. (hypothesising here) given this kind of input
and this grammar
Once it hits the tokens
[bar, c, =, 10]
,bar
would correctly parse intoAssignment.Ident
butValue
would fail, so the parser would perhaps reset back to the start of Assignment, but reject the tokenbar
as an error.There'd have to be some mechanism for capturing the failed branches. Probably out of band somehow, or possibly the partially valid sub-tree could be inserted with a marker indicating it's an error. Similar to how Pos lexer.Position is special cased there could be something like
ParseFailure lexer.Position
that is populated if that branch has errors.It would have to be opt-in though, as that kind of backtracking could be quite expensive.
I don't like that ParseFailure. What might be better is
RejectedTokens []lexer.Token
. It might not be ideal to require that be part of the AST 🤔.So the AST in this case might end up something like:
The RejectedTokens could be populated all the way back to the root, potentially.
The text was updated successfully, but these errors were encountered: