I have observed that the parser sometimes ignores the state of tokenizers that silently discard some tokens. In particular, the state is ignored when the first input chunk(s) consist only of discarded tokens, which causes the position information of subsequent tokens to become desynchronized from the input. Below is an example of a tokenizer that exhibits this behaviour.
```js
const discard = { "whitespace": true, "comment": true };

function next() {
    let token;
    do {
        token = /* next token from the buffer */;
    } while (token && discard[token.type]);
    return token;
}
```
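For context, here is a minimal sketch of how such a discarding wrapper might be wired up with moo and nearley; the token rules and `compiledGrammar` are placeholders for illustration, not taken from my actual project:

```js
const moo = require("moo");
const nearley = require("nearley");

// Underlying lexer; the rule set is illustrative.
const inner = moo.compile({
    ws:      { match: /\s+/, lineBreaks: true },
    comment: /#[^\n]*/,
    word:    /[a-z]+/,
});

const discard = { ws: true, comment: true };

// Wrapper implementing nearley's custom-lexer interface, silently
// dropping whitespace and comments while forwarding state handling.
const lexer = {
    reset: (chunk, info) => inner.reset(chunk, info),
    save:  () => inner.save(),
    formatError: (token) => inner.formatError(token),
    has:   (name) => inner.has(name),
    next() {
        let token;
        do {
            token = inner.next();
        } while (token && discard[token.type]);
        return token;
    },
};

// `compiledGrammar` is a hypothetical compiled grammar.
const parser = new nearley.Parser(
    nearley.Grammar.fromCompiled(compiledGrammar), { lexer });

parser.feed("  \n  ");  // only discarded tokens: lexer state is not saved
parser.feed("word");    // this token now reports line 1 instead of line 2
```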
The cause appears to be the if-statement at `nearley/lib/nearley.js`, lines 356 to 358 (commit 6e24450), in combination with the defined behavior of `lexer.reset(chunk, info)`.
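The embedded snippet does not carry over here, but reconstructed from the description that follows (not verbatim from the commit), the guard at the end of `Parser.prototype.feed` looks roughly like this; `column` is only assigned inside the token loop, so it remains undefined when every token in the chunk was discarded:

```js
// Reconstruction, not verbatim: lexer state is only saved when at
// least one non-discarded token produced a table column.
if (column) {
    this.lexerState = lexer.save()
}
```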
This statement seems to assume that if there have been no tokens so far, there is no tokenizer state. Simply always executing `this.lexerState = lexer.save()` resolves the issue. There may be circumstances where the current behaviour is required (which I am unaware of), so it may be prudent to define a parser option that causes the tokenizer state to always be stored.
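A sketch of the two variants suggested above, assuming the guard reconstructed earlier; the option name `saveLexerState` is hypothetical:

```js
// Variant 1: always persist the tokenizer state. The lexer has
// consumed the chunk whether or not any tokens survived discarding.
this.lexerState = lexer.save();

// Variant 2: keep the current behaviour unless a (hypothetical)
// parser option opts in to the unconditional save.
if (column || this.options.saveLexerState) {
    this.lexerState = lexer.save();
}
```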