The reason terminals are put in namespaces is twofold (as far as I can remember):
1. To avoid collisions between terminals that have the same name but different patterns. This issue can be solved without a namespace, by naming the terminals based on their pattern (or the hash of the pattern).
2. To know how to dispatch the token to the correct transformer. For example, a__X might need different handling than b__X.
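To illustrate the first point, here is a minimal sketch of naming terminals by the hash of their pattern, so that identical patterns from different namespaces collapse into one terminal while different patterns never share a name. This is not Lark's implementation: `pattern_based_name` and `join_by_pattern` are hypothetical helpers, and the patterns are plain regex strings made up for the example.

```python
import hashlib

def pattern_based_name(pattern: str) -> str:
    """Derive a terminal name from the pattern alone (hypothetical helper)."""
    digest = hashlib.sha1(pattern.encode("utf-8")).hexdigest()[:8]
    return f"__T_{digest.upper()}"

def join_by_pattern(terminals: dict[str, str]) -> dict[str, str]:
    """Map each namespaced terminal name to its pattern-based name."""
    return {name: pattern_based_name(pattern) for name, pattern in terminals.items()}

# Two grammars imported under namespaces `a` and `b` (made-up example).
terminals = {
    "a__X": r"[a-z]+",
    "b__X": r"[a-z]+",  # same pattern as a__X: ends up with the same name
    "b__Y": r"[0-9]+",  # different pattern: keeps a distinct name
}

joined = join_by_pattern(terminals)
```

With this naming, a__X and b__X become a single terminal for the lexer, which removes the collision but also loses the information needed for point 2.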
Issue 2 is a bit harder to solve. We could decide that tokens aren't namespaced anymore, and let the users deal with the fallout.
Or we can assign a namespace for each rule, and rename the tokens when constructing the tree. That would solve this issue at a small performance cost. However, that still leaves lexer_callbacks, for which we won't be able to provide a namespace anymore. Perhaps it's an acceptable loss? (note it will mean that creating lexer_callbacks from the given transformer won't be viable anymore)
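The rename-on-construction idea could look roughly like the sketch below. It assumes the lexer emits tokens with a joined, namespace-free type, and that each rule knows which namespace it was defined in. `Token`, `Tree`, `RULE_NAMESPACE` and `build_tree` are illustrative stand-ins here, not Lark's classes or API.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Token:
    type: str
    value: str

@dataclass
class Tree:
    data: str
    children: list

# Assumption: every rule records the namespace it was imported under.
RULE_NAMESPACE = {"a__start": "a", "b__start": "b"}

def build_tree(rule: str, children: list) -> Tree:
    """Attach the rule's namespace to its tokens while building the tree."""
    ns = RULE_NAMESPACE.get(rule)
    renamed = [
        replace(c, type=f"{ns}__{c.type}") if ns and isinstance(c, Token) else c
        for c in children
    ]
    return Tree(rule, renamed)

# The lexer produced a single joined terminal X for both grammars.
tok = Token("X", "hello")
tree_a = build_tree("a__start", [tok])
tree_b = build_tree("b__start", [tok])
```

The extra pass over each rule's children is where the small performance cost comes from. Anything that runs at lexing time, such as lexer_callbacks, would still only see the un-namespaced X.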
A very different view on the matter is that these collisions would ideally be solved by the contextual lexer, which would be possible if they were solved by the parser. In other words, LALR(k) has a lower chance of producing such a collision, and the higher k is, the lower the chance. But it's of course very hard to implement, or we would have already done so. And anyway, it doesn't guarantee that there won't be any collisions, it just lowers the chances.
If we can't find a better approach, I would support adding a new flag such as "join_terminals" or "delayed_namespace", that will enable this behavior of joining terminals by value, and adding the namespace during tree construction. Perhaps the same can also be done for identical rules.
The problem occurs because the terminals are defined in different namespaces, and therefore have different names and instances, but may match the same text.
An example is described here: https://stackoverflow.com/questions/78692900/unclear-reason-for-terminal-collision
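As a rough illustration (plain `re`, not Lark itself), two terminals that live in different namespaces but share a pattern both match the same input, so a lexer has no way to pick between them by text alone. The terminal names and patterns below are made up for the example.

```python
import re

# Same pattern defined in two namespaces: different names, identical matches.
terminals = {
    "a__NAME": re.compile(r"[a-z_]+"),
    "b__NAME": re.compile(r"[a-z_]+"),
    "b__NUMBER": re.compile(r"[0-9]+"),
}

def candidates(text: str) -> list[str]:
    """Return every terminal whose pattern matches the whole text."""
    return sorted(name for name, rx in terminals.items() if rx.fullmatch(text))

ambiguous = candidates("foo")    # matched by both a__NAME and b__NAME
unambiguous = candidates("42")   # matched by b__NUMBER only
```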