Escape character not preserved in generated Lua #26

Open
gnois opened this issue Aug 20, 2015 · 2 comments
Comments

Contributor

gnois commented Aug 20, 2015

I figured it is better to file issues here rather than in my fork, because the issue links in the commit history got confused and pointed to the issue of the same number in this repo.

print("\n\r")

generates this Lua code:

print("\
\13")

The escapes should be preserved.

Contributor Author

gnois commented Aug 24, 2015

Hi Franko,

Just to let you know, I had to redo the fix in 9342e5e in the lexer instead of the code generator, because I realized the lexer has already transformed \ddd escapes by the time it is done.
With the included test case, luajit-lang-toolkit gives

print("alo\n123")
print("\\\n\r\"''\0")
print("\3\vD\t\"'")
print("■\v\\\\\"\\'\f")

while this fix gives code that stays closer to the source, verbatim:

print("\97lo\10\04923")
print("\\\n\r\"'\'\0")
print("\3\v\x44\t\"\'")
print("\254\v\\\\\"\\\'\f")

Anyway, I am not sure this is relevant for luajit-lang-toolkit, because the runtime output is the same either way and the error message deviates from luajit's.
For example:

print('\3\v\x4h4\t\"\'')

luajit and luajit-lang-toolkit give

invalid escape sequence near ''♥♂'

while my fix gives

invalid escape sequence near '\3\v\x4

Owner

franko commented Aug 24, 2015

Hi,

I've had a look at the commit, and the idea is interesting. Basically, in the lexer you just save the string in its original form, without replacing the escape sequences. This makes the code generator trivial, because you just output the string as it was in its original form, with no need to interpret the escape sequences.
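To illustrate the idea, here is a minimal sketch in Lua of a lexer step that keeps a string literal verbatim. The name `read_raw_string` and its interface are hypothetical, not taken from the commit or from the toolkit's lexer; it only skips over escapes so the closing quote is found correctly, and leaves validation aside.

```lua
-- Hypothetical sketch, not the toolkit's lexer: return a quoted literal
-- exactly as written in the source, escapes included.
-- `pos` must point at the opening quote character.
local function read_raw_string(src, pos)
    local quote = src:sub(pos, pos)
    assert(quote == '"' or quote == "'", "expected a quote at pos")
    local i = pos + 1
    while i <= #src do
        local c = src:sub(i, i)
        if c == "\\" then
            -- Skip the backslash and the character it escapes, untouched.
            i = i + 2
        elseif c == quote then
            -- Raw literal (quotes included) and the position after it.
            return src:sub(pos, i), i + 1
        elseif c == "\n" then
            error("unfinished string")
        else
            i = i + 1
        end
    end
    error("unfinished string")
end

local raw = read_raw_string('print("\\97lo\\x44")', 7)
print(raw)
```

With such a token, a Lua code generator can emit `raw` unchanged.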

In reality, your commit does two things, because in addition you added a modification to raise an error when the escape sequence is not valid.

I could adopt the approach of storing the string in its original form, but this would require a modification to the bytecode generator to interpret the escape sequences.
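As a rough sketch of what that interpretation step could look like, assuming the raw text between the quotes is available: the helper `unescape` below is hypothetical and not part of luajit-lang-toolkit. It handles the single-letter escapes, `\ddd` and `\xXX`, and raises an error on anything else; the `\z` escape is left out.

```lua
-- Hypothetical helper, not from luajit-lang-toolkit: decode the escape
-- sequences in the raw body of a string literal (text between the quotes).
local SIMPLE = {
    a = "\a", b = "\b", f = "\f", n = "\n", r = "\r", t = "\t", v = "\v",
    ["\\"] = "\\", ['"'] = '"', ["'"] = "'", ["\n"] = "\n",
}

local function unescape(raw)
    local out, i, n = {}, 1, #raw
    while i <= n do
        local c = raw:sub(i, i)
        if c ~= "\\" then
            out[#out + 1] = c
            i = i + 1
        else
            -- "^" anchors each match at the position right after the backslash.
            local hex = raw:match("^x(%x%x)", i + 1)
            local dec = raw:match("^(%d%d?%d?)", i + 1)
            local e = raw:sub(i + 1, i + 1)
            if hex then
                out[#out + 1] = string.char(tonumber(hex, 16))
                i = i + 4
            elseif dec then
                local code = tonumber(dec)
                if code > 255 then
                    error("escape sequence too large at position " .. i)
                end
                out[#out + 1] = string.char(code)
                i = i + 1 + #dec
            elseif SIMPLE[e] then
                out[#out + 1] = SIMPLE[e]
                i = i + 2
            else
                error("invalid escape sequence at position " .. i)
            end
        end
    end
    return table.concat(out)
end

print(unescape("\\97lo\\10\\04923"))
```

The error messages here are illustrative only and do not reproduce the wording used by luajit.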

Like you, I also think that for the language toolkit it is better to interpret the escape sequences in the lexer phase. It is not "natural" to replace escape sequences only in the bytecode generator, at the end of the pipeline. Keeping the string in its original form is probably fine for languages that target only Lua code generation.

I think I'm going to cherry-pick from your commit only the part where you raise an error for invalid escape sequences.

In any case, thank you for sharing that. Your focus on quality and clear error messages is really a good thing.
