Pain points with the examples and API, plus recommendations #1

Open
1 of 10 tasks
jgonggrijp opened this issue Aug 6, 2021 · 5 comments

@jgonggrijp

jgonggrijp commented Aug 6, 2021

First, some positive notes.

  • The parser generator is fast, which is pleasant.
  • It generates a standalone module that can be used as a library.

General pain points and recommendations:

  • Having to run lark-js from a particular working directory, or to set the PYTHONPATH environment variable, just to be able to execute it is inconvenient.
  • Since it requires Python, the parser generator is essentially targeting Python users (who need JS parsers, so possibly also JS users). These users might also use Lark proper, so having to install a second package just to generate JS parsers for their grammars may feel cumbersome.
  • I recommend including the JS generator in Lark proper, possibly as an option or subcommand, at least as long as Lark.js requires Python.
  • In the JS example code, you are importing the namespace of the generated parser module as lark. This may give users the impression that they get a full JS equivalent of the original Lark library. I could not tell immediately from the code whether this is actually the case, but if it isn't, I recommend a different name that is more specific to the parsed grammar. For example: const json_parser = require('./json_parser.js');
  • load_parser is a slightly confusing name for the main interface entry point, because the word "load" could also mean that you're supposed to pass the parser itself as an argument. I suggest get_parser or just parser instead.
  • The generated JS module mentions some available options in the top comments, but those comments by themselves are not enough to understand what those options will do or what kinds of values can be passed to them. Without intensively studying the code, I could only infer that the transformer was optional. Leaving it out was instructive, but I would not have been able to predict the difference without trying. I recommend expanding the top comment to the point that it alone is sufficient to make basic use of the mentioned options. Options that would require too much explanation can be left out and deferred to the standalone documentation.
  • I understand that the line To test: postlex, lexer_callbacks, g_regex_flags is probably just a "note to self", but I must point out that it is extremely cryptic to an outsider reading the code. If it is a note to self, I suggest moving it to an issue ticket. If it is meant as a suggestion for the user, you need to expand considerably on how to use these parameters.
  • lark-js currently generates a CommonJS module, but the generated code is portable enough to also work in environments other than Node.js. In addition, a growing number of Node.js users are expecting the newer ES6 module (ESM) format. Tools like Rollup can easily convert ESM to other module formats, such as pure CommonJS for Node.js and UMD (a hybrid of CommonJS, AMD and browser global) for browsers. For these reasons, I strongly recommend generating the output as ESM instead. You can optionally generate UMD and pure CommonJS formats as well, by doing a secondary Rollup pass over the ESM version as an added service to the user.
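As a sketch of the self-sufficient top comment recommended above, the block below documents each option where it is declared and rejects unknown option names, so a typo fails loudly instead of being ignored. The name get_parser follows the renaming suggestion; the option descriptions are modeled on the same-named options of Python Lark and are assumptions about Lark.js, not its documented behavior.

```javascript
/**
 * Hypothetical entry point of a generated parser module.
 * Option semantics below are ASSUMED, by analogy with Python Lark.
 *
 * @param {object} [options]
 * @param {object} [options.transformer] Applied to the tree while parsing;
 *   omit it to get the untransformed parse tree back.
 * @param {object} [options.postlex] Post-processor for the token stream,
 *   e.g. an indenter for indentation-sensitive grammars.
 * @param {object} [options.lexer_callbacks] Map from terminal name to a
 *   function that is called with each matching token.
 * @param {*} [options.g_regex_flags] Regex flags applied to all terminals
 *   (value format assumed).
 */
function get_parser(options = {}) {
  const known = new Set(['transformer', 'postlex', 'lexer_callbacks', 'g_regex_flags']);
  for (const key of Object.keys(options)) {
    // Reject misspelled option names instead of silently ignoring them.
    if (!known.has(key)) {
      throw new TypeError(`Unknown option: ${key}`);
    }
  }
  // A real implementation would construct the parser here; this sketch
  // only returns a copy of the validated options.
  return { ...options };
}
```

With this, get_parser({ transfomer: t }) throws a TypeError naming the misspelled key, which is exactly the kind of mistake the current top comment cannot prevent.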

Pain points and recommendations specific to the example JSON parser:

  • The JSON to test the parser on is included as a string in the run_json_parser script. This results in a double escape notation, which is potentially confusing for new users (among other things, they might wrongly conclude that the second backslash is the character being escaped or that Lark always requires all escapes to be doubly escaped). I recommend moving the example JSON into a separate file so that users see the text as Lark sees it "au naturel".
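The escaping problem can be seen in a few lines: written as a JS string literal, every backslash of the JSON has to be doubled, while the text the parser actually receives contains only single, JSON-level escapes. The file name example.json is hypothetical, and the sketch writes it to a temporary directory only so that it runs standalone.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// In JS source, '\\n' is the two characters backslash + n (a JSON escape),
// and '\\\\' is two backslashes (JSON's escape for one backslash).
const inlineJson = '{"text": "line\\nbreak", "dir": "C:\\\\tmp"}';

// What the parser receives: single, JSON-level escapes.
console.log(inlineJson); // {"text": "line\nbreak", "dir": "C:\\tmp"}

// Kept in its own file, the sample reads exactly as the parser sees it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lark-json-'));
const samplePath = path.join(dir, 'example.json');
fs.writeFileSync(samplePath, inlineJson, 'utf8');
const fromFile = fs.readFileSync(samplePath, 'utf8');
console.log(fromFile === inlineJson); // true
```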

Pain points and recommendations specific to the example Python 3 parser:

  • The comment describing how to generate the example parser includes a path that is specific to your filesystem. While the grammar is available online, it is not included in the site-packages when installing Lark from PyPI. I strongly recommend including a copy of the grammar with Lark.js (as long as it is a separate project from Lark proper).
  • The BASE_DIR constant is hardcoded to a Windows-specific directory. I strongly recommend changing this to a project-relative directory. If incorporating Lark.js in Lark proper, you can pass over all Python modules in Lark, which should be more than enough to demonstrate the parser's capabilities. If keeping Lark.js separate, you can use the lark-js directory. While it contains only one small Python file, this has the advantage that you can print the parse tree without flooding the terminal.
  • Running node run_python_parser.js, with BASE_DIR changed to ../lark-js/ and an added line to print the parse tree to the console, resulted in the SyntaxError below. My obvious suggestion is to fix it.
node run_python_parser.js
__main__.py 2901
Lark.js/examples/python_parser.js:2912
        throw e;
        ^

SyntaxError: Invalid regular expression: /(?:(?i:\d+j)|(?i:((\d+\.[\d_]*|\.[\d_]+)(e[-+]?\d+)?|\d+(e[-+]?\d+)))(?i:j))/: Invalid group
    at new RegExp (<anonymous>)
    at Object.compile (Lark.js/examples/python_parser.js:41:12)
    at _get_match (Lark.js/examples/python_parser.js:47:17)
    at _create_unless (Lark.js/examples/python_parser.js:1832:17)
    at TraditionalLexer._build_scanner (Lark.js/examples/python_parser.js:1984:34)
    at TraditionalLexer.get scanner [as scanner] (Lark.js/examples/python_parser.js:2014:12)
    at TraditionalLexer.match (Lark.js/examples/python_parser.js:2021:17)
    at TraditionalLexer.next_token (Lark.js/examples/python_parser.js:2042:18)
    at ContextualLexer.lex (Lark.js/examples/python_parser.js:2168:21)
    at lex.next (<anonymous>)
@erezsh
Member

erezsh commented Aug 6, 2021

Thanks!

I agree with all your comments.

For these reasons, I strongly recommend generating the output as ESM instead. You can optionally generate UMD and pure CommonJS formats as well, by doing a secondary Rollup pass over the ESM version as an added service to the user.

Please expand on this point. What changes do I need to make?

@jgonggrijp
Author

jgonggrijp commented Aug 6, 2021

Instead of

function load_parser(options = {}) {/*...*/}

//...

module.exports = {
    //...,
    load_parser,
};

you do

export function load_parser(options = {}) {/*...*/}

The optional secondary Rollup pass is a bit involved, as it requires installing npm modules. This could be deferred to the user.
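For the optional secondary pass, a minimal Rollup configuration might look like the following. It assumes the generator writes an ESM file named json_parser.js (a hypothetical name) and is written as a CommonJS config file, rollup.config.cjs, which Rollup accepts; after npm install --save-dev rollup it would be run with npx rollup -c rollup.config.cjs.

```javascript
// rollup.config.cjs: turns the generated ESM module into CommonJS and UMD.
const config = {
  input: 'json_parser.js', // hypothetical ESM output of lark-js
  output: [
    // Pure CommonJS for Node.js consumers that still use require().
    { file: 'dist/json_parser.cjs', format: 'cjs' },
    // UMD for browsers; `name` is the global used when no loader is present.
    { file: 'dist/json_parser.umd.js', format: 'umd', name: 'json_parser' },
  ],
};

module.exports = config;
```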

@erezsh
Member

erezsh commented Aug 6, 2021

I see. Is there a shorthand to export all the symbols at once?

@jgonggrijp
Author

Yes, you can also do

export {
    //...
    load_parser,
};

@thekevinscott
Contributor

I think these are some great comments. I would in particular like to second the request for ESM support, which I think is as simple as updating the export syntax to ESM (that said, I've no idea if other Lark.js consumers expect CJS output, so I don't know how disruptive this change would be).

I don't think it's necessary to provide different flavors of the output file (CJS / UMD), as most of the ecosystem supports ESM natively, and for those parts that don't, transforming is commonly left to the end user.

Two other pain points I'd throw out there:

  • I would love it if the parser were available in pure JavaScript. I've no idea how big a lift this would be, so I don't know how feasible it is. My use case is going from .lark to get_parser in the browser, for example in an online IDE (i.e., something like this). I explored a prototype that leverages Pyodide, but it would be wonderful to skip Python and have a pure JS solution.
  • I would love TypeScript support, specifically a generated .d.ts file alongside the generated parser file. In order to support this, I think lark.js would need to be rewritten as a TypeScript file, and the tsc TypeScript compiler would need to be added as a dependency when generating the file.
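One possible alternative to a full rewrite: the TypeScript compiler can emit declaration files from plain JavaScript annotated with JSDoc, for example via tsc --allowJs --declaration --emitDeclarationOnly. The sketch below shows the kind of annotations tsc reads; the Tree shape (a rule name plus children) is an assumption modeled on Python Lark's Tree, and countNodes is a hypothetical helper.

```javascript
/**
 * Assumed parse-tree node shape, modeled on Python Lark's Tree.
 * @typedef {object} Tree
 * @property {string} data Name of the rule that produced this node.
 * @property {Array<Tree|string>} children Subtrees or token strings.
 */

/**
 * Counts the Tree nodes (not tokens) in a parse tree.
 * @param {Tree} tree
 * @returns {number}
 */
function countNodes(tree) {
  let total = 1;
  for (const child of tree.children) {
    // Token strings are leaves and are not counted as nodes.
    if (typeof child !== 'string') total += countNodes(child);
  }
  return total;
}
```

From such annotations, tsc would emit declarations along the lines of a Tree type and a countNodes(tree: Tree): number signature, without changing the JavaScript that lark-js generates.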
