-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code review #2
Open
jgonggrijp
wants to merge
11
commits into
lark-parser:master
Choose a base branch
from
jgonggrijp:code-review
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Code review #2
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This comment also applies to the generated parser itself, but I'm illustrating with the JSON example runner because all the variations are visible within a short section of code: (1) const/let, (2) var and (3) the absence of any such keyword. Mixing (1) and (2) is frowned upon; it makes the scope of variables unpredictable. (1) is newer, hence often considered better. In this case, however, I would suggest that you consistently go with (2), at least in the generated parser module. There are two reasons for my recommendation: 1. var has roughly the same scope semantics as local variables in Python. Since you are transpiling from Python, this makes it easier to generate reliable code. 2. var is older, hence portable to more different JS environments. (3), while allowed, should be avoided at all cost, as it always creates global variables (which is not the same as var at module scope). The fact that this form is allowed at all should be considered a historical artifact.
You can often get away without the semicolons, but there are corner cases where it can cause the JS interpreter/compiler to do the wrong thing, especially after inadvertent subsequent edits. It is generally considered better style to finish every statement with a semicolon AND a line end. This also speeds up JS parsing in some cases. This comment only applies to the example runner; it appears that the generated parser consistently includes the semicolons. Nevertheless, including them in your example code makes a better impression and also reduces the probability that users will accidentally break the code through editing.
JS users are used to dense code, with only single blank lines between blocks or occasionally inside long blocks. You can stretch this a bit by using double blank lines before classes and standalone functions like in PEP8, but you will have to be very consistent so that it is clear what convention you're maintaining. JS users will get irritated seeing a blank line before a closing semicolon or a blank line separating a single line of code from the rest of a function body.
Using multiline strings as documentation is not a thing in JS; you are expected to always use comments, which should precede the thing they document. The style I've demonstrated in the commit could be broadly referred to as a literate commenting style, which is for example practiced in Underscore and related libraries. It can be read as-is together with the code, or it can be used to present the comments and the code side by side, as illustrated below. Needless to say, this necessitates writing a separate manual by hand (second link). https://underscorejs.org/docs/underscore-esm.html https://underscorejs.org A possible alternative style, which is very popular, is the jsdoc format: /** * Short description * Possibly longer explanation with examples * @param name description * @returns description */ This is more geared to automatically generating reference documentation that is separate from the code. In order for such separate documentation to be useful, the comments need to be quite extensive, which can render the code itself less readable. It should also be noted that the generated reference documentation is generally still not sufficient to completely replace a purpose-written manual. https://github.com/lodash/lodash/blob/c84fe82760fb2d3e03a63379b297a1cc1a2fce12/lodash.js#L459
Two-space indents are often considered a convention in JavaScript. Even in the JS community, however, there is sizable subpopulation that appreciates larger indents for better readability. Ultimately, it is important not to do what is most popular but what you believe is best. If you subscribe to PEP8's four-space indent recommendation, there is no reason not to apply it in JS as well. (The reason two-space indents became fashion in JS is also outdated. When JS programmers were still writing module wrappers by hand and there was no async/await syntax yet, indentation tended to go very deep very quickly. This produced an incentive to sacrifice readability in exchange for needing to wrap lines less often. There is no longer a need to make this sacrifice.)
JS users expect the word "polyfill" to mean something very specific, i.e., code that emulates a builtin feature of *JavaScript* if it is not available by an outdated target environment. Using the word differently might cause confusion.
When possible, avoid abbreviated names because they cause cognitive strain. Highly idiomatic single-letter variables such as i/j/k in a loop can be acceptable. An additional remark on this snippet: it appears that the re_ parameter is just an alias of the Python re module emulation a few lines up. Avoid this pattern if possible; using re directly will make the code much easier to understand.
JavaScript's modern iterator support is of course convenient for transpiling Python generators. Unfortunately, they do have a performance cost. for-of loops also work on plain arrays, so when it is possible to replace a generator function by a normal function that returns a plain array, it is likely that this will make your parser faster. for-of loops on arrays, in turn, should be replaced by old-fashioned for-;; loops for raw performance. The for-of construct involves converting the array to iterator first through the Array[@@iterator] method and then calling iterator#next on each loop iteration, inducing considerable function call overhead. Of course, these considerations only apply to hot loops. An added bonus of reducing iterators and for-of constructs is that compatibility transpilers like Babel will produce less boilerplate code.
Introducing a library dependency like Underscore allows you to bring back much of the notational elegance that Python brings through itertools and functools, plus some JavaScript-specific extras. Pros: - Shorter, more elegant, more maintainable code. - Removes the need for many of the generic functions that you have written in lines 87-333. - Completely portable; removes need for many constructs that cause compatibility transpilers to generate boilerplate code or introduce polyfills. Cons: - Performance penalty is similar to for-of due to function call overhead. Nothing beats the raw performance of bare for-;; loops. - Having a dependency is perceived as a burden in the JS community. Underscore is however sufficiently small that its added weight can be completely offset by the reduced code size, especially after compatibility transpilation. - No support for iterators (although this can be easily patched). This commit serves to illustrate how Underscore could be used. Some of the code changes might be better left out. In general, I recommend to use Underscore where possible and for-;; where necessary. For completeness, I will mention that I maintain Underscore and that there are alternatives such as Lodash and Ramda.
Please ignore this commit.
Could use some polishing.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The purpose of this PR is not to actually merge the changes, but to illustrate my comments with code. It is a work in progress. For most convenient consumption, I suggest reading the commits one by one.