Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated JS parser fails if minified #17

Open
cderossi opened this issue Mar 13, 2022 · 8 comments
Open

Generated JS parser fails if minified #17

cderossi opened this issue Mar 13, 2022 · 8 comments

Comments

@cderossi
Copy link

I'm including a generated parser.js as part of a large web project. Webpack with the Terser plugin is configured to minify all of the javascript. The default (I believe) behavior is to rename all non-top-level classes to one or two-letter names.

But serialized data, like this:

  0: {
    name: 'NUMBER',
    pattern: {
      value:
        '(?:(?:(?:[0-9])+(?:e|E)(?:(?:\\+|\\-))?(?:[0-9])+|(?:(?:[0-9])+\\.(?:(?:[0-9])+)?|\\.(?:[0-9])+)(?:(?:e|E)(?:(?:\\+|\\-))?(?:[0-9])+)?)|(?:[0-9])+)',
      flags: [],
      _width: [1, 4294967295],
      __type__: 'PatternRE',
    },
    priority: 0,
    __type__: 'TerminalDef',
  },

includes names of classes as string literals.

When the classes are renamed, the data can no longer be deserialized and an exception is thrown.

Can a dictionary be built that maps the serialized type names to classes without relying on the class names themselves?

@erezsh
Copy link
Member

erezsh commented Mar 14, 2022

Strange, it should already work like that -

In each subclass of Serialize, the __serialize_namespace__ method returns the class object, which are the provided as a namespace map for _deserialize, which then performs:

    if ("__type__" in data) {
      // Object
      class_ = namespace[data["__type__"]];
      return class_.deserialize(data, memo);

Sounds like it fails somewhere along the line, but I can't say where just from your description.

@cderossi
Copy link
Author

There are a couple things happening, from what I can tell--

For example, take the serialized NUMBER object I included above.

The TerminalDef class gets renamed during webpack's optimization pass, in my case to Xr. So far, this is not so bad because

  static get __serialize_namespace__() {
    return [TerminalDef]
  }

gets turned into

{
    key: "__serialize_namespace__",
    get: function() {
        return [Xr]
}

But the constructor for class Xr also gets renamed to r. In fact, all constructors get renamed to r. Since class.name uses the name of the constructor

    namespace = Object.fromEntries(namespace.map((c) => [c.name, c]))

winds up making namespace into

{
  r : Xr
}

Since data["__type__"] == "TerminalDef" and there is no namespace.TerminalDef, deserialization can't proceed.

Even worse, since all constructors are renamed to r, if the __serialize_namespace__ has more than one entry, they collide and the resulting namespace object only winds up with a single field.

@erezsh
Copy link
Member

erezsh commented Mar 14, 2022

@cderossi Thanks for telling me about this bug.

I created a PR that I think would fix it. Let me know if it works for you

(#18)

@cderossi
Copy link
Author

@erezsh I tested your PR and it works great. That's an elegant solution.

Thanks for addressing this. I don't consider this your bug, since your un-minified code works and the minifier breaks it. But webpack is too common.

erezsh added a commit that referenced this issue Mar 14, 2022
Fix for issue #17: Renaming classes breaks deserialization
@erezsh
Copy link
Member

erezsh commented Mar 14, 2022

Great. It's now merged, and will be included in the next release.

You're right it's not really a bug, but code lives in an ecosystem, and should try to play nice with it, if it can.

I am concerned that maybe you'll encounter more errors, because there are other parts of the code that use constructor.name, for example

  let tokens_by_type = classify(terminals, (t) => t.pattern.constructor.name);

Which is supposed to distinguish between two different classes. But if the constructors are all renamed to the same name, it might create the wrong behavior.

@cderossi
Copy link
Author

I'm afraid you're right about more potential problems. I added some log statements to _create_unless and got this result:

image

I have not seen an actual error in my testing so far, though. Suggestions welcome.

erezsh added a commit that referenced this issue Mar 15, 2022
@erezsh
Copy link
Member

erezsh commented Mar 15, 2022

It's the mechanism that resolves collisions between keywords and variable names. It's not always easy to notice when it goes wrong.

Let me know if the new PR solves it - #20

@cderossi
Copy link
Author

cderossi commented Mar 15, 2022

Looks good in the console and my parser output still looks correct:

image

My grammar is too simple to have any possible collisions between keywords and variable names, so I may not be the best test case.

Thanks again for your work!

erezsh added a commit that referenced this issue Mar 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants