Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The performance of version 1.7.14 has gone backwards #1365

Open
SevenNight2012 opened this issue Aug 8, 2023 · 35 comments
Open

The performance of version 1.7.14 has gone backwards #1365

SevenNight2012 opened this issue Aug 8, 2023 · 35 comments
Labels
Android Issues related to running Rhino on Android

Comments

@SevenNight2012
Copy link

I am using rhino to calculate a set of float data

when using version 1.7.14, the calculation time is as follows:

image

and when using version 1.7.12,the calculation time is as follows:

rhino_performance

As far as I can see, the 1.7.12 version seems to perform better, is there something wrong with the 1.7.14 version? Or is there any special attention to the use of version 1.7.14?

My equipment and environment:
Rhino used in android device:Google pixel 3XL
Mac book Pro 2022,M2 CPU,16GB memory,OpenJDK 8

@SevenNight2012
Copy link
Author

Oh well, I found out that 1.7.13 works the same as 1.7.12, but it doesn't work as well since 1.7.14-RC1,all the code in the project has not changed, I just replaced the Rhino-runtime dependency, like this

rootProject.subprojects.forEach {
    it.configurations.all {
        resolutionStrategy {
            force "org.mozilla:rhino-runtime:1.7.14-RC1"
            // force "org.mozilla:rhino-runtime:1.7.13"
        }
    }
}

Does anyone have a similar problem? Does the author have any good suggestions? @p-bakker

@tuchida
Copy link
Contributor

tuchida commented Aug 9, 2023

If you're running it on Android, do you have the optimizationLevel set to -1?
Perhaps BigInt implementation may have something to do with it. ref. #837

@rbri
Copy link
Collaborator

rbri commented Aug 9, 2023

Any chance to share the whole code of the calc() function to give us a chance to profile?

@SevenNight2012
Copy link
Author

If you're running it on Android, do you have the optimizationLevel set to -1? Perhaps BigInt implementation may have something to do with it. ref. #837

Yes,I have set it like this:

        Context jsContext = new RhinoAndroidHelper(ContextHolder.getContext()).enterContext();
        // Context jsContext = Context.enter();
        jsContext.setOptimizationLevel(-1);
        jsContext.setLanguageVersion(Context.VERSION_ES6);
        jsContext.setLocale(Locale.getDefault());
        jsContext.setWrapFactory(new WrapFactory());
        ImporterTopLevel scope = new ImporterTopLevel();
        scope.initStandardObjects(Context.getCurrentContext(), false);

@SevenNight2012
Copy link
Author

SevenNight2012 commented Aug 9, 2023

Any chance to share the whole code of the calc() function to give us a chance to profile?

No problem, but this calc method is very complicated internally, this method is used to calculate astronomical related data
cal11Arc.js

@SevenNight2012
Copy link
Author

Any chance to share the whole code of the calc() function to give us a chance to profile?

This js file is very large, and the internal calculation is very complicated, maybe we only need a complex calculation to simulate it

@tuchida
Copy link
Contributor

tuchida commented Aug 9, 2023

I tried to run your getXingdu on my PC. It is true that 1.7.14 seems to be a little slower, but it didn't seem to make much difference.

$ java -jar buildGradle/libs/rhino-1.7.14.jar -version 200 -opt -1 xingdu.js 
time-->  95
$ java -jar buildGradle/libs/rhino-1.7.13.jar -version 200 -opt -1 xingdu.js 
time-->  92
$ java -jar buildGradle/libs/rhino-1.7.12.jar -version 200 -opt -1 xingdu.js 
time-->  94

These values is just an example. In 1.7.14 it went from about 100 to 120, and in 1.7.12 and 1.7.13 it went from about 90 to 100. Perhaps it is different in the Android environment.

@SevenNight2012
Copy link
Author

I tried to run your getXingdu on my PC. It is true that 1.7.14 seems to be a little slower, but it didn't seem to make much difference.

$ java -jar buildGradle/libs/rhino-1.7.14.jar -version 200 -opt -1 xingdu.js 
time-->  95
$ java -jar buildGradle/libs/rhino-1.7.13.jar -version 200 -opt -1 xingdu.js 
time-->  92
$ java -jar buildGradle/libs/rhino-1.7.12.jar -version 200 -opt -1 xingdu.js 
time-->  94

These values is just an example. In 1.7.14 it went from about 100 to 120, and in 1.7.12 and 1.7.13 it went from about 90 to 100. Perhaps it is different in the Android environment.

Well, thank you very much, but, in my opinion, PC devices are more powerful than mobile phones, so the difference is not obvious.

Here are some of my guesses:

  • When using version 1.7.13,it also takes a long time to start, but then runs very fast
  • When using version 1.7.14,its performance is basically not improved
  • Is there any caching in version 1.7.13 that causes it to perform well in later runs?
  • Is there any difference between the use of version 1.7.13 and version 1.7.14, resulting in inconsistent performance of the engine?

@gbrail
Copy link
Collaborator

gbrail commented Aug 11, 2023

I recall having to add some synchronization around a global DateFormat object, since DateFormat is not actually thread-safe. It'd be interesting to look at the NativeDate class and see if any of that synchronization is contributing to bad performance, and if we could address it by not trying to have a global object.

@SevenNight2012
Copy link
Author

function b(e, n, a) {
  var t,
    o,
    r,
    f,
    i,
    s = n / 36525,
    c = q(s), //在月球轨道上经纬度?
    u = c[0],
    g = c[1],
    h = A(s) + g,
    l = "";
  if (e < 10) {
    o = ie(0, s, -1, -1, -1);
    t = ie(e, s, -1, -1, -1);
    t[0] = N(t[0]);
    t = J(t, o);
    i = t[2];
    s -= i * P;
    f = ie(0, s, -1, -1, -1);
    r = ie(e, s, -1, -1, -1);
    t = J(r, o);
    t = J(r, f);
    t[0] = N(t[0] + u);
    if (0 == a) {
      t = Y(t, h);
      l = L(t[0]);
    }
    if (a == 1) {
      l = L(t[0]);
    }
  }
  if (e == 10) {
    t = ce(s, 1, 1, -1);
    i = t[2];
    s -= (i * P) / C;
    t = ce(s, -1, -1, -1);
    t[0] = N(t[0] + u);
    if (a == 0) {
      t = Y(t, h);
      l = L(t[0]);
    }
    if (a == 1) {
      l = L(t[0]);
    }
  }

  return l;
}

Well, these days, I add time-consuming statistics before and after each line of code in getXingdu,and i seem to have found something. Most of the time is spent on this method, but in the v1.7.13 version, it will be very fast after multiple calls. Does anyone know why? @tuchida @gbrail @rbri
Or can we check the difference between version 1.7.14 and version 1.7.13?

@tuchida
Copy link
Contributor

tuchida commented Aug 17, 2023

@SevenNight2012 The getXingdu source code you posted earlier is too large for my environment to view this issue in my browser. Can you post the source code on gist and edit your comment to include that link?

@SevenNight2012
Copy link
Author

@SevenNight2012 The getXingdu source code you posted earlier is too large for my environment to view this issue in my browser. Can you post the source code on gist and edit your comment to include that link?

Ok, this file is really big, here is the gist address
cal11Arc.js

@tuchida
Copy link
Contributor

tuchida commented Aug 17, 2023

Thanks. Can the code for this comment be deleted? #1365 (comment)

@SevenNight2012
Copy link
Author

@SevenNight2012 The getXingdu source code you posted earlier is too large for my environment to view this issue in my browser. Can you post the source code on gist and edit your comment to include that link?

I need to explain the general function of this file, which is to calculate the position of the sun, moon and other stars on the earth's ecliptic according to time~
There's a lot of computation involved inside it, so this might be extreme, but it should be exciting to optimize it!

@SevenNight2012
Copy link
Author

Thanks. Can the code for this comment be deleted? #1365 (comment)

Okay,please wait~

@SevenNight2012
Copy link
Author

Thanks. Can the code for this comment be deleted? #1365 (comment)

Look at it now!I've dealt with that comment

@tuchida
Copy link
Contributor

tuchida commented Aug 17, 2023

The only cause I can think of so far is the implementation of BigInt (ref. #837). However, when I implemented this, I measured micro-benchmarks and even a small change in the code changed the results significantly. Quantitative measurement is difficult because of the possible influence of the optimization mechanism of the JVM.
If you continue to investigate, for example, see if performance changes before and after #837 is merged, that might tell you something.

@SevenNight2012
Copy link
Author

The only cause I can think of so far is the implementation of BigInt (ref. #837). However, when I implemented this, I measured micro-benchmarks and even a small change in the code changed the results significantly. Quantitative measurement is difficult because of the possible influence of the optimization mechanism of the JVM. If you continue to investigate, for example, see if performance changes before and after #837 is merged, that might tell you something.

Thank you very much, I think the BigInt you mentioned is indeed possible, because in fact, I also found that the doArithmetic method of the Interpreter class is obviously time-consuming in version 1.7.14, but this does not seem to be the root cause
I'll try to keep checking~

@SevenNight2012
Copy link
Author

SevenNight2012 commented Aug 18, 2023

Hi, everyone, I have located this problem, it started from the commit id 67187cba
Can someone take a look at the issues in this commit? @tuchida @gbrail @rbri @p-bakker

@tuchida
Copy link
Contributor

tuchida commented Aug 18, 2023

I disassembled Interpreter.class in 1.7.13 and 1.7.14 jar files.

$ javap -v -p ./Interpreter.class > ./Interpreter.dump

The switch statements changed in the commit you found were both tableswitch.

# 1.7.13
       254: tableswitch   { // -66 to 157
# 1.7.14
       266: tableswitch   { // -74 to 160

I am not very familiar with JVM. Based on that I have measured that tableswitch seems to change the performance of certain cases depending on the number and order of the cases.
It may be possible to speed up a particular case in some way. But I do hope the JVM optimization will improve.


In Dalvik, tableswitch seems to be equivalent to packed-switch. If you read the bytecode that Dalvik is running, you might learn something.

@p-bakker
Copy link
Collaborator

@SevenNight2012 the commit you referred to adds support for template literals.

Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings.

So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance.

Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong)

@SevenNight2012
Copy link
Author

@SevenNight2012 the commit you referred to adds support for template literals.

Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings.

So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance.

Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong)

image

Well, I built a lot of jars to find the problem, maybe there is really something in this commit~
I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together?

@gbrail
Copy link
Collaborator

gbrail commented Aug 19, 2023 via email

@SevenNight2012
Copy link
Author

Can you help us understand how you're measuring the performance of this? I built a little tool that uses JMH to run a script (or call a function) and combined with "git bisect" that tool makes it pretty easy to detect a performance regression. (I'll try to publish it soon.) However, I ran your script with this tool on 1.7.13 and 1.7.14 and the current master branch and performance was essentially the same. This is on a Linux box in both interpreted and compiled mode. So, is there a particular function (and parameters) that we can call using your script so that we can test for ourselves?

On Sat, Aug 19, 2023 at 8:41 AM 徐星晨 @.> wrote: @SevenNight2012 https://github.com/SevenNight2012 the commit you referred to adds support for template literals. Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings. So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance. Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong) [image: image] https://user-images.githubusercontent.com/10094410/261806231-e4892c64-9d70-411d-8bbc-4483a85b5b3b.png Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together? — Reply to this email directly, view it on GitHub <#1365 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD7I27BZHPTV7B27VPQKH3XWDNEHANCNFSM6AAAAAA3H7YHFA . You are receiving this because you were mentioned.Message ID: @.>

Can you help us understand how you're measuring the performance of this? I built a little tool that uses JMH to run a script (or call a function) and combined with "git bisect" that tool makes it pretty easy to detect a performance regression. (I'll try to publish it soon.) However, I ran your script with this tool on 1.7.13 and 1.7.14 and the current master branch and performance was essentially the same. This is on a Linux box in both interpreted and compiled mode. So, is there a particular function (and parameters) that we can call using your script so that we can test for ourselves?

On Sat, Aug 19, 2023 at 8:41 AM 徐星晨 @.> wrote: @SevenNight2012 https://github.com/SevenNight2012 the commit you referred to adds support for template literals. Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings. So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance. Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong) [image: image] https://user-images.githubusercontent.com/10094410/261806231-e4892c64-9d70-411d-8bbc-4483a85b5b3b.png Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together? — Reply to this email directly, view it on GitHub <#1365 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD7I27BZHPTV7B27VPQKH3XWDNEHANCNFSM6AAAAAA3H7YHFA . You are receiving this because you were mentioned.Message ID: @.>

Well, actually, I'm running rhino on an Android device, and I'm measuring its performance based on how time-consuming the methods are. This may not be comprehensive, but it should be intuitive. I'll show you an Android demo later.

@SevenNight2012
Copy link
Author

SevenNight2012 commented Aug 21, 2023

Can you help us understand how you're measuring the performance of this? I built a little tool that uses JMH to run a script (or call a function) and combined with "git bisect" that tool makes it pretty easy to detect a performance regression. (I'll try to publish it soon.) However, I ran your script with this tool on 1.7.13 and 1.7.14 and the current master branch and performance was essentially the same. This is on a Linux box in both interpreted and compiled mode. So, is there a particular function (and parameters) that we can call using your script so that we can test for ourselves?

On Sat, Aug 19, 2023 at 8:41 AM 徐星晨 @.> wrote: @SevenNight2012 https://github.com/SevenNight2012 the commit you referred to adds support for template literals. Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings. So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance. Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong) [image: image] https://user-images.githubusercontent.com/10094410/261806231-e4892c64-9d70-411d-8bbc-4483a85b5b3b.png Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together? — Reply to this email directly, view it on GitHub <#1365 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD7I27BZHPTV7B27VPQKH3XWDNEHANCNFSM6AAAAAA3H7YHFA . You are receiving this because you were mentioned.Message ID: @.>

Hi, I uploaded my local rhino demo to my github, you can take a look, this is an Android demo, maybe you need to download AndroidStudio
RhinoDemo

@p-bakker p-bakker added the Android Issues related to running Rhino on Android label Oct 16, 2023
@821938089
Copy link

This is because the JIT compilation of the interpretLoop method fails, resulting in performance degradation.

image

21:52:58.834 c.android.rhino          I  Compiling method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline
21:52:58.834 c.android.rhino          I  Compilation of java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) took 468us
21:52:58.834 c.android.rhino          I  Failed to compile method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline

@SevenNight2012
Copy link
Author

This is because the JIT compilation of the interpretLoop method fails, resulting in performance degradation.

image

21:52:58.834 c.android.rhino          I  Compiling method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline
21:52:58.834 c.android.rhino          I  Compilation of java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) took 468us
21:52:58.834 c.android.rhino          I  Failed to compile method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline

Wow,thank you very much~

@SevenNight2012
Copy link
Author

@gbrail @p-bakker
Hey, I think I found the problem.
Firstly,i mentioned before that there is a significant performance difference before and after 67187cba submission (based on Android platform),and combined with the tips he(@821938089 ) gave,i commented out the code about Icode_QUASI_CALLSITE, which is located in the interpretLoop method of Interpreter,in the end, rhino performed well!
Here is my code:
image
image

Secondly,i try to imitate what RegExp does,below is my code
image
image
image
Based on what I did in point 2, the performance is good and compatible with the original code.

In addition,what I did above is only to enable rhino to execute JIT on the Android platform. I am not sure whether it can also improve performance on other platforms.

Lastly,as we can see in the second image, compiler allocated 7392KB to compile the interpretLoop method.
The interpretLoop method is too large. Maybe we should consider optimizing this method.
Thank you for your attention to this problem. I hope rhino will get better and better~~

@p-bakker
Copy link
Collaborator

Sry for not responding to your findings sooner

Have a bit of a hard time what your findings are. Is this summary correct?

  • adding a QuasiCallSiteProxy to the code similarly to how Rhino does that for RegExp improves performance?
  • the interpretLoop method is large

If that is correct, would you at least be able to create a PR for the QuasiCallSiteProxy change?

Not sure what can be done about the interpretLoop method though

@p-bakker
Copy link
Collaborator

Btw: I would think that the QuasiCallsite code would only get hit if you're code uses backticks, which, prior to 1.7.14 were just (incorrectly) considered regular string delimiters, but are now (properly) interpreted as Template String Literals

As for the too long InterpretLoop method: seems like the resolution would be to extract some logic from the method into helper functions. Haven't looked at potential candidates, but a PR is always welcome

@p-bakker
Copy link
Collaborator

@rbri could you comment on the getQuasiCallSiteProxy suggestion/approach? I saw you did stuff related to the getRegexProxy fairly recently, so am hoping you might know the purpose of these 'proxies' and whether it makes sense to add one for quasi callsites as well?

@rbri
Copy link
Collaborator

rbri commented Aug 18, 2024

@rbri could you comment on the getQuasiCallSiteProxy suggestion/approach?

Note: this quasi stuff was renamed into template in commit f797ac8

Regarding the performance - i have no idea why commenting out that code will improve the performance; at least i fear some template constructs are no longer working after that change.

And now some words about the regex proxy thingy....

The overall idea is to make the whole RegExp implementation replacable - but this is not so simple like for other classes. The problem is the /regex/ syntax - this is handled by the parser. Therefore we have this strange factory-like pattern that supports

  1. a more or less js independent regexp implementation that works as backend
  2. enough support for the parser to be able to create js objects that are supported by the backend
  3. some support for the replacement of the whole js part of the regexp support.

If you like to have a sample - HtmlUnit replaces only the backend (by converting the regex into a java regex) but still uses the js regex part from rhino.

And if someday someone merges #1419 i can try to implement a whole new regexp implementation based on a more advanced regexp engine (unicode flag and all the other missing stuff) that replaced the backend and the js part.

But i have absolutely no glue why the same approach should work in this case.

@rbri
Copy link
Collaborator

rbri commented Aug 18, 2024

And one more idea: maybe we can improve the template handling in a way that we introduce a separate ast node if a template is a simple (multiline) template (no replacements inside). In this case we can simplify the processing to the same as we do for strings, only the decompiler has to use the ` instead of the ".
My guess is more and more js stuff uses this as easy way to define strings with or without line breaks and we can simplify our processing for that.

@gbrail @p-bakker @tuchida @tonygermano - what do you think?

@SevenNight2012
Copy link
Author

Btw: I would think that the QuasiCallsite code would only get hit if you're code uses backticks, which, prior to 1.7.14 were just (incorrectly) considered regular string delimiters, but are now (properly) interpreted as Template String Literals

As for the too long InterpretLoop method: seems like the resolution would be to extract some logic from the method into helper functions. Haven't looked at potential candidates, but a PR is always welcome

Hi, here are my guesses
First of all, I am using rhino on the Android platform, so performance-related issues will be amplified.
Secondly, you can see from the screenshot that the system keeps prompting JIT to allocate memory. I extracted the code of Icode_QUASI_CALLSITE. I actually wanted to isolate it from the interpretLoop function, so that the virtual machine can optimize the code as much as possible when running. Facts have proved that, This seems to work
Finally, it is precisely because of the second point that I recommend optimizing interpretLoop
I'm not familiar with Rhino, so I can only locate the problem, and I don't seem to be sure what's causing it.
Thank you for your continued attention to this issue @p-bakker @rbri

@rbri
Copy link
Collaborator

rbri commented Sep 10, 2024

Will have another look at that after my vacation...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Android Issues related to running Rhino on Android
Projects
None yet
Development

No branches or pull requests

6 participants