Optimize std.findSubstr and std.flattenArrays #216

Merged

JoshRosen
Contributor

@JoshRosen JoshRosen commented Nov 22, 2024

This PR implements small performance optimizations for two built-in functions:

  • std.findSubstr:
    • Skip array allocation in case of no matches. This is a common scenario because findSubstr is often used to implement "string contains" checks.
    • Use ArrayBuilder instead of ArrayBuffer: in this context we only need to append and don't need to update, remove, or inspect the elements that we've written, and for those requirements ArrayBuilder is cheaper to instantiate and cheaper to invoke.
    • Store Val.Num values directly in the array builder rather than storing raw indices and then mapping them into an output array: this saves an array allocation and an extra iteration.
  • std.flattenArrays:
    • Apply a similar ArrayBuffer -> ArrayBuilder replacement.
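For readers who want to see the shape of the findSubstr change, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `Num` case class standing in for sjsonnet's Val.Num, and it assumes that overlapping matches are reported and that an empty pattern yields no matches. The no-match case returns a shared empty array without ever creating a builder, and matches are appended as `Num` values directly so no second mapping pass is needed.

```scala
import scala.collection.mutable

object FindSubstrSketch {
  // Hypothetical stand-in for sjsonnet's Val.Num; the real class carries more state.
  final case class Num(value: Double)

  // Shared zero-length result; safe to share because it has no elements to mutate.
  private val Empty: Array[Num] = new Array[Num](0)

  def findSubstr(pat: String, str: String): Array[Num] = {
    // Assumption: an empty pattern is treated as "no matches".
    var idx = if (pat.isEmpty) -1 else str.indexOf(pat)
    if (idx < 0) {
      // Fast path: no match, so no builder and no fresh array are allocated.
      Empty
    } else {
      val out = new mutable.ArrayBuilder.ofRef[Num]
      while (idx >= 0) {
        // Append the final element type directly instead of collecting Ints first.
        out += Num(idx.toDouble)
        // Advance by one character so overlapping matches are also found.
        idx = str.indexOf(pat, idx + 1)
      }
      out.result()
    }
  }
}
```

A "string contains" style caller would only check whether the result is empty, which is why the allocation-free no-match path matters most.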

@stephenamar-db
Collaborator

Do you want to merge this?

@JoshRosen JoshRosen changed the title Optimize std.findSubstr and std.flattenArrays Optimize std.findSubstr Dec 5, 2024
@JoshRosen
Contributor Author

Hmm, it looks like we're getting a compilation error in the Scala.js 2.12 build because ArrayBuilder.addAll doesn't exist in Scala 2.12:

[info] compiling 31 Scala sources to /home/runner/work/sjsonnet/sjsonnet/out/sjsonnet/2.12.18/js/compile.dest/classes ...
Error:  /home/runner/work/sjsonnet/sjsonnet/sjsonnet/src/sjsonnet/Std.scala:470:34: value addAll is not a member of scala.collection.mutable.ArrayBuilder.ofRef[sjsonnet.Lazy]
Error:            case v: Val.Arr => out.addAll(v.asLazyArray)
Error:                                   ^
Error:  one error found

I've gone ahead and backed out the flattenArrays optimization and have updated this PR title and description to only reflect the findSubstr optimizations (which are probably the more impactful of the two, to be honest).

I still think we should eventually merge the flattenArrays optimization because it's a cheap win and flattenArrays actually was a hotspot (relatively speaking) in some files I profiled.

@stephenamar-db
Collaborator

I believe you can use the ++= operator in lieu of addAll.

@JoshRosen JoshRosen changed the title Optimize std.findSubstr Optimize std.findSubstr and std.flattenArrays Dec 5, 2024
@JoshRosen
Contributor Author

Ah, good call: ++= is a synonym for addAll in 2.13, and in both 2.12 and 2.13 it has a fast path for the case where you're appending from an array. Those fast paths are implemented slightly differently in 2.12 vs. 2.13, but I believe both have the same net effect of bottoming out in a System.arraycopy.

I've gone ahead and incorporated this suggestion, switching to ++= and reverting to the original PR title and description.
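As an illustration of the cross-version fix, here is a minimal sketch of a flatten loop that appends whole arrays with `++=`. It is a simplification under stated assumptions: plain `Array[String]` inputs stand in for sjsonnet's Array[Lazy], and the `FlattenSketch` and `flatten` names are hypothetical. The point is only that `++=` is available on ArrayBuilder in both Scala 2.12 and 2.13, while `addAll` is 2.13-only.

```scala
import scala.collection.mutable

object FlattenSketch {
  // Hypothetical simplification: String arrays instead of sjsonnet's Array[Lazy].
  def flatten(arrays: Seq[Array[String]]): Array[String] = {
    val out = new mutable.ArrayBuilder.ofRef[String]
    for (arr <- arrays) {
      // `++=` compiles on both 2.12 and 2.13; `out.addAll(arr)` would fail on 2.12.
      out ++= arr
    }
    out.result()
  }
}
```

Because the builder is only ever appended to and read once via `result()`, it fits the append-only use case described in the PR summary.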

@stephenamar-db stephenamar-db self-requested a review December 5, 2024 19:58
@stephenamar-db stephenamar-db merged commit ea6ab85 into databricks:master Dec 5, 2024
1 check passed
@JoshRosen JoshRosen deleted the optimizations-to-avoid-ArrayBuffer branch December 5, 2024 19:59