Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize lstripChars, rstripChars, and stripChars via specialization #236

Merged
merged 4 commits into from
Dec 12, 2024

Conversation

JoshRosen
Copy link
Contributor

@JoshRosen JoshRosen commented Dec 12, 2024

This PR optimizes lstripChars / rstripChars / stripChars built-in functions by using the specialization framework from #119 / 0bd255a to pre-compile and re-use Pattern instances when the replacement / strip arguments are constants.

In Java, String.replaceAll() compiles and uses a Pattern under the hood and this is relatively expensive; specialization lets us save this cost when the chars-to-be-stripped are constant.

Testing

Correctness: ran all tests with a manual change to disable static function application optimizations (a prerequisite required to achieve full test coverage, as the static application prevents specialization from kicking in during most tests). In a followup, I think we should explore adding flags for optimizer features and running all tests with/without optimization.

Performance: ran benchmarks on a large file and with this change I save ~11% of allocated bytes and ~8% of wall time. I ran that same file through RunProfiler and saw a large speedup in stripChars, costing ~619ns/call before and ~154ns/call after.

@JoshRosen JoshRosen changed the title Optimize strReplace, lstripChars, rstripChars, and stripChars via specialization Optimize lstripChars, rstripChars, and stripChars via specialization Dec 12, 2024
@JoshRosen
Copy link
Contributor Author

I considered specializing strReplace but doing so doesn't help (and is actually slower) on JDK 9+ because https://bugs.openjdk.org/browse/JDK-8058779 optimized java.lang.String.replace(CharSequence, CharSequence).

@stephenamar-db stephenamar-db merged commit f5bf391 into databricks:master Dec 12, 2024
6 checks passed
@JoshRosen JoshRosen deleted the specialize-more-builtins branch December 31, 2024 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants