performance(stdlib,value): Faster JSON parsing #1249

Open
wants to merge 3 commits into base: main

JakubOnderka

Summary

This merge request speeds up JSON parsing when lossy is set to true:

  • it avoids validating the input twice when the bytes are already a valid UTF-8 string, by using a specific method from serde-json
  • it uses the simdutf8 crate for faster UTF-8 validation
    • this crate is already included when default features are on
    • string validation with this crate is 4-11 times faster than Rust stdlib validation
    • converting to a lossy string can be slower when the input is not valid UTF-8 (this should be very rare)
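
The fast path described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function name bytes_to_str_lossy is hypothetical, and std::str::from_utf8 stands in for simdutf8::basic::from_utf8 so that the sketch builds without extra dependencies. The idea is to validate once, borrow the input when it is valid, and only pay for a lossy conversion on the rare invalid input.

```rust
use std::borrow::Cow;

/// Hypothetical helper: validate the bytes once and borrow them when they
/// are valid UTF-8; otherwise fall back to a lossy conversion.
/// Assumption: `std::str::from_utf8` is used here as a stand-in for
/// `simdutf8::basic::from_utf8`, which the PR uses for faster validation.
pub fn bytes_to_str_lossy(v: &[u8]) -> Cow<'_, str> {
    match std::str::from_utf8(v) {
        // Common case: no allocation, the input is reused as-is.
        Ok(s) => Cow::Borrowed(s),
        // Rare case: the input is scanned a second time and invalid
        // sequences are replaced with U+FFFD.
        Err(_) => String::from_utf8_lossy(v),
    }
}
```

Because the result is already a str in both cases, a JSON parser that accepts a string slice does not need to validate UTF-8 again, which is where the saving on valid input would come from. The second scan in the error branch is the cost noted in the last bullet.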

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Standard tests

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on
    our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Our CONTRIBUTING.md is a good starting place.
  • If this PR introduces changes to LICENSE-3rdparty.csv, please
    run dd-rust-license-tool write and commit the changes. More details here.
  • For new VRL functions, please also create a sibling PR in Vector to document the new function.

@JakubOnderka JakubOnderka force-pushed the json-parse-optimisation branch 3 times, most recently from 0699755 to c818790 Compare February 3, 2025 11:45
@JakubOnderka JakubOnderka force-pushed the json-parse-optimisation branch from c818790 to 155d8b0 Compare February 3, 2025 12:04
@JakubOnderka JakubOnderka changed the title chore(stdlib,compiler): Faster JSON parsing chore(stdlib,value): Faster JSON parsing Feb 3, 2025
@pront pront changed the title chore(stdlib,value): Faster JSON parsing performance(stdlib,value): Faster JSON parsing Feb 3, 2025
Member

@pront pront left a comment

This looks great, @JakubOnderka. Some CI checks are failing; you can iterate locally with ./scripts/checks.sh. We will also need a changelog fragment.

/// Converts a slice of bytes to a string, including invalid characters.
#[must_use]
pub fn simdutf_bytes_utf8_lossy(v: &[u8]) -> Cow<'_, str> {
    simdutf8::basic::from_utf8(v).map_or_else(

One question that just came up: did you consider encoding_rs?
