Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix long JSON preprocessing #36

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

davidalejandroaguilar
Copy link

@davidalejandroaguilar davidalejandroaguilar commented Jan 19, 2025

Description

This PR:

  • Fixes long JSON preprocessing.
    • The current implementation is clearing the stack too aggressively when it finds a JSON object, which causes it to lose partial JSON data that should be kept for the next object. This is particularly noticeable when parsing larger JSON data.
  • Adds FixtureHelpers to spec/support which are automatically loaded in specs, so we can compare fixtures to expected results more easily.

I tried to follow the existing code style, let me know if you'd like to me clean something up.

Example

Previous

When the stack is:

  {
    "title": "Functional Programming Perfection",
    "tweet": "Embrace Ruby\'s functional programming capabilities and write concise, expressive, and maintainable code. #RubyFP #LessCodeMoreFun"
  },
  {

The parsed JSON object is:

{"title"=>"Functional Programming Perfection", "tweet"=>"Embrace Ruby's functional programming capabilities and write concise, expressive, and maintainable code. #RubyFP #LessCodeMoreFun"}

But the resulting stack is:

    "title": "Object-Oriente

So as the stack starts filling up again:

    "title": "Object-Oriented Onslaught",
    "tweet": "Master the art of object-oriented programming in Ruby and create modular, reusable, and scalable software. #RubyOOP #CodeArchitecture"
  },

We'll get an error parsing it:

Anthropic JSON Error (spotted in ruby-anthropic 0.3.2): undefined method `[]' for nil

Now

When the stack is:

 {
    "title": "Functional Programming Perfection",
    "tweet": "Embrace Ruby\'s functional programming capabilities and write concise, expressive, and maintainable code. #RubyFP #LessCodeMoreFun"
  },
  {

The parsed JSON object is:

{"title"=>"Functional Programming Perfection", "tweet"=>"Embrace Ruby's functional programming capabilities and write concise, expressive, and maintainable code. #RubyFP #LessCodeMoreFun"}

And the resulting stack is:

{
    "title": "Object-Oriente

So as the stack starts filling up again:

{
    "title": "Object-Oriented Onslaught",
    "tweet": "Master the art of object-oriented programming in Ruby and create modular, reusable, and scalable software. #RubyOOP #CodeArchitecture"
  },

We'll be able to get the next JSON object:

{"title"=>"Object-Oriented Onslaught", "tweet"=>"Master the art of object-oriented programming in Ruby and create modular, reusable, and scalable software. #RubyOOP #CodeArchitecture"}

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?

@alexrudall
Copy link
Owner

Thanks, @davidalejandroaguilar - is there a way this could be a breaking change? I can't imagine it breaking anyone's existing use of the gem, should just make it more reliable basically?

@davidalejandroaguilar
Copy link
Author

davidalejandroaguilar commented Jan 20, 2025

@alexrudall Appreciate you taking a look.

I also don't think this would be a breaking change, it should indeed make it more reliable.

However, I can add more specs to cover more edge cases, such as an empty JSON, a JSON which is not enclosed in an array, etc. if you're OK with additional VCRs (kinda followed the existing style, didn't want to go overboard). I'm OK paying for the API requests.

@davidalejandroaguilar davidalejandroaguilar force-pushed the fix-long-json-preprocessing branch from 9a80bc6 to 3e12417 Compare January 21, 2025 00:15
@davidalejandroaguilar davidalejandroaguilar force-pushed the fix-long-json-preprocessing branch from 8a25cdc to 6f9a22c Compare January 21, 2025 00:22
@davidalejandroaguilar
Copy link
Author

@alexrudall Added additional specs, these are the specs we'd have:

  1. Pre-processing small JSON array.
  2. Pre-processing large JSON array.
  3. Pre-processing empty JSON array.
  4. Pre-processing single small JSON object.
  5. Pre-processing single large JSON object.
  6. Pre-processing single empty JSON object.

  • With the old approach, number 2 is the only one that fails.
  • With the new approach, all succeed.

Let me know if you can think of additional edge cases, but I think we should be OK releasing this as a non-breaking change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants