
🐛 Bug Report: Failed to parse page JSON data: expected value at line 1 column 1 #301

Closed
1 task done
obvious-hugh-mann opened this issue Oct 30, 2024 · 40 comments · Fixed by #305
Labels
bug Something isn't working

Comments

@obvious-hugh-mann

obvious-hugh-mann commented Oct 30, 2024

Describe the bug

When attempting to access the site, it loads slowly and then shows an error.

Steps to reproduce the bug

Steps to reproduce the behavior:

  1. Enter the URL "redlib.privacyredirect.com" or "safereddit.com" or "lr.drgnz.club" in the URL bar, OR click on any link to one of those sites
  2. Wait for the page to load
  3. See error

What's the expected behavior?

The page should load correctly and show the same content as on Reddit.

Additional context / screenshot

Full text of the error when entering the bare URL: "Failed to parse page JSON data: expected value at line 1 column 1 | /r/popular/hot.json?&raw_json=1&geo_filter=GLOBAL"

Full text of the error when entering r/cats: "Failed to parse page JSON data: expected value at line 1 column 1 | /r/cats/hot.json?&raw_json=1"

Full text of the error when entering "r/cats/comments/qms1es/yall_i_did_not_realize_how_affectionate_and/" : Failed to parse page JSON data: expected value at line 1 column 1 | /r/cats/comments/qms1es/yall_i_did_not_realize_how_affectionate_and/.json?&raw_json=1

redlib.privacyredirect.com is running the latest commit, but the other 2 instances are not

  • I checked that the instance that this was reported on is running the latest git commit, or I can reproduce it locally on the latest git commit
@obvious-hugh-mann obvious-hugh-mann added the bug Something isn't working label Oct 30, 2024
@obvious-hugh-mann obvious-hugh-mann changed the title 🐛 Bug Report: Failed to parse page JSON data: expected value at line 1 column 1 | /r/popular/hot.json? 🐛 Bug Report: Failed to parse page JSON data: expected value at line 1 column 1 Oct 30, 2024
@np22-jpg
Contributor

Same issue here. My personal instance is running bc95308. That being said, this might just be a wave of IP bans by Reddit.

@AyoungDukie

For reference, folks will either want to reopen the original pinned issue or pin a newly opened one, to avoid a flurry of duplicates.

But yes, seems like a new method of attempting to filter/ban access.

@Handrail9

I'm getting this on a personal single-user instance.

@rc2dev

rc2dev commented Oct 30, 2024

I'm getting this on a personal single-user instance.

Same here.

Plus, I spun up a Libreddit instance and got the same error.

@Owl-Tec

Owl-Tec commented Oct 30, 2024

Same issue here on a personal instance. Reddit most likely changed something on their end again.

@r7l

r7l commented Oct 30, 2024

Same here.

@gigirassy

Yeah, my basic-auth instance at rl.blitzw.in gets the same error.

@vytskalt

It's clear that this is a global issue. I don't think we should be posting these "same here" comments as they're just causing useless notifications for others.

@HairyMilkshakes

Probably getting rate limited again.

@notpushkin

@HairyMilkshakes Don't think that's the case – rate limiting shouldn't affect single-user instances.

@NovaCyntax

Has nothing to do with rate limiting, happens every few months it seems. Generally a quick fix.

@toberoni

Restarting the Redlib Docker container lets me access Reddit for 1-2 minutes (repeatable). After that, the instance throws the error again.

@luutuyen2k9

I have the same problem when trying to visit libreddit.freedit.eu.

@arch-btw

arch-btw commented Oct 31, 2024

I haven't looked at the code, but I might see part of the issue; notice this part of the error message:

hot.json?&raw_json=

There's a ? immediately followed by an &, which isn't valid; it's supposed to be only one of those. The first parameter should start with a ?, and each subsequent parameter should start with an &, never both.

So that might be resulting in an empty field right now:

[screenshot: payload]

@sigaloid
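
As a rough illustration of the rule above (a hypothetical snippet, not Redlib's actual URL-building code), the first parameter is introduced by ? and every later one by &:

// Hypothetical sketch of the separator rule described above; not Redlib's code.
fn build_url(path: &str, params: &[(&str, &str)]) -> String {
	let mut url = String::from(path);
	for (i, (key, value)) in params.iter().enumerate() {
		// '?' before the first parameter, '&' before each subsequent one
		url.push(if i == 0 { '?' } else { '&' });
		url.push_str(key);
		url.push('=');
		url.push_str(value);
	}
	url
}

// build_url("/r/cats/hot.json", &[("raw_json", "1")]) yields "/r/cats/hot.json?raw_json=1"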

@jimmydoh

jimmydoh commented Oct 31, 2024

I haven't looked at the code, but I might see part of the issue; notice this part of the error message:

hot.json?&raw_json=

There's a ? immediately followed by an &, which isn't valid; it's supposed to be only one of those. The first parameter should start with a ?, and each subsequent parameter should start with an &, never both.

So that might be resulting in an empty field right now:

[screenshot: payload]

@sigaloid

While it is not very 'neat', most servers will deal with the empty parameter in the query string.

In testing it works fine on Reddit as well (you can test by browsing directly to Reddit with the full path from the error message - with or without the extra &, you get the same json returned, assuming your request is not blocked outright).

EDIT: That being said, if you wanted to detect traffic from a specific app and knew it had that 'quirk', you could probably identify those requests and then kill the sessions that sent them.

@e455a81e-d3ba-41a2-bc6d-7aafb1d9a5cd

I think it is quite interesting that restarting the container seems to help for a while. Is Redlib generating some data on startup, sent to the Reddit API, that could be used to block requests?

@pimlie
Contributor

pimlie commented Oct 31, 2024

@e455a81e-d3ba-41a2-bc6d-7aafb1d9a5cd See #229 (comment) from the last issue; looking at the commit log, cache poisoning could probably still be happening.

@e455a81e-d3ba-41a2-bc6d-7aafb1d9a5cd

Yes, that seems much more likely.

@dormieriancitizen

dormieriancitizen commented Oct 31, 2024

Oddly, at least for me, even without restarting the issue is inconsistent.

Uptime Kuma is reporting 40% uptime, with seemingly random failures over the night

I can access my instance now but it seems like it's breaking at random (not from ratelimiting, from the parse failure)

EDIT: could still be ratelimiting, but it's not a 429

EDIT2: 6 minutes later, down again

@davegallant

Gatus is telling me the endpoint was unhealthy for 1061 minutes with consistent ❌ [STATUS] (404) == 200 (parse errors).

After a reboot, it's been working fine for the past 60 minutes (with probes every 30s).

@sigaloid
Member

Going to take a deeper look at this later today if I can. On my radar as high priority though.

If I had to guess, it's something similar to last time, i.e. some server-side change that blocks the kind of request Redlib makes. If we can replicate one of those requests using curl and it works, then we know it's in the TLS stack like last time. If it doesn't, it's a more complicated fix.

@wuchyi
Contributor

wuchyi commented Oct 31, 2024

Not sure if it's the same issue, but my redlib error looks a bit different:

Couldn't send request to Reddit: Rate limit - try refreshing soon

Edit: Scratch this, updated to the latest docker release and it's now showing the same JSON error as others.

@pimlie
Contributor

pimlie commented Oct 31, 2024

Forcibly recreating the OAuth token (which was the solution last time) does not seem to work here. Cache poisoning does not seem to be the issue either: while testing that, it also fails when requesting a subreddit you have not visited yet.

As restarting Redlib still works, I'm looking into the connection pooling now. A lot of people reported (this time and before) that after a restart it worked for a minute or so, but then they started getting rate limited again. I'm quite sure this is not a minute but 90 seconds, which is the default connection pool idle timeout in hyper: https://docs.rs/hyper/0.14.31/hyper/client/struct.Builder.html#method.pool_idle_timeout

I can also reproduce that when I manually change the client config to:

client::Client::builder()
  .pool_idle_timeout(std::time::Duration::from_secs(10))
  .build(https)

With the above, I'm not rate limited as long as I keep requesting pages, but as soon as I don't request anything for 10s (or 5, or whatever), I'm rate limited. This seems counter-intuitive, though: my original thought was that re-using connections from the connection pool might be the issue, but given the timeout, it seems that creating new connections within the same pool is what causes issues. Haven't looked any further yet, but it might very well be an upstream issue again.

Note: a possible workaround for now, to avoid triggering this issue so often, could be to specify .pool_idle_timeout(None), though I'm not sure what the disadvantages of doing that are.
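
For reference, that workaround would look roughly like this (a sketch against hyper 0.14's builder API, mirroring the existing connector setup; untested, and the variable names are just illustrative):

// Sketch of the suggested workaround: never expire idle pooled connections,
// so the client does not have to open a fresh connection after a quiet period.
let https = hyper_rustls::HttpsConnectorBuilder::new()
	.with_native_roots()
	.https_only()
	.enable_http1()
	.build();
let client = client::Client::builder()
	.pool_idle_timeout(None) // default is 90s; None disables the idle timeout
	.build::<_, hyper::Body>(https);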

@sigaloid
Member

Even setting the pool max size to 1 and the timeout to none (meaning the one connection is kept open continuously) doesn't work, nor does setting the pool max size to zero, which should start a new connection every time.

@sigaloid
Member

sigaloid commented Oct 31, 2024

Narrowed down the issue and fixed it (in my testing so far). I just pushed efdf184, latest tag is released on quay.io/redlib/redlib. All, please test!

Fix info

I replaced every client call with generating an entirely new client. This slows down Redlib marginally (larger instances may notice it more), but now it works. This is an emergency patch to fix it temporarily; I won't close this issue just yet, as I want to get to the root of the problem.

I specifically tried a global static like below, replaced every CLIENT call with client::Client::builder().build(CONNECTOR), and it still broke.

pub static CONNECTOR: Lazy<HttpsConnector<HttpConnector>> = Lazy::new(|| {
	let https = hyper_rustls::HttpsConnectorBuilder::new().with_native_roots().https_only().enable_http1().build();
	https
});

This leads me to believe the issue lies with the HttpsConnectorBuilder and not with the client builder line (client::Client::builder().build(https)), because we only reuse the native-roots connector and it still fails. It seems like we need to rebuild the HttpsConnector every time.
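
In other words, the emergency patch boils down to something like the following (a sketch of the idea only; fresh_client is a hypothetical name, and the actual change is in efdf184):

use hyper::client::{self, HttpConnector};
use hyper::{Body, Client};
use hyper_rustls::HttpsConnector;

// Sketch: build a brand-new connector and client for every outgoing request,
// instead of reusing a global CLIENT.
fn fresh_client() -> Client<HttpsConnector<HttpConnector>, Body> {
	let https = hyper_rustls::HttpsConnectorBuilder::new()
		.with_native_roots()
		.https_only()
		.enable_http1()
		.build();
	client::Client::builder().build::<_, Body>(https)
}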

@sigaloid
Member

OK, more discoveries: when I revert to before the fix (still using the CLIENT global) and modify pool_max_idle_per_host to zero, so that any connection is killed once it's done, it always fails to retrieve the page.

pub static CLIENT: Lazy<Client<HttpsConnector<HttpConnector>>> = Lazy::new(|| {
	let https = hyper_rustls::HttpsConnectorBuilder::new().with_native_roots().https_only().enable_http1().build();
	client::Client::builder().pool_max_idle_per_host(0).build(https)
});

Given that restricting it to no kept-alive connections leads to permanent, guaranteed failure, one would assume that starting a brand-new connection is what causes it. Then why does the fix of creating a new pool every time work?

Perhaps it's because the new pool never allows two simultaneous open TCP connections...? Maybe if I set the timeout to zero...

	client::Client::builder().pool_idle_timeout(Duration::ZERO).build(https)

Works UNLESS one request is made while another is in-flight.

So the conclusion is that if two connections within the same pool are in flight, Reddit's CDN will block the second one. Why exactly the pool matters is unclear.

@matrox471

I had the issue. I updated my image and am now running 9aea9c9, and it seems to work so far. The deployment took a solid 4-5 minutes between the
Running Redlib v0.35.1 on [::]:8080!
message and actually being able to reach it, but other than that, it works.
The app seems ever so slightly less responsive, but nothing world-shattering.
Keep up the good work, mate! Cheers

@pimlie
Contributor

pimlie commented Oct 31, 2024

So the conclusion is that if two connections within the same pool are in flight, Reddit's CDN will block the second one. Why exactly the pool matters is unclear.

Could this be an HTTP/1-related issue? Maybe they strongly prefer (or have switched to) HTTP/2 multiplexing when they know the client should be capable of it?
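
If so, the client-side change might be as small as letting the connector negotiate HTTP/2 via ALPN, roughly like this (a guess at the shape of a fix, assuming hyper-rustls is built with its http2 feature; not necessarily the exact change that landed):

// Sketch: advertise h2 (with http/1.1 as a fallback) via ALPN so hyper can
// multiplex requests over a single HTTP/2 connection to Reddit.
let https = hyper_rustls::HttpsConnectorBuilder::new()
	.with_native_roots()
	.https_only()
	.enable_all_versions() // offer both h2 and http/1.1 instead of http/1.1 only
	.build();
let client = client::Client::builder().build::<_, hyper::Body>(https);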

@joelkoen

joelkoen commented Nov 1, 2024

Small reminder to consider donating if you appreciate sigaloid's work on this: https://liberapay.com/sigaloid

@sigaloid
Member

sigaloid commented Nov 1, 2024

Yep, seemed to be HTTP/2 changes. Thanks to everyone who reported info and tested the fixes I pushed, glad this has been fixed in a more permanent way :)

@Cyrix126

Cyrix126 commented Nov 1, 2024

Yep, seemed to be HTTP/2 changes. Thanks to everyone who reported info and tested the fixes I pushed, glad this has been fixed in a more permanent way :)

In https://github.com/redlib-org/redlib#binary it says to add this line to the nginx config:
proxy_http_version 1.1;
Is it still needed?

@lvxnull2

lvxnull2 commented Nov 1, 2024

Yes, Redlib still serves over HTTP/1.1; it only connects to Reddit with HTTP/2.

@ggtylerr

ggtylerr commented Nov 1, 2024

Hi there, I updated my instance to the latest build but it's still experiencing this problem: https://nyc1.lr.ggtyler.dev/

It's on the latest commit too, 2fd358f3eda1c25992c2a1c2d0e1bef2506627cb.

EDIT: Never mind, for some reason it just started working ~30 minutes after I started the container.

@kumitterer

Our instance is still having that issue, unfortunately. It is on the most recent commit, so the latest fix should be applied. Is there any way I can help debug this? https://redlib.private.coffee/info

@sigaloid
Member

sigaloid commented Nov 2, 2024

It could be an IP ban. Can you reproduce it on a different IP?

@tdtgit

tdtgit commented Nov 2, 2024

Thanks! That fixed the issue on my single-user instance :)

@kumitterer

It could be an IP ban. Can you reproduce it on a different IP?

I can. Tried routing through several tunnels, same result every time...

@kumitterer

Hmm, after the umpteenth IP rotation and restart, it seems to be working now. 🤔

@ggtylerr

ggtylerr commented Nov 8, 2024

Hi there, I updated my instance to the latest build but it's still experiencing this problem: https://nyc1.lr.ggtyler.dev/

It's on the latest commit too, 2fd358f3eda1c25992c2a1c2d0e1bef2506627cb.

EDIT: Never mind, for some reason it just started working ~30 minutes after I started the container.

Update: Over the past week we're still getting this, not only on NYC-1 but also on CAL-1. It seems likely that the rate limiting hasn't been resolved (especially given that @kumitterer had to rotate IPs).

@sigaloid
Member

@ggtylerr #318
