Feature Request: Better error message when usage limit arrived. (HTTPTooManyRequests) #1893

orena1 · 2024-12-23T19:04:57Z

Is your feature request related to a problem? Please describe.

I encountered frequent errors like the following when running Neptune with NeMo:

Experiencing connection interruptions. Will try to reestablish communication with Neptune. Internal exception was: HTTPTooManyRequests

Once this error occurred, the job would hang and not progress, eventually resulting in a cu10 error message. The only solution I found was to disable Neptune entirely.

After contacting support, I learned this issue is caused by reaching the default workspace usage limit. Here is their response:

Hi Oren,
Thanks for reaching out. Yes, it looks like you're reaching the default workspace usage limit

It would be much better if the error message directly indicated the actual problem, such as:
Experiencing connection interruptions with Neptune. It appears you are reaching the default workspace usage limit. Please review your workspace limits or contact support for assistance.
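As a rough illustration of the request, here is a minimal sketch of mapping an HTTP 429 status to the message proposed above. The helper name `describe_http_error` is hypothetical and not part of the Neptune client. The sketch also assumes a 429 from the service always means the workspace usage limit was hit, which may not hold in practice. The docs URL is the one the maintainers link in their reply.

```python
from typing import Optional

# Docs page linked by the maintainers in this thread.
DOCS_URL = "https://docs.neptune.ai/help/reducing_requests/"


def describe_http_error(status_code: int) -> Optional[str]:
    """Return a user-facing hint for a known HTTP status, or None.

    Hypothetical helper for illustration; not part of the Neptune client.
    """
    if status_code == 429:  # HTTP 429 Too Many Requests
        return (
            "Experiencing connection interruptions with Neptune. "
            "It appears you are reaching the default workspace usage limit. "
            "Please review your workspace limits or contact support for assistance. "
            f"See {DOCS_URL}"
        )
    return None


print(describe_http_error(429))
```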

This would have saved me time (it took 4 hours to diagnose the issue) and prevented frustration. If I had not reached out to support, our company might have abandoned the idea of using Neptune entirely.

Additionally, it would be beneficial if the job did not fail or freeze in cases where usage limits are exceeded. A graceful handling of such situations would improve the user experience.
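One possible shape for such graceful handling is a bounded retry with exponential backoff that gives up instead of blocking the job. The sketch below uses only the standard library. `TooManyRequests` and `send_with_backoff` are hypothetical names standing in for whatever the client raises and calls internally; this is not how Neptune currently behaves.

```python
import time
from typing import Callable, List


class TooManyRequests(Exception):
    """Hypothetical stand-in for the client's HTTPTooManyRequests error."""


def send_with_backoff(
    send: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() up to max_attempts times, doubling the delay after each 429.

    Returns True on success and False if every attempt was rate-limited,
    so the caller can skip the log call instead of blocking the job.
    """
    for attempt in range(max_attempts):
        try:
            send()
            return True
        except TooManyRequests:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False


# Usage: a fake sender that is rate-limited twice, then succeeds.
calls: List[int] = []
delays: List[float] = []


def flaky_send() -> None:
    calls.append(1)
    if len(calls) <= 2:
        raise TooManyRequests()


ok = send_with_backoff(flaky_send, sleep=delays.append)
print(ok, delays)  # True [1.0, 2.0]
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule easy to inspect.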

Additional context:

Where can I check whether I have indeed reached the usage limit? The dashboard currently only shows storage limits, not connection limits. Clarifying this in the UI or documentation would also be helpful.

Thank you!


SiddhantSadangi · 2024-12-24T12:06:50Z

Hey @orena1 👋

Thank you for the detailed feature request. We do have a page in the docs that deals with this error: https://docs.neptune.ai/help/reducing_requests/, but I'll add more details, as you've mentioned, with a link to this page in the error message itself 📝

> Additionally, it would be beneficial if the job did not fail or freeze in cases where usage limits are exceeded. A graceful handling of such situations would improve the user experience.

The job doesn't actually freeze. Neptune's Lightning integration (on which NeMo's Neptune integration is built) calls a wait() internally to ensure all logging calls have reached the server before proceeding with execution. When already rate-limited, this wait can make it seem as if the training has frozen, when it hasn't. If you check the Neptune WebApp, you should be able to see monitoring metrics being updated (unless there's a large file, like model checkpoint, being uploaded).
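To illustrate the behaviour described above, here is a toy simulation using only the standard library. It is not Neptune's or Lightning's actual code. The function `run_demo`, the queue, and the per-request delay are all assumptions made for the example. Logging calls return instantly, but the blocking wait for the backlog to flush takes as long as the throttled sync needs.

```python
import queue
import threading
import time


def run_demo(n_items: int = 20, seconds_per_request: float = 0.01) -> float:
    """Queue n_items log calls, drain them at a throttled rate, and
    return how long the wait-for-sync step blocked the caller."""
    pending: "queue.Queue[int]" = queue.Queue()

    def sync_worker() -> None:
        while True:
            pending.get()
            time.sleep(seconds_per_request)  # simulated rate limit
            pending.task_done()

    threading.Thread(target=sync_worker, daemon=True).start()

    for step in range(n_items):
        pending.put(step)  # "logging" is instant for the training loop

    start = time.monotonic()
    pending.join()  # stand-in for wait(): blocks until the backlog is flushed
    return time.monotonic() - start


blocked = run_demo()
print(f"wait blocked for {blocked:.2f}s")
```

With a larger backlog or a stricter limit the blocked time grows proportionally, which is why a rate-limited run can look frozen from the outside.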

> Where can I check whether I have indeed reached the usage limit? The dashboard currently only shows storage limits, not connection limits. Clarifying this in the UI or documentation would also be helpful.

Currently, this information is only available on the back end. I'll pass this feedback on to the product team to see if we can include it on the dashboard somehow 📝

SiddhantSadangi · 2024-12-24T13:44:28Z

@orena1 - We have a PR to add a more descriptive error message, complete with links to the docs and who to contact for support.

Can you install this version of neptune from the source to check if this works for you?

pip install git+https://github.com/neptune-ai/neptune-client-scale.git@ss/1.x/HTTPTooManyRequests

orena1 · 2024-12-24T22:49:39Z

Thanks @SiddhantSadangi, that is much more informative! I cannot really test it, as these errors have stopped for now.

GeorgePearse · 2025-01-10T15:53:30Z

This is definitely freezing my training, which, as with the other poster, makes me tempted to ditch Neptune.

GeorgePearse · 2025-01-10T15:55:06Z

You can't even see that you're hitting any usage limits via the UI?

GeorgePearse · 2025-01-10T17:19:21Z

This was definitely blocking my training; paying unblocked it (but nothing made it obvious that this was the problem).

Just:

[neptune] [warning] Experiencing connection interruptions. Will try to reestablish communication with Neptune. Internal exception was: HTTPTooManyRequests

SiddhantSadangi · 2025-01-13T11:42:58Z

Hey @GeorgePearse 👋

As mentioned in a previous comment, the training doesn't really freeze, but depending on the sync backlog, it can be throttled enough to appear frozen, especially if large files are being uploaded to Neptune 👇

> The job doesn't actually freeze. Neptune's Lightning integration (on which NeMo's Neptune integration is built) calls a wait() internally to ensure all logging calls have reached the server before proceeding with execution. When already rate-limited, this wait can make it seem as if the training has frozen, when it hasn't. If you check the Neptune WebApp, you should be able to see monitoring metrics being updated (unless there's a large file, like model checkpoint, being uploaded).

The lack of context in the HTTPTooManyRequests error and the absence of usage-limit information in the UI are very valid points 💯
We have a version of neptune pending merge and release that adds more context to the HTTPTooManyRequests error, and UX changes that will show this and other usage limits in the UI are in the backlog.

These changes are, however, currently low priority, as we are working on an entirely new version of Neptune (both the API and the UI) built from the ground up, where these issues don't exist. It is currently in private beta, but you can sign up for early access here.

orena1 changed the title ~~Feature Request: Better error message when usage limit arrived.~~ Feature Request: Better error message when usage limit arrived. (HTTPTooManyRequests) Dec 23, 2024

SiddhantSadangi self-assigned this Dec 24, 2024

SiddhantSadangi added feature request api 1.x labels Dec 24, 2024

SiddhantSadangi added this to the 1.14 milestone Dec 24, 2024

SiddhantSadangi linked a pull request Dec 24, 2024 that will close this issue

Better error message for HTTPTooManyRequests #1895

Open

2 tasks

SiddhantSadangi removed this from the 1.14 milestone Dec 24, 2024

SiddhantSadangi assigned amberRrucker Dec 25, 2024

SiddhantSadangi linked a pull request Dec 25, 2024 that will close this issue

Better error message for HTTPTooManyRequests #1895

Open

2 tasks
