Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 4 second timeout for each join attempt in region list. #605

Merged
merged 2 commits into from
Jan 24, 2025

Conversation

brianbraunstein
Copy link
Contributor

No description provided.

@CLAassistant
Copy link

CLAassistant commented Jan 24, 2025

CLA assistant check
All committers have signed the CLA.

@brianbraunstein
Copy link
Contributor Author

brianbraunstein commented Jan 24, 2025

How is this library tested before being released?

Copy link
Contributor

@boks1971 boks1971 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

// succeeded.
callCtx, cancelCallCtx := context.WithTimeout(ctx, 4 * time.Second)
joinRes, err = r.engine.JoinContext(callCtx, bestURL, token, params)
cancelCallCtx()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guessing with exponential backoff in room.go, it could still time out. I think it does a bunch of tries. Should we limit that also to be within like 15 seconds total maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a separate issue and should be fixed by allowing stringing a context.Context through the upstream parts of the call stack. I didn't want to combine fixing these 2 issues though, I'm just trying to target the outage action item here.

@boks1971
Copy link
Contributor

How is this library tested before being released?

I usually test it with lk (a. k. a livekit-cli), but that is not great for testing this specific path. My changes have been more functional and I could use lk to test it out. Maybe, @matkam has notes on how he tested this when he first added this path.

@brianbraunstein
Copy link
Contributor Author

How is this library tested before being released?

I usually test it with lk (a. k. a livekit-cli), but that is not great for testing this specific path. My changes have been more functional and I could use lk to test it out. Maybe, @matkam has notes on how he tested this when he first added this path.

That sounds reasonable because I can blackhole clusters by modifying /etc/hosts on my local machine.

Are there instructions or examples anywhere I should follow for doing what you do with lk ?

@boks1971
Copy link
Contributor

How is this library tested before being released?

I usually test it with lk (a. k. a livekit-cli), but that is not great for testing this specific path. My changes have been more functional and I could use lk to test it out. Maybe, @matkam has notes on how he tested this when he first added this path.

That sounds reasonable because I can blackhole clusters by modifying /etc/hosts on my local machine.

Are there instructions or examples anywhere I should follow for doing what you do with lk ?

The instructions in the repo is good - https://github.com/livekit/livekit-cli.

And the help menu of the application also.

@boks1971
Copy link
Contributor

How is this library tested before being released?

I usually test it with lk (a. k. a livekit-cli), but that is not great for testing this specific path. My changes have been more functional and I could use lk to test it out. Maybe, @matkam has notes on how he tested this when he first added this path.

That sounds reasonable because I can blackhole clusters by modifying /etc/hosts on my local machine.
Are there instructions or examples anywhere I should follow for doing what you do with lk ?

The instructions in the repo is good - https://github.com/livekit/livekit-cli.

And the help menu of the application also.

Once I have lk set up like ☝️ , I add meet.staging project, start a room via meet webapp and join a bot user from livekit-cli with a canned video (using the --publish-demo flag).

@brianbraunstein
Copy link
Contributor Author

ok, manually tested by blackholing in /etc/hosts and found a surprising other place that also needed the context. after a failed connection, an http request is sent to try to gather info on why it failed. i just passed it the same ctx and it immediately fails because the deadline is already exceeded, which also more-or-less gives the correct error message (context deadline exceeded)

@boks1971
Copy link
Contributor

ok, manually tested by blackholing in /etc/hosts and found a surprising other place that also needed the context. after a failed connection, an http request is sent to try to gather info on why it failed. i just passed it the same ctx and it immediately fails because the deadline is already exceeded, which also more-or-less gives the correct error message (context deadline exceeded)

Oh shoot, forgot about that. My bad. Good catch

@brianbraunstein brianbraunstein merged commit 426d082 into main Jan 24, 2025
2 checks passed
@brianbraunstein brianbraunstein deleted the bbraunstein-a10-timeout_for_join_attempts branch January 24, 2025 14:02
return e.JoinContext(context.TODO(), url, token, params)
}

func (e *RTCEngine) JoinContext(ctx context.Context, url string, token string, params *SignalClientConnectParams) (*livekit.JoinResponse, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -147,7 +158,7 @@ func (c *SignalClient) connect(urlPrefix string, token string, params SignalClie
}

header := newHeaderWithToken(token)
conn, hresp, err := websocket.DefaultDialer.Dial(u.String(), header)
conn, hresp, err := websocket.DefaultDialer.DialContext(ctx, u.String(), header)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the err is due to context cancelled, then we should skip checking /validate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah just read the above comment stream. feel free to ignore :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants