
[Ray debugger] Unable to use debugger on Ray Cluster on k8s #45541

Open

yx367563 opened this issue May 24, 2024 · 11 comments
Labels

dashboard (Issues specific to the Ray Dashboard), enhancement (Request for new feature and/or capability), P1 (Issue that should be fixed within a few weeks)

Comments

@yx367563

What happened + What you expected to happen

I tried to use the debugger plug-in in VS Code following the guidance at https://www.anyscale.com/blog/ray-distributed-debugger, but when I click on a paused task to attach the VS Code debugger, I always get the error connect ECONNREFUSED $ip:port.
I tried enabling the plug-in locally and it worked normally.
I also tried adding the ray-debugger-external flag and confirmed that the Ray Cluster on k8s can use the native debugger normally.
I don't know how to use the debugger plug-in in VS Code against a Ray Cluster on k8s. Can you provide relevant guidance or help?

Versions / Dependencies

Ray 2.23.0
Python 3.10.12

Reproduction script

Sample code from the guidance linked above.
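
For reference, a minimal sketch of the kind of reproduction script the linked guidance describes (the exact blog sample may differ, and whether RAY_DEBUG needs to be set depends on the Ray version; both are assumptions here):

    import ray

    # Depending on the Ray version, the new distributed debugger may need to
    # be enabled explicitly (e.g. via the RAY_DEBUG environment variable);
    # see the linked guidance for the exact setup.
    ray.init()

    @ray.remote
    def my_task(x):
        breakpoint()  # pauses the task so the VS Code debugger can attach
        return x * 2

    print(ray.get(my_task.remote(21)))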

Issue Severity

High: It blocks me from completing my task.

yx367563 added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage: priority, bug/not-bug, and owning component) labels on May 24, 2024
@yx367563 (Author)

Or do I need to configure launch.json in VS Code?

anyscalesam added the enhancement (Request for new feature and/or capability) label and removed the bug label on May 24, 2024
anyscalesam added the dashboard (Issues specific to the Ray Dashboard) label on Jun 3, 2024
brycehuang30 added the P1 (Issue that should be fixed within a few weeks) label and removed the triage label on Jun 7, 2024
brycehuang30 self-assigned this on Jun 7, 2024
@rasmus-unity

I think the problem is that the Ray debugger uses a random port, so it's not possible to know ahead of time which port to open when running on Kubernetes.

From https://github.com/ray-project/ray/blob/master/python/ray/util/debugpy.py:

    def _ensure_debugger_port_open_thread_safe():
        (...)
        (host, port) = debugpy.listen(
            (ray._private.worker.global_worker.node_ip_address, 0)
        )

And from the definition of listen() in https://github.com/microsoft/debugpy/blob/main/src/debugpy/public_api:

This may be different from address if port was 0 in the latter, in which case the adapter will pick some unused ephemeral port to listen on.

In our case we're running ephemeral Ray clusters using the RayJob resource definition from KubeRay, so we could specify a single port. In the case of static Ray clusters, could a port range be a solution?
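
For illustration, a minimal standalone sketch (not Ray's actual code) of what port 0 means here; the 0.0.0.0 host is just an example:

    import debugpy

    # Port 0 asks the OS for an ephemeral port, which is what the Ray snippet
    # above does. The chosen port is only known after listen() returns, so it
    # cannot be declared ahead of time in a Kubernetes pod or Service spec.
    host, port = debugpy.listen(("0.0.0.0", 0))
    print(f"debugpy picked ephemeral port {port}")

Pinning a fixed, pre-opened port (or range) instead would avoid this, which is what the rest of the thread discusses.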

@anyscalesam (Contributor)

@brycehuang30 does the new distributed debugger have this capability? If it doesn't, I say we build forward and add this as a feature request there.

@brycehuang30 (Contributor)

The distributed debugger currently cannot customize the debugging ports. I think we could solve this in two steps:

  1. Let the debugger use only a fixed range of port numbers, e.g. 50000-51000. This allows users to open those ports in k8s.
  2. Enable a custom port range setting, so users can choose the port range.
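
As a rough, hypothetical sketch of step 1 (this is not Ray's implementation; the helper name and range are made up): pick the first free port in a fixed range before handing it to debugpy, so the whole range can be opened in the pod spec.

    import socket

    import debugpy

    def _first_free_port(host, start=50000, end=51000):
        # Probe the range for a bindable port. There is a small race window
        # between this check and debugpy.listen(), which a real implementation
        # would need to handle (e.g. by retrying on failure).
        for candidate in range(start, end):
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                try:
                    s.bind((host, candidate))
                    return candidate
                except OSError:
                    continue
        raise RuntimeError(f"no free debugger port in {start}-{end}")

    host = "0.0.0.0"
    debugpy.listen((host, _first_free_port(host)))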

@rasmus-unity

Thanks for looking into this. One more thing that needs to be considered when debugging a job running in Kubernetes is the IP address.

From Ray job log:

2024-10-03 11:07:18,021	INFO debugpy.py:66 -- Ray debugger is listening on 100.104.4.3:34983
2024-10-03 11:07:18,023	INFO debugpy.py:87 -- Waiting for debugger to attach...

That IP address 100.104.4.3 is internal to the Kubernetes cluster, so when trying to debug from VS Code I get a connection error.
(In this case 127.0.0.1:8265 is being port-forwarded from the Ray dashboard running in Kubernetes.)
[screenshot: VS Code connection error]

Possibly the VS Code debugger plugin should connect to the external IP address of the head node rather than the internal node IP address?
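
To illustrate the symptom, a small hypothetical reachability check from the developer machine (the address and port come from the log above; the port-forwarded localhost target is an assumption):

    import socket

    def reachable(host, port, timeout=3):
        # True if a plain TCP connection to host:port succeeds.
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Pod-internal address from the worker log: not routable from a laptop.
    print(reachable("100.104.4.3", 34983))
    # The same debugger port forwarded to localhost (e.g. via kubectl
    # port-forward) would be reachable instead.
    print(reachable("127.0.0.1", 34983))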

@koenlek

koenlek commented Nov 30, 2024

I've run into an issue that seems very similar to this one. In fact, it might very well be the same issue.

I'm using Ray 2.30 and I get a connection refused error when I try to connect VS Code to the paused task. I noticed that debugpy on the task actually crashes soon after debugpy.listen(...) is called, so by the time I'm trying to connect VS Code, nothing is listening on the configured port anymore (the port printed in the Ray debugger is listening on <ip>:<port> log message).

  • I also tried Ray 2.39: same issue.
  • I tried patching Ray to make debugpy run on a fixed port, and/or on localhost/0.0.0.0 (combined with kubectl port forwarding): all to no avail. In all cases, the root issue seems to be that nothing is listening anymore on the port where debugpy is supposed to listen.
  • I tried running debugpy.listen on a k8s pod without Ray, and in that case it works fine. Using lsof I can see that something is listening on the configured listen port.
  • The underlying debugpy crash is hard to detect, apart from the fact that nothing is listening on the port. However, if you enable extra logging you can see it crash (BrokenPipeError) in the logs. I reported this issue in debugpy here (with details on how to find the crash message in debugpy.pydevd.NNNN.log): debugpy listen silently crashing microsoft/debugpy#1749
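
For anyone reproducing this, a minimal standalone sketch of that kind of check (the log directory and port are arbitrary examples, not Ray's defaults):

    import debugpy

    # Write verbose adapter/pydevd logs (debugpy.*.log, debugpy.pydevd.*.log)
    # to a known directory so a silent crash after listen() shows up there.
    debugpy.log_to("/tmp/debugpy_logs")

    host, port = debugpy.listen(("0.0.0.0", 5678))
    print(f"debugpy should now be listening on {host}:{port}")
    # On a healthy pod, `lsof -i :5678` shows a listener on this port; in the
    # failing case described above, the listener disappears shortly after
    # this point and the crash only shows up in the log files.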

pcmoritz added a commit that referenced this issue Dec 9, 2024
…#49116)

## Why are these changes needed?

This addresses #45541 and #49014

Signed-off-by: Philipp Moritz <[email protected]>
Co-authored-by: angelinalg <[email protected]>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this issue Dec 17, 2024
…ray-project#49116)
@rogerfydp

The distributed debugger currently cannot customize the debugging ports. I think we could solve this in two steps:

  1. Let the debugger use only a fixed range of port numbers, e.g. 50000-51000. This allows users to open those ports in k8s.
  2. Enable a custom port range setting, so users can choose the port range.

We are facing the same issue with our local Ray Cluster, but in our case behind docker-compose for local development/testing.

We were wondering whether the suggested solution (an optional parameter to specify debugpy ports) is still an option, or whether there is any other recommendation for overcoming the issue.

@rasmus-unity

We ended up deploying https://docs.linuxserver.io/images/docker-code-server/ inside the Kubernetes cluster, which can then access the necessary ports.

@Moonquakes

Moonquakes commented Jan 23, 2025

@rasmus-unity Thank you for sharing. Could you explain the specific setup steps?

I also noticed that there is a relevant PR (#49116). Can I assume that this requirement can be met by following that documentation? cc @brycehuang30

@rogerfydp

@rasmus-unity and @Moonquakes, thank you for your insights!

To test this, we created a Dockerfile based on the Ray images and installed the SSH server as mentioned in #49116, along with other necessary components. We were looking for a solution for agile local development and debugging, so we also ended up mounting the source code under development as volumes on the Ray head node and installing various tools, such as Devbox, that we need for development. This setup allowed us to develop directly on the Ray head node and use the Ray Distributed Debugger extension, but we believe it adds a lot of complexity, aside from installing otherwise unnecessary software on the ray-head node, that could potentially be avoided.

While this approach was useful and does the trick for us for the moment, we still believe that an out-of-the-box solution, without the need to install SSH servers and other dependencies, would be extremely valuable given the excellent Ray Distributed Debugger extension that is already provided. In our opinion, implementing a way to configure a range of ports for debugpy to listen on, as previously suggested by @brycehuang30, would greatly enhance the development experience.

@Moonquakes

Hi @rogerfydp, could you explain your setup steps and Dockerfile in more detail? I installed SSH according to the instructions in #49116 and opened port 22, but it seems there are other problems: KubeRay opens some ports by default when no ports are specified, but those defaults are not added if port 22 is added manually (https://github.com/ray-project/kuberay/blob/v1.2.2/ray-operator/controllers/ray/common/service.go#L409-L417).
