Intermittent error after calling disconnect: "libev: I/O watcher with invalid fd found in epoll_ctl" #378

Open
todds02 opened this issue Apr 11, 2023 · 1 comment

Comments


todds02 commented Apr 11, 2023

Describe the bug

I'm using the parallel-ssh single-host client (SSHClient) under gevent on Python 3. Intermittently, after calling disconnect() on the SSH session, the Python interpreter crashes with this error message:

"python3: ev_epoll.c:153: epoll_modify: Assertion `("libev: I/O watcher with invalid fd found in epoll_ctl", errno != EBADF && errno != ELOOP && errno != EINVAL)' failed."
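For context, the assertion text suggests that libev called epoll_ctl on a file descriptor that was no longer valid. The following is a minimal sketch of that failure mode using only the standard library's select.epoll (Linux only), not libev or parallel-ssh; the function name modify_after_close is made up for illustration, and this is not confirmed to be what happens inside the client.

```python
import errno
import select
import socket


def modify_after_close():
    """Return the errno raised when epoll_ctl(MOD) targets a closed fd."""
    ep = select.epoll()
    a, b = socket.socketpair()
    try:
        fd = a.fileno()
        ep.register(fd, select.EPOLLIN)
        a.close()  # the fd number is now invalid for this process
        try:
            # Roughly what an event loop does when it updates a watcher
            # whose fd was closed behind its back.
            ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        b.close()
        ep.close()


if __name__ == "__main__":
    print(errno.errorcode.get(modify_after_close()))
```

Python surfaces the failure as an OSError, whereas libev built with assertions enabled aborts the process instead.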

To Reproduce

I can reproduce this using a slightly modified version of the example script (https://parallel-ssh.readthedocs.io/en/latest/quickstart.html#single-host-client):

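import gevent
from pssh.clients import SSHClient

# Placeholder credentials; replace with real values.
USERNAME = 'user'
PASSWORD = 'password'

attempts = 0
while True:
    attempts += 1
    host = 'server.example.com_'
    cmd = 'ls -al /'
    print('Connection attempt {}'.format(attempts))
    client = SSHClient(host, user=USERNAME, password=PASSWORD)

    # Run the command and print its output line by line.
    host_out = client.run_command(cmd)
    for line in host_out.stdout:
        print(line)
    print('Disconnecting')
    client.disconnect()
    # Yield to the gevent loop before reconnecting.
    gevent.sleep(0.1)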

The error appears within fewer than 50 attempts. Oddly, the command output is printed only on every other iteration of the loop.

Disconnecting
Connecting attempt 26
Disconnecting
Connecting attempt 27
total 106
dr-xr-xr-x.  24 root root  4096 Oct  5  2022 .
dr-xr-xr-x.  24 root root  4096 Oct  5  2022 ..
<snipped for brevity>
drwxr-xr-x.  17 root root  4096 Jan 24  2018 var
drwxr-xr-x.   6 root root  4096 Apr  1  2022 work
Disconnecting
python3: ev_epoll.c:153: epoll_modify: Assertion `("libev: I/O watcher with invalid fd found in epoll_ctl", errno != EBADF && errno != ELOOP && errno != EINVAL)' failed.

In other testing, we found that the epoll error goes away if we never read the output, but output.exit_code is sometimes None even after calling wait_finished(output).
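The call order discussed above (read the output, wait for completion, check the exit code, then disconnect) can be sketched as a small helper. This is only an illustration of the sequence, not a confirmed fix for the crash. The helper name run_then_disconnect is hypothetical, and client is assumed to behave like pssh.clients.SSHClient.

```python
def run_then_disconnect(client, cmd):
    """Run cmd, drain stdout, wait for completion, then disconnect.

    `client` is assumed to expose run_command, wait_finished and
    disconnect in the way pssh.clients.SSHClient does.
    Returns (lines, exit_code).
    """
    try:
        host_out = client.run_command(cmd)
        lines = list(host_out.stdout)     # drain the output completely
        client.wait_finished(host_out)    # block until the command is done
        return lines, host_out.exit_code  # reported above to be None at times
    finally:
        # Disconnect exactly once, even if reading the output raised.
        client.disconnect()
```

Called as run_then_disconnect(SSHClient(host, user=USERNAME, password=PASSWORD), cmd), it keeps the disconnect in one place so that any ordering problem is easier to isolate.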

In the full script (which I cannot post here), adding a sleep(1) after the disconnect seems to clear up the issue, although it does not help in the example above.

Most of the examples I could find do not show explicit disconnect() calls, but these are necessary to ensure proper cleanup for long-running processes. Is there a safe way to disconnect that I'm missing, or is something not being cleaned up properly internally?

Expected behavior

The session should disconnect cleanly without crashing.

Actual behaviour

The interpreter intermittently crashes after disconnect().

Additional information

python3.6
python-gevent 22.10.2
python-greenlet 2.0.2
parallel-ssh 2.12.0
python3-ssh2-python 1.0.0.0
libssh2 1.9.0-5

Member

pkittenis commented Jan 13, 2025

Generally, the clients call disconnect when they go out of scope:

def __del__(self):
    try:
        self.disconnect()
    except Exception:
        pass

I'd be surprised if anything is left hanging without an explicit .disconnect - there are also some tests for this. That said, it shouldn't crash, will try to reproduce.
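To illustrate the out-of-scope behaviour, here is a minimal sketch with a stand-in class. FakeClient and demo are hypothetical names and not part of parallel-ssh, and the immediate finalisation shown relies on CPython's reference counting.

```python
class FakeClient:
    """Hypothetical stand-in for an SSH client; not part of parallel-ssh."""

    def __init__(self, log):
        self.log = log

    def disconnect(self):
        self.log.append("disconnect")

    def __del__(self):
        # Same shape as the snippet above: best-effort cleanup that
        # swallows any exception raised during disconnect.
        try:
            self.disconnect()
        except Exception:
            pass


def demo():
    log = []
    client = FakeClient(log)
    del client  # last reference dropped; CPython runs __del__ here
    return log
```

On other interpreters, or when the object is part of a reference cycle, finalisation may be delayed until a garbage collection pass.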
