Skip to content
This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Detach socket in create_server, create_connection, and other APIs #449

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

1st1
Copy link
Member

@1st1 1st1 commented Oct 19, 2016

Target

Python 3.6b3

Background

In Python 3.6, socket.close() method will propagate any OSError from the underlying syscall (see issue #26685 for details.)

What it means for alternative implementations of asyncio event loop, such as uvloop is that programs that run on Python 3.5 may become broken on Python 3.6. Consider the following example:

sock = socket.socket()
sock.bind(('127.0.0.1', 0))
srv = loop.run_until_complete(
    loop.create_server(protocol_factory, sock=sock))
try:
    loop.run_forever()
finally:
    srv.close()
    loop.run_until_complete(srv.wait_closed())

    sock.close()  # <- This line will raise OSError(EBADF) 
                  #    in Python 3.6 + uvloop

    loop.close()

The marked line (sock.close()) will raise OSError in Python 3.6 with uvloop. Why? Because uvloop passes the FD of the socket to libuv, and when the server is closed it will close the FD.

In Python 3.6, sock.close() will try to close the same FD, will receive an EBADF from the OS, and propagate the OSError.

This isn't strictly related to uvloop—any efficient low-level implementation of the event loop that doesn't use Python's high level socket and selector modules will have the same problem.

Solution

I believe the actual solution would be to modify asyncio APIs that accept sockets to detach them. When you pass a socket object to loop.create_server, you are essentially saying: "asyncio, here's my socket, please start serving on it and manage it for me".

This patch does a simple thing: sockets passed to create_server, create_connection, create_datagram_endpoint and connect_accepted_socket are detached with sock.detach(). A new socket is constructed for the same FD and it's then used internally in asyncio. Creating a new socket for the FD is fast, we only create a new Python socket object wrapping an FD (as opposed to using socket.dup() which would make a sys call).

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall change, but...

@@ -87,6 +87,12 @@ def _set_reuseport(sock):
'SO_REUSEPORT defined but not implemented.')


def _copy_and_detach_socket(sock):
new_sock = socket.socket(sock.family, sock.type, sock.proto, sock.fileno())
sock.detach()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use detach() to get the FD.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -827,6 +835,7 @@ def create_datagram_endpoint(self, protocol_factory,
raise ValueError(
'socket modifier keyword arguments can not be used '
'when sock is specified. ({})'.format(problems))
sock = _copy_and_detach_socket(sock)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who closes the new cloned socket in case of failure at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this line after the setblocking call. In general, if failure occurs the socket will be closed by asyncio transport/server/etc. As all other sockets created in those methods.


f = self.loop.create_server(TestMyProto, sock=sock_ob)
server = self.loop.run_until_complete(f)
sock = server.sockets[0]
self.assertIs(sock, sock_ob)
self.assertEqual(sock.fileno(), sock_ob_fd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check that sock_ob is detach: sock_ob.fileno() must be equal to -1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@vstinner
Copy link
Member

In Python 3.6, sock.close() will try to close the same FD, will receive an EBADF from the OS, and propagate the OSError.

Hum, the error is more how is it possible that the code worked on Python 3.5 :-) Calling close(fd) on a fd which is already closed is likely to close a random newly creatd socket/file. It's error prone. It's just that Python 3.6 makes the bug obvious, no?

@vstinner
Copy link
Member

Even after reading your example many times, I'm still confused. I don't understand who "owns" the socket.

I would expect that since you pass a socket to asyncio, the event loop owns the socket. The event loop is responsible to close the resource when it is done. For example for a server, Server.close() must be called. For transports, not calling close() will complain with a ResourceWarning.

So I'm not sure that your example is correct. I don't expect a call to sock.close() in the caller, only server.close().

What do you think?

@1st1
Copy link
Member Author

1st1 commented Oct 19, 2016

In Python 3.6, sock.close() will try to close the same FD, will receive an EBADF from the OS, and propagate the OSError.

Hum, the error is more how is it possible that the code worked on Python 3.5 :-) Calling close(fd) on a fd which is already closed is likely to close a random newly creatd socket/file. It's error prone. It's just that Python 3.6 makes the bug obvious, no?

In Python 3.5:

  • With vanilla asyncio: everything was working fine and it wasn't error prone, since loop used the same socket object, and when it's closed—it's closed.
  • With uvloop: currently I use os.dup() for sockets, so the code wasn't error prone. But I very much want to remove the dup calls, and it appears, that if we don't merge this PR I wouldn't be able to.

@vstinner
Copy link
Member

Now for uvloop, I'm sorry, I don't know understand how it is implemented exactly, so I don't know how uvloop should understand it.

I mean: maybe the default behaviour when using asyncio should not be changed, but maybe we can add an helper (method that can be overriden?) for other implementations of event loops?

@1st1
Copy link
Member Author

1st1 commented Oct 19, 2016

So I'm not sure that your example is correct. I don't expect a call to sock.close() in the caller, only server.close().

What do you think?

Correct, ideally users shouldn't close sockets manually after they pass them to APIs like create_server. But sometimes they do. So I think it's better to set the record straight: when you pass a socket to the loop, the socket object will be detached and from that point you shouldn't try to manage or use it. This is potentially a small backwards compatibility breaker, BTW.

Now for uvloop, I'm sorry, I don't know understand how it is implemented exactly, so I don't know how uvloop should understand it.

The fundamental problem: in vanilla asyncio, an FD is wrapped in a Python socket object. In uvloop, I don't use Python socket objects, I just pass their duped FDs directly to libuv.

I can't use the FD of the Python socket in uvloop directly without either: detaching the Python socket object or using os.dup(). Because if I don't detach or dup, we'll end up in a situation where the same FD is managed by both Python socket object and libuv.

I mean: maybe the default behaviour when using asyncio should not be changed, but maybe we can add an helper (method that can be overriden?) for other implementations of event loops?

The idea is that uvloop should be a drop-in replacement of asyncio event loop (and that's what the promise of pluggable event loops in asyncio is about). If we don't detach sockets, I'll be forced to use dup on sockets forever, which I don't want to (file descriptors is a limited resource).

Up until yesterday I had plans to remove dup calls from uvloop. And now in Python 3.6 I realized that I can't really do that, unless we fix asyncio to detach the sockets.

@vstinner
Copy link
Member

"Correct, ideally users shouldn't close sockets manually after they pass them to APIs like create_server"

Hum, I would use a strong wrong: users must not close a socket after passing it to asyncio. It doesn't make sense, and an error is the expected behaviour IMO.

@1st1
Copy link
Member Author

1st1 commented Oct 19, 2016

"Correct, ideally users shouldn't close sockets manually after they pass them to APIs like create_server"

Hum, I would use a strong wrong: users must not close a socket after passing it to asyncio. It doesn't make sense, and an error is the expected behaviour IMO.

The error won't happen with vanilla asyncio. It will only happen in asyncio+uvloop (or any other event loop implemented in C). This is the problem.

@vadmium
Copy link
Member

vadmium commented Oct 20, 2016

If asyncio users shouldn’t close sockets, are they still allowed to do other operations on them, e.g. call methods like getpeername(), recv()?

I haven’t followed this closely, maybe another option is to leave regular asyncio implemented in Python more or less as it is, but call detach() in uvloop to prevent users doing bad things. Either detach it

  • when the user passes a Python socket object to uvloop, or
  • assuming that libuv can let uvloop know when it closes the socket, detach the Python object at that time

BTW, Yury do you have a link to the libuv documentation or something that explains how/why it closes the socket? It might help me understand your original problem.

@1st1
Copy link
Member Author

1st1 commented Oct 20, 2016

BTW, Yury do you have a link to the libuv documentation or something that explains how/why it closes the socket? It might help me understand your original problem.

I think I misguided you, Victor and even myself when I talked about libuv. Please don't focus on libuv. I'll try to explain the issue again.

Terminology in this comment:

  • FD: file descriptor
  • PSO: Python Socket Object (wraps an FD)
  • socket: low level network socket (FD)

Problem:

The real issue is that FD is a shared resource:

  • Vanilla asyncio: If you pass a PSO into loop.create_connection, the loop will use that PSO.
  • asyncio+uvloop: If you pass a PSO into loop.create_connection, the loop will use the FD of that PSO!

Now, in asyncio, sockets are wrapped into transports. And transports are closed at some point. When transports are closed, the socket (FD) gets closed too.

In vanilla asyncio, since the same PSO is used, the socket gets closed, and everything works fine.

In asyncio+uvloop, since the PSO was unwrapped and we really use the underlying FD, that FD will be closed. So when the original PSO gets closed (by the user, or by the GC), it can be already closed by asyncio (+uvloop). In 3.6 we'll have an OSError.

This is a problem not only for uvloop, but for any low-level reimplementation of asyncio event loop that doesn't use PSOs.

Solutions:

  • Fix asyncio: always detach the passed PSOs to loop.create_server, loop.create_connection etc. Essentially, asyncio will: extract the FD out of the passed PSO; rewrap it into a new PSO; detach the passed PSO.
  • Fix uvloop: extract the FD from the passed PSOs and dup them.

Fix asyncio solution is more preferable (at least from my standpoint) because:

  1. When we pass a PSO into asyncio API, we explicitly tell asyncio: manage this PSO now. The assumption should be that user won't be using/interacting with that PSO at all, because they can screw up the underlying FD. IMO, detaching PSOs in asyncio makes this all obvious.
  2. Using dup in asyncio+uvloop is suboptimal, because FDs are a limited system resource. Under some circumstances, a program using asyncio+uvloop may run out of FDs sooner than the same program using vanilla asyncio.

@1st1
Copy link
Member Author

1st1 commented Nov 9, 2016

@gvanrossum What do you think about this one?

(the issue is best and fully explained in the previous comment)

@gvanrossum
Copy link
Member

(Notice: I'm all out of round tuits today.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants