Jupyter notebook & lab kernel restarting without obvious reason when working with Jina #5863

Closed
hanxiao opened this issue Nov 12, 2020 · 4 comments

@hanxiao

hanxiao commented Nov 12, 2020

Hi Jupyter team, I'm the maintainer of the Jina project. I recently found out that our project does not play well with Jupyter Notebook (6.1.5), nor with JupyterLab (2.2.6). The problem shows up in your official online lab, inside Docker (jupyter/minimal-notebook), and on both macOS and Linux. I think it is safe to say Jina and Jupyter are not compatible, and as the maintainer I really want to figure out why.

Reproduce the Kernel Restart

Here is how to reproduce the result in Notebook/Lab (with Python 3.7 and 3.8 kernels); the zipped notebook is attached. Interestingly, the code snippet below works completely fine in plain Python and IPython. It only fails under Notebook/Lab of any kind:


!pip install jina  # or however you want to install it
from jina.flow import Flow
f = Flow().add()

with f:
    pass
       pod0@28935[I]:post initiating, this may take some time...
       pod0@28935[I]:post initiating, this may take some time takes 0 seconds (0.01s)
       pod0@28935[S]:successfully built BaseExecutor from a yaml config
       pod0@28935[I]:setting up sockets...
       pod0@28935[I]:input tcp://0.0.0.0:62045 (0) 	 output tcp://0.0.0.0:62046 (2)	 control over tcp://0.0.0.0:62047 (8)
       pod0@28935[S]:ready and listening
       JINA@28923[I]:using <class 'asyncio.unix_events._UnixSelectorEventLoop'> as event loop
    gateway@28923[S]:gateway is listening at: 0.0.0.0:62052
       Flow@28923[I]:2 Pods (i.e. 2 Peas) are running in this Flow
       Flow@28923[S]:flow is now ready for use, current build_level is 1
    gateway@28923[S]:terminated
       pod0@28935[I]:recv ControlRequest from ctl▸pod0▸⚐
       pod0@28935[I]:RequestLoopEnd() causes the breaking from the event loop
       pod0@28935[I]:no update since 2020-11-12 11:03:26, will not save. If you really want to save it, call "touch()" before "save()" to force saving
       pod0@28935[I]:executor says there is nothing to save
       pod0@28935[I]:#sent: 0 #recv: 1 sent_size: 0 Bytes recv_size: 137 Bytes
       pod0@28935[I]:#sent: 1 #recv: 1 sent_size: 227 Bytes recv_size: 137 Bytes
       pod0@28935[S]:terminated
       Flow@28923[S]:flow is closed and all resources should be released already, current build level is 0

💥 💥 💥 💥 The kernel gets restarted at this point, right after the last cell, for no obvious reason.
With --debug it just says:

...
[D 11:03:26.200 LabApp] activity on b8186ee1-b32d-46c3-807c-c0fadd28fbb8: stream
[D 11:03:26.204 LabApp] activity on b8186ee1-b32d-46c3-807c-c0fadd28fbb8: status (idle)
[D 11:03:26.210 LabApp] 304 GET /static/base/images/favicon.ico (::1) 1.39ms
[I 11:03:26.832 LabApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel b8186ee1-b32d-46c3-807c-c0fadd28fbb8 restarted
[D 11:03:26.833 LabApp] Starting kernel: ['/Users/hanxiao/.pyenv/versions/3.7.5/bin/python3.7', '-m', 'ipykernel_launcher', '-f', '/Users/hanxiao/Library/Jupyter/runtime/kernel-b8186ee1-b32d-46c3-807c-c0fadd28fbb8.json']
[D 11:03:26.838 LabApp] Connecting to: tcp://127.0.0.1:61862
[D 11:03:27.519 LabApp] activity on b8186ee1-b32d-46c3-807c-c0fadd28fbb8: status (starting)
[D 11:03:27.612 LabApp] activity on b8186ee1-b32d-46c3-807c-c0fadd28fbb8: status (busy)
[D 11:03:27.613 LabApp] activity on b8186ee1-b32d-46c3-807c-c0fadd28fbb8: status (idle)
[D 11:03:27.617 LabApp] activity on b8186ee1-b32d-46c3-807c-c0fadd28fbb8: status (busy)
[D 11:03:27.618 LabApp] activity on b8186ee1-b32d-46c3-807c-c0fadd28fbb8: execute_input
...

Now the line below complains that f is undefined, because the kernel has been restarted:

print(f)
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-1-fc0364975534> in <module>
----> 1 print(f)


NameError: name 'f' is not defined

Reasons?

Unfortunately, neither the Jupyter debug log nor the Chrome dev console gives much useful information on why the kernel is restarted. Who triggers it?

Resource-wise, the code snippet above is pretty standard. It is neither memory-hungry nor computationally demanding.

Conflicting Tech Stack?

Jina does not have many dependencies. They are:

numpy
pyzmq>=17.1.0
protobuf>=3.13.0
grpcio
ruamel.yaml>=0.15.89
tornado>=5.1.0
uvloop (recommended but not required)
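Because uvloop is optional, a common pattern is to try importing it and fall back to the standard asyncio loop if the import fails. The sketch below illustrates that pattern only. The helper name `use_uvloop_if_available` is hypothetical, and this is not Jina's actual code. Setting an event loop policy only affects loops created afterward. It does not replace a loop that is already running.

```python
import asyncio


def use_uvloop_if_available() -> bool:
    """Hypothetical helper: switch to uvloop's policy if it can be imported.

    Returns True if uvloop's policy was installed, False if the standard
    asyncio policy is kept.
    """
    try:
        import uvloop  # optional dependency
    except ImportError:
        return False
    # Only loops created after this call are uvloop loops; an event loop
    # that is already running is left untouched.
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True


if __name__ == "__main__":
    enabled = use_uvloop_if_available()
    loop = asyncio.new_event_loop()  # built by whichever policy is active
    print("uvloop enabled:", enabled, "| loop class:", type(loop).__name__)
    loop.close()
```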

We heavily rely on ZeroMQ, which as far as I know is also the communication stack behind Jupyter. Could this be the problem? We also heavily rely on multiprocessing and asyncio.

uvloop looked suspicious at first, but removing it did not help.

Other Attempts

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
ulimit -n 4096 && jupyter lab

No luck here.

Shed Some Light?

I know this may be beyond Jupyter itself. But as the maintainer of Jina, I really want to figure out the reason behind the crash. With my limited understanding of how Jupyter works, that's the best I can try for now. Maybe you have some insights or guesses? It would be really awesome! ❤️

If you need any elaboration on the problem, I'm here! 👋

jina-jupyter-bug-report.ipynb.zip

@bollwyvl
Contributor

bollwyvl commented Nov 12, 2020 via email

@hanxiao
Author

hanxiao commented Nov 12, 2020

I see! Thank you @bollwyvl for pointing me in the right direction. Looking carefully at our log again, Jina does use the built-in asyncio event loop instead of uvloop when running inside Jupyter, whereas it should use uvloop by default:

   JINA@28923[I]:using <class 'asyncio.unix_events._UnixSelectorEventLoop'> as event loop

That should be the indicator of this problem.
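One way to check this from inside a notebook cell is to ask asyncio which loop is currently running. The helper below, `current_loop_info`, has a hypothetical name and is only a minimal sketch. Inside a Jupyter kernel it should report a loop that is already running and was created before any Jina code could choose uvloop. That would be consistent with the log line above. In a plain Python script, it reports that no loop is running and shows which loop class the current policy would create.

```python
import asyncio


def current_loop_info():
    """Return (loop class name, whether that loop was already running)."""
    try:
        # Succeeds only when called while an event loop is running in this
        # thread, e.g. from a Jupyter cell whose kernel owns the loop.
        loop = asyncio.get_running_loop()
        return type(loop).__name__, True
    except RuntimeError:
        # No running loop: build one with the current policy just to see
        # which class it would be, then close it again.
        loop = asyncio.new_event_loop()
        try:
            return type(loop).__name__, False
        finally:
            loop.close()


if __name__ == "__main__":
    print(current_loop_info())
```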

This link you sent looks pretty relevant: https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
Let me check what we can do here.
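The autoawait documentation covers running asynchronous code at the top level of IPython. You can see why a pre-existing loop matters for libraries that manage their own loop by mimicking a notebook cell: synchronous code executed while a loop is already running. In that situation `asyncio.run()` refuses to start a second loop. This is a minimal simulation under that assumption, not a reproduction of the Jina kernel restart itself. The names `cell_body` and `kernel` are hypothetical.

```python
import asyncio


async def noop() -> int:
    return 1


def cell_body() -> str:
    """Synchronous code, like a notebook cell, that tries to own a loop."""
    coro = noop()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # never scheduled; close it to avoid a warning
        return str(exc)
    return "no error"


async def kernel() -> str:
    """Stand-in for the kernel: a loop is already running when the cell runs."""
    return cell_body()


if __name__ == "__main__":
    print(cell_body())            # no loop running: asyncio.run() succeeds
    print(asyncio.run(kernel()))  # loop already running: RuntimeError text
```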

@hanxiao
Author

hanxiao commented Nov 14, 2020

After some digging, I think this is most closely related to ipykernel. A similar issue is reported here: ipython/ipykernel#548

@hanxiao
Author

hanxiao commented Dec 14, 2020

We have refactored the asyncio ops on our end (jina-ai/serve#1450), and this issue is now solved. The main idea is that Jupyter already has a running event loop from the beginning. Do not try to terminate or stop that event loop; use asyncio.create_task if possible.

@hanxiao hanxiao closed this as completed Dec 14, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 13, 2021