Fix for code execution on the Jupyter Server #307
30eec10
to
536a7b8
Compare
// jupyverse case - it is undefined in jupyter-server
const fileId = notebook.sharedModel.getState('file_id') ?? '';
let documentId = `json:notebook:${fileId}`;
if (!fileId) {
  if (
    // FIXME sessionContext.path seems to be local - should we by-pass this test?
    ['', this._drive.name].includes(
      this._contents.driveName(
        sessionContext.session?.path ?? sessionContext.path
      )
    )
  ) {
    const localPath = this._contents.localPath(
      sessionContext.session?.path ?? sessionContext.path
    );
    documentId =
      this._drive.getRoomName({
        localPath: localPath,
        format: 'json',
        type: 'notebook'
      }) ?? '';
  }
It looks like elsewhere in the codebase the `fileId` is retrieved by requesting a session using `requestDocSession`:
jupyter-collaboration/packages/docprovider/src/yprovider.ts
Lines 95 to 103 in adb51b0
const session = await requestDocSession(
  this._format,
  this._contentType,
  this._path
);

this._yWebsocketProvider = new YWebsocketProvider(
  this._serverUrl,
  `${session.format}:${session.type}:${session.fileId}`,
which makes a request to `api/collaboration/session`, handled here:
jupyter-collaboration/projects/jupyter-server-ydoc/jupyter_server_ydoc/handlers.py
Lines 418 to 461 in adb51b0
@web.authenticated
@authorized
async def put(self, path):
    """
    Creates a new session for a given document or returns an existing one.
    """
    body = json.loads(self.request.body)
    format = body["format"]
    content_type = body["type"]
    file_id_manager = self.settings["file_id_manager"]

    idx = file_id_manager.get_id(path)
    if idx is not None:
        # index already exists
        self.log.info("Request for Y document '%s' with room ID: %s", path, idx)
        data = json.dumps(
            {
                "format": format,
                "type": content_type,
                "fileId": idx,
                "sessionId": SERVER_SESSION,
            }
        )
        self.set_status(200)
        return self.finish(data)

    # try indexing
    idx = file_id_manager.index(path)
    if idx is None:
        # file does not exists
        raise web.HTTPError(404, f"File {path!r} does not exist")

    # index successfully created
    self.log.info("Request for Y document '%s' with room ID: %s", path, idx)
    data = json.dumps(
        {
            "format": format,
            "type": content_type,
            "fileId": idx,
            "sessionId": SERVER_SESSION,
        }
    )
    self.set_status(201)
    return self.finish(data)
It seems to me that going this way would solve the problem with `getRoomName` which you documented in a FIXME:
// FIXME we have issue with that key in case of file rename
What do you think?
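A minimal sketch of why a room ID built from the session's `fileId` would not depend on the file path, assuming the payload shape returned by the handler quoted above. `DocSession` and `roomIdFromSession` are hypothetical names used only for illustration; they are not part of the codebase.

```typescript
// Payload shape assumed from the handler above (format, type, fileId, sessionId).
interface DocSession {
  format: string;
  type: string;
  fileId: string;
  sessionId: string;
}

// Hypothetical helper: builds the room name with the same template that
// yprovider.ts passes to YWebsocketProvider.
function roomIdFromSession(session: DocSession): string {
  return `${session.format}:${session.type}:${session.fileId}`;
}
```

Because the path never enters the template, a rename that keeps the same `fileId` yields the same room ID.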
Thanks @krassowski for the review.
I thought about re-using the endpoint, but the issue is the overhead: calling it on every execution request would be overkill.
Alternatively, I could set up a cache based on a `WeakMap<NotebookModel, roomID>` and discard the entry if the execution is refused due to improper input parameters. What do you think?
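The cache idea could look roughly like the sketch below. `RoomIdCache` and its `resolve` callback are hypothetical: the callback stands in for whatever call fetches the room ID from the server, and the key would be the notebook model. Storing the promise rather than the resolved string is a design choice of this sketch, so that concurrent execution requests share one lookup.

```typescript
// Hypothetical cache keyed by model object; entries vanish when the model
// is garbage collected because the keys are held weakly.
class RoomIdCache<K extends object> {
  private _pending = new WeakMap<K, Promise<string>>();
  private _resolve: (model: K) => Promise<string>;

  constructor(resolve: (model: K) => Promise<string>) {
    this._resolve = resolve;
  }

  // Returns the cached lookup, or starts a new one on a cache miss.
  get(model: K): Promise<string> {
    const cached = this._pending.get(model);
    if (cached !== undefined) {
      return cached;
    }
    const fresh = this._resolve(model);
    this._pending.set(model, fresh);
    // Do not keep a failed lookup around; the caller still sees the rejection.
    fresh.catch(() => {
      if (this._pending.get(model) === fresh) {
        this._pending.delete(model);
      }
    });
    return fresh;
  }

  // To be called when the server refuses an execution request because the
  // cached ID is no longer valid.
  invalidate(model: K): void {
    this._pending.delete(model);
  }
}
```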
);
const cellId = cell.model.sharedModel.getId();

// jupyverse case - it is undefined in jupyter-server
Maybe we should make it part of the document state? I opened:
This could be a more efficient way than using a cache as proposed in #307 (comment)
7c256ae
to
7f5ec20
Compare
@davidbrochart @krassowski for the roomID, I ended up storing it directly in the shared document state. Keeping it there makes it quick to retrieve and avoids having to reconstruct it from its parts (easier maintenance).
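Reading the ID back from the shared state could be sketched as follows. `SharedStateLike` and `getDocumentId` are hypothetical names, and the `document_id` key is an assumption taken from the later commit message that renames the state from `room_id`; only `getState` is borrowed from the diff above.

```typescript
// Minimal structural type for this sketch; the real shared model exposes more.
interface SharedStateLike {
  getState(key: string): unknown;
}

// Hypothetical helper: returns the ID stored in the shared document state,
// or null when the server has not set it (or set something unusable).
function getDocumentId(sharedModel: SharedStateLike): string | null {
  const value = sharedModel.getState('document_id');
  return typeof value === 'string' && value.length > 0 ? value : null;
}
```

A caller would refuse server-side execution, or fall back to another lookup, when the result is `null`.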
@davidbrochart could you confirm that the latest version will work with jupyverse?
I will.
Then I think we should acknowledge this is now part of the document state, and merge jupyter-server/jupyter_ydoc#198.
I'm realizing that what you set in the shared document's state is the
  }
}

async function requestServer(
I'm not sure about this logic. I understand it's a way to have input requests and widgets to work, but the whole system is designed so that it works with https://github.com/datalayer/jupyter-server-nbmodel, which is a third-party extension.
There are other ways to support inputs and widgets, discussed in jupyter-server/jupyter_ydoc#227 (comment). They involve using CRDTs so that they are fully integrated into `jupyter-collaboration`.
> so that it works with https://github.com/datalayer/jupyter-server-nbmodel, which is a third-party extension

@echarles @fcollonval what are your plans with respect to `jupyter-server-nbmodel`?

> There are other ways to support inputs and widgets

I think that the stdin box is so critical to the basic interface (as compared to ipywidgets) and so different from other widgets that a special-case implementation makes sense, assuming that it can work with page refresh. The implementation proposed in this PR does not work with page refresh, and the selling point of server-side execution is that the state is preserved even when the browser gets closed/disconnected.
> I understand it's a way to have input requests and widgets to work

Only the input requests. Widgets live their life with their renderer and the creation of comms as before, since the frontend still connects to the kernel through WebSocket.

> but the whole system is designed so that it works with https://github.com/datalayer/jupyter-server-nbmodel, which is a third-party extension.

That part is definitely opinionated. It only highlights the need to define a standard way to deal with input. I have no problem changing that part. A possibility to allow testing of various approaches could be to remove the modifications in the cell executor that are specific to `jupyter-server-nbmodel`. That extension will then provide its own executor independently of collaboration.

> what are your plans with respect to jupyter-server-nbmodel?

Our plan is to deprecate it as soon as there is a positive way to integrate that logic here or in jupyter-server (depending on the consensus).

> this PR does not work with page refresh

Good point, I'll try to cover that case.
> That part is definitely opinionated. It only highlights the need to define a standard way to deal with input. I have no problem changing that part. A possibility to allow testing of various approaches could be to remove the modifications in the cell executor that are specific to `jupyter-server-nbmodel`. That extension will then provide its own executor independently of collaboration.

I think it would indeed make sense to define an API for handling inputs, as you did for cell execution. Correct me if I'm wrong, but the way it is handled in `jupyter-server-nbmodel` doesn't support state recovery? For instance, if I run a cell with `a = input()`, close my browser, and reopen it, I don't have the input widget anymore and my notebook is stuck (I can't run a cell anymore). For me this highlights the need for a better solution based on CRDTs.
Having an API for input handling would make it possible to provide a CRDT-based input widget.
Edit: but I am fine with going forward with |
The issue with
I'd like to not use |
- This might be a `jupyter_server_nbmodel` issue, but when executing a long-running cell I see that the server gets stuck (the execution counter does not get updated) and even static files (kernel icons) do not load:
Reloading the page is also stuck until the computation completes.
Calling `input()` also blocks everything for me, including execution in other notebooks. Is there maybe a requirement on a specific version of jupyter-server or Python or something that is not defined in the dependencies of `jupyter_server_nbmodel`?
- Should we update documentation on server-side execution? I think it would be outdated once this PR is merged:
jupyter-collaboration/docs/source/configuration.md
Lines 33 to 49 in adb51b0
There is an experimental feature that is currently only supported by the
[Jupyverse](https://github.com/jupyter-server/jupyverse) server
(not yet with [jupyter-server](https://github.com/jupyter-server/jupyter_server),
see the [issue #900](https://github.com/jupyter-server/jupyter_server/issues/900)):
server-side execution. With this, running notebook code cells is not done in the frontend through
the low-level kernel protocol over WebSocket API, but through a high-level REST API. Communication
with the kernel is then delegated to the server, and cell outputs are populated in the notebook
shared document. The frontend gets these outputs changes and shows them live. What this means is
that the notebook state can be recovered even if the frontend disconnects, because cell outputs are
not populated frontend-side but server-side.

This feature is disabled by default, and can be enabled like so:
```bash
pip install "jupyterlab>=4.2.0b0"
pip install "jupyverse[jupyterlab, auth]>=0.4.2"
jupyverse --set kernels.require_yjs=true --set jupyterlab.server_side_execution=true
```
If `--YDocExtension.server_side_execution True` is required, I think it would need to be documented. JupyterLab 4.2 and Notebook 7.2 could be mentioned as shared requirements for both jupyverse and the `jupyter_server_nbmodel` extension.
if (response.status === 300) {
  let replyUrl = response.headers.get('Location') || '';

  if (!replyUrl.startsWith(settings.baseUrl)) {
    replyUrl = URLExt.join(settings.baseUrl, replyUrl);
  }
  const { parent_header, input_request } = await response.json();
  // TODO only the client sending the snippet will be prompted for the input
  // we can have a deadlock if its connection is lost.
  const panel = new Panel();
  panel.addClass('jp-OutputArea-child');
  panel.addClass('jp-OutputArea-stdin-item');
There are some implicit assumptions about the `jupyter-server-nbmodel` implementation. I think this is acceptable, but I would suggest that the code be reorganised:
- can you move the stdin logic into a separate function and replace the if-else sequence with a high-level switch-case to make the overall logic easy to understand at first glance?
- the `Stdin` creation largely duplicates code from core JupyterLab. Ideally we would be able to reduce the duplication as much as possible; the difference, I think, is that the core is built around the kernel future objects. I wonder if we can implement the future here and just pass it to the logic in JupyterLab core. This could be done in a follow-up PR if we need to make changes in core to enable this, but let's discuss/investigate a bit before merging.
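One possible shape for the first suggestion, as a sketch only. `ExecutionStep`, `resolveReplyUrl` and `classifyResponse` are hypothetical names; only the 300 status and the `Location` handling come from the diff above, while treating 200 as "done" is an assumption about the protocol. Plain string joining is used here in place of `URLExt.join` to keep the example free of dependencies.

```typescript
// Outcome of one response from the execution endpoint (hypothetical type).
type ExecutionStep =
  | { kind: 'input-requested'; replyUrl: string }
  | { kind: 'done' }
  | { kind: 'error'; status: number };

// Mirrors the diff: prefix the base URL unless the location already has it.
function resolveReplyUrl(baseUrl: string, location: string): string {
  if (location.startsWith(baseUrl)) {
    return location;
  }
  return baseUrl.replace(/\/+$/, '') + '/' + location.replace(/^\/+/, '');
}

// A single switch keeps the top-level flow readable; the stdin-specific
// work would live in its own function driven by the 'input-requested' step.
function classifyResponse(
  status: number,
  location: string | null,
  baseUrl: string
): ExecutionStep {
  switch (status) {
    case 300: // from the diff: the kernel is waiting for input
      return {
        kind: 'input-requested',
        replyUrl: resolveReplyUrl(baseUrl, location ?? '')
      };
    case 200: // assumption: execution finished
      return { kind: 'done' };
    default:
      return { kind: 'error', status };
  }
}
```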
Thank you both @davidbrochart and @krassowski for testing and reviewing. When thinking about this with @echarles, we primarily wanted to address jupyter-server/jupyter_server#900 while keeping the kernel and the document model as decoupled as possible. The concern we have with adding the input or the pending requests to the document model is that it creates an issue with synchronizing the kernel state and the document model. In particular for the input, it seems to us that this will be very complex to resolve if the kernel is connected to a notebook and a console (or to multiple notebooks). For example, if the input snippet is sent from the console, should the pending input kernel state be reflected in the notebook? I'm wondering whether a better path is possible, one that preserves the decoupling as much as possible by requesting the kernel state.
I agree that adding requests to the shared document is the beginning of the kernel protocol leaking into CRDTs, which is exactly what we want to avoid. EDIT: let's call the |
A few times I encountered an issue when I executed a cell but the previous version of code was executed. This is because jupyter-server-nbmodel uses |
Debugging this a little bit I see that the following lines in
with ycell.doc.transaction():
    del ycell["outputs"][:]
    ycell["execution_count"] = None
I see that
Ah, I missed jupyter/jupyter_client#1023 - sorry!
@krassowski Thanks for trying. Are you still blocked with jupyter/jupyter_client#1023?
I think you need jupyter-server/jupyter_ydoc#201. BTW @fcollonval @krassowski I opened jupyter-server/jupyter_ydoc#233 to give an idea of what a truly collaborative and recoverable input widget would use.
No, it is all working great after picking up jupyter/jupyter_client#1023. The bugs I now see are:
- Improve notebook cell server-side executor
- Fix for testing drive
- Allow to request a document from its document_id/room_id
- Add documentation
- Rename state room_id to document_id
- Don't include custom logic for jupyter_server_nbmodel
d73e386
to
a207d33
Compare
The latest pushed commit a207d33 reduces the changes to a minimum, as a dedicated cell executor is going to be part of datalayer/jupyter-server-nbmodel#14.
Sorry, I missed this was waiting on my review.
Working on https://github.com/datalayer/jupyter-server-nbmodel, we are facing issues with the current code.
Code changes
document_id
as state of the shared document model.onCellExecuted
and onCellExecutionScheduled