diff --git a/ExecutionBroker.tex b/ExecutionBroker.tex index 464f26c..b6fb256 100644 --- a/ExecutionBroker.tex +++ b/ExecutionBroker.tex @@ -520,14 +520,9 @@ \subsection{Execution worker} \codeword{phase} to \codeword{READY} to indicate that the all the resources are ready and the \workerjob{} is waiting to start. -For a normal \execplan{} with no triggers, the \workerjob{} will be started at -the \codeword{starttime} declared in the \execoffer{}. - -For an \execplan{} with a start trigger, the \workerjob{} will stay at \codeword{phase} -\codeword{READY} until the start trigger is received. - -When the \workerjob{} start executing, the \codeword{phase} is changed to \codeword{RUNNING} -to indicate that the \workerjob{} is running. +The \execworkerclass{} will then wait until the \codeword{starttime} declared in the \execoffer{} +at which point it will start executing the \workerjob{} and change the \workerjob{} \codeword{phase} +to \codeword{RUNNING}. When the \workerjob{} finishes executing, because the \dockercontainer{} finished executing, the user closed their \jupyternotebook, or the \codeword{maxduration} was reached, @@ -1596,436 +1591,6 @@ \section{Date and time} Technically the \datamodel{} allows an array of values for the \codeword{datetime} section, but this would impose unnecessary complexity on the client for no real gain in user experience. -\section{Triggers and Actions} -\label{triggers-actions} - -\subsection{Triggers} -\label{triggers} - -The \codeword{triggers} section of the \datamodel{} defines triggers that -enable an external system to control the state of an \execworkerclass{} \job{}. - -The core \datamodel{} for \codeword{triggers} uses the \codeword{type} and -\codeword{spec} pattern to support different types of \codeword{triggers} -using \datamodel{} extensions. - -\subsubsection{HTTP trigger} -\label{http-trigger} - -The \executionbroker{} specification includes a \datamodel{} extension for -a \http{} \webservice{} trigger which can be called by an external system. - -The following example asks the \execbrokerclass{} service to set up a \http{} \webservice{} -endpoint that expects the \codeword{POST} data to contain a \yaml{} document with the -following fields: - -\begin{lstlisting}[] -trigger: - colour: green -\end{lstlisting} - -A \http{} \codeword{POST} to this endpoint will trigger an action depending the value of -\codeword{trigger.colour} in the \yaml{} document that it receives. - -\begin{lstlisting}[] -# ExecutionBroker client request. -.... -triggers: -- name: "trigger-001" - type: "https://www.purl.org/ivoa.net/trigger-types/http-trigger" - spec: - method: "POST" - content-type: "yaml" - conditions: - - field: "trigger.colour" - value: "GREEN" - action: "start" - - field: "trigger.colour" - value: "RED" - action: "cancel" -\end{lstlisting} - -In this case the action depends on the value of the \codeword{trigger.colour} field in the \codeword{POST} data. -\begin{itemize} - \item If the \codeword{trigger.colour} value is \codeword{GREEN}, then start this \job{}. - \item If the \codeword{trigger.colour} value is \codeword{RED}, then cancel this \job{}. -\end{itemize} - -Defining a \codeword{start} action in the \codeword{triggers} section will modify the effect of the -\codeword{starttime} declared in the \codeword{datetime} section. -The \execworkerclass{} service will not automatically start the \job{} when the specified \codeword{starttime} is reached. -Instead the \execworkerclass{} service will wait until the start action has been triggered before it starts the \job{}. - -\begin{itemize} - \item If the trigger is called before the specified \codeword{starttime} is reached, the \execworkerclass{} will - wait until the \codeword{starttime} is reached before starting the \job{}. - \item If the \codeword{starttime} is reached before the trigger has been called, the \execworkerclass{} will wait - until the trigger is called before starting the \job{}. - \item If the trigger is called within the \codeword{starttime} range, the \execworkerclass{} service will start \job{}. - \item If the \codeword{starttime} range expires before the trigger has been called, the \execworkerclass{} will cancel the \job{}. - \item If the trigger is called after the \codeword{starttime} range, it has no effect. The \job{} should already have been cancelled. -\end{itemize} - -In effect, the \codeword{starttime} range acts as a constraint on the action of the trigger. -Preventing it from starting the \job{} too early or too late. -The \execworkerclass{} service is responsible for allocating the required compute and storage resources -before the beginning of the \codeword{starttime}, -making sure that the \job{} is ready to start as soon as the trigger is called. -If the \codeword{starttime} range expires before the trigger has been called, the resources are released as normal -when the \job{} is cancelled. - -\subsection{Actions} -\label{actions} - -The \codeword{actions} part of the \datamodel{} enables a client to ask the \execworkerclass{} service -to perform user defined actions at specific points in the lifecycle of a \job{}. - -The core \datamodel{} for actions uses the \codeword{type} and \codeword{spec} pattern to -support different types of \codeword{actions} using \datamodel{} extensions. - -The \executionbroker{} specification includes \datamodel{} extensions for two types of \codeword{actions}, -one for sending an email and one for calling a HTTP \webservice{}. - -\subsubsection{Email action} -\label{email-action} - -The following definition asks the \execworkerclass{} service to send an email to the user -when the \codeword{status} of their \jupyternotebook{} \job{} changes to \codeword{RUNNING}. - -\begin{lstlisting}[] -.... -actions: -- name: "action-001" - status: - - 'RUNNING' - type: "https://www.purl.org/ivoa.net/action-types/email-action" - spec: - to: - - "user@example.org" - content-type: "text/html" - content: | - This is an automated email to let you know your Jupyter notebook session is now ready to use. -
- You can use this link to access your Jupyter notebook. -
- You can use this link to check the ExecutionWorker job status. -\end{lstlisting} - -Note the difference in the terminology used to describe the \job{} status. -From the user's perspective their notebook is \textit{'running'} when they login and click the -\textit{'run'} button. - -From the \execworkerclass{} service's perspective, the \job{} \codeword{status} changes to \codeword{RUNNING} -once the compute resources have been allocated, the data staging has been completed -and the \job{} \codeword{starttime} is reached. -In many cases, the \jupyternotebook{} service \codeword{endpoint} will not be available until the -\execworkerclass{} \job{} is \codeword{RUNNING}. - -In the simple case with no data staging, the \execworkerclass{} \job{} \codeword{status} may change to -\codeword{RUNNING} almost immediately, in which case the notification email is not necessary. - -However, in a more complex case where the the \execworkerclass{} service needs to do a lot of work -allocating the resources and staging the data, there may be a significant delay before for the -\execworkerclass{} \job{} \codeword{status} changes to \codeword{RUNNING}. -In which case, being able send the user an email when their session -is ready improves the user experience. - -\subsubsection{HTTP action} -\label{http-action} - -The following definition asks the \execworkerclass{} service to \codeword{POST} content from a \yaml{} template -to a HTTP \webservice{} at \codeword{http://foo.example.org/update} when the status of this \job{} changes. - -\begin{lstlisting}[] -.... -actions: -- name: "action-002" - status: - - 'RUNNING' - - 'COMPLETED' - - 'FAILED' - - 'CANCELLED' - type: "https://www.purl.org/ivoa.net/action-types/http-action" - spec: - method: "POST" - endpoint: "http://foo.example.org/update" - content-type: "application/yaml" - content: | - date: {{system.datetime}} - job: - ident: {{job.ident}} - status: {{job.status}} -\end{lstlisting} - -When the status of this \job{} changes to one of the specified values, the -\execworkerclass{} service will \codeword{POST} the document described in the -\codeword{content} template to the HTTP \codeword{endpoint}, filling in the \codeword{\{\{...\}\}} -markers in the template with values from the \datamodel{} representing the \job{} -at that point in time. - -In this example, the \codeword{\{\{...\}\}} markers in the \codeword{content} template will be filled -in with the current time, the \job{} \codeword{ident} and the \job{} \codeword{status}. - -In a normal execution sequence, this callout would be called twice. Once when the \job{} \codeword{status} -is set to \codeword{RUNNING}; -\begin{lstlisting}[] -POST /update HTTP/1.1 -Host: foo.example.org -Content-Type: application/yaml - -date: "2023-10-16T05:55" -job: - ident: "0174e0e4-7e74-40bb-84f6-88d66dd7845f" - status: "RUNNING" -\end{lstlisting} - -and again when the \job{} \codeword{status} is set to \codeword{COMPLETED}. -\begin{lstlisting}[] -POST /update HTTP/1.1 -Host: foo.example.org -Content-Type: application/yaml - -date: "2023-10-16T06:12" -job: - ident: "0174e0e4-7e74-40bb-84f6-88d66dd7845f" - status: "COMPLETED" -\end{lstlisting} - -\subsection{Linked worflow} -\label{linked-workflow} - -The following example describes how \codeword{trigger} and \codeword{action} blocks can be used to -set up a 2 step workflow containing step-a and step-b. - -The first part of setting up the workflow is to configure step-b with a \codeword{http-trigger} -that will wait for a specific value to be posted before starting the \job{}. - -\begin{lstlisting}[] -name: "step-b" -executable: - ... -resources: - ... -triggers: -- name: "trigger-001" - type: "https://www.purl.org/ivoa.net/trigger-types/http-trigger" - spec: - method: "POST" - content-type: "yaml" - conditions: - - name: "job.status" - value: "COMPLETED" - action: start - - name: "job.status" - value: "FAILED,CANCELLED" - action: cancel -\end{lstlisting} - -Note that the client doesn't set the endpoint location for the \codeword{trigger}, that will -come from the \execbrokerclass{} service when it makes an offer. -The \webservice{} endpoint location for the trigger may be different for each of the -offers in the \execbrokerclass{} service response. - -\begin{lstlisting}[] -.... -offers: -- name: "offer-001" - .... - triggers: - - name: "trigger-001" - type: "https://www.purl.org/ivoa.net/trigger-types/http-trigger" - spec: - method: "POST" - content-type: "yaml" - endpoint: "https://..../offer-001/trigger-001" - .... -\end{lstlisting} - -The second part of setting up the workflow is to configure a \codeword{http-action} -on step-a to call the \codeword{http-trigger} on step-b. - -\begin{lstlisting}[] -name: "step-a" -executable: - ... -resources: - ... -actions: -- name: "action-001" - status: - - 'RUNNING' - - 'COMPLETED' - - 'FAILED' - - 'CANCELLED' - type: "https://www.purl.org/ivoa.net/action-types/http-action" - spec: - method: "POST" - endpoint: "https://..../offer-001/trigger-001" - content-type: "yaml - content: | - date: {{system.date}} - job: - ident: {{job.ident}} - status: {{job.status}} -\end{lstlisting} - -The \codeword{starttime} on both step-a and step-b can be set to a range that starts today and lasts for a day. -This will ensure that even if the triggers don't get called, and neither of them is executed, both of them will -be cancelled and their resources released when the \codeword{starttime} range expires at the end of the day. - -\begin{lstlisting}[] -name: "step-a" -executable: - ... -resources: - ... -datetime: - - start: "2023-08-14/P1D" -\end{lstlisting} - -The client can also set up a \codeword{managed} storage resource in \vospace{} to transfer data -between the steps which is automatically created and deleted by the \execbrokerclass{} service. - -Adding a \codeword{managed} storage resource to step-a which points a location in a \vospace{} service -with a \codeword{maxlifetime} and \codeword{minlifetime} set to 1 day means that the \vospace{} location -will be automatically created when step-a starts, and automatically deleted 1 day after step-a completes. - -This gives step-b enough time to collect the results but also ensures that the storage resources -are eventually released after a day. - -\begin{lstlisting}[] -name: "step-a" -executable: - ... -resources: - .... - storage: - - name: "result-storage" - type: "https://www.purl.org/ivoa.net/storage-types/vospace-storage" - spec: - endpoint: "http://...." - path: "/step-a/results" - lifecycle: "managed" - lifetime: - min: "P1D" - max: "P1D" -\end{lstlisting} - -The data in this location can be made available to step-b using an \codeword{unmanaged} resource -that points to the same location in \vospace{}. - -\begin{lstlisting}[] -name: "step-b" -executable: - ... -resources: - .... - storage: - - name: "input-storage" - type: "https://www.purl.org/ivoa.net/storage-types/vospace-storage" - spec: - endpoint: "http://...." - path: "/step-a/results" - lifecycle: "unmanaged" -\end{lstlisting} - -If all goes to plan, when the \codeword{starttime} for step-a is reached, -the first \execworkerclass{} service will allocate the managed space in \vospace{}, -and then start the execution of step-a. - -The executable in step-a can access the storage location using an internal -volume mount that refers to the external storage location. - -\begin{lstlisting}[] -name: "step-a" -executable: - ... -resources: - .... - compute: - - name: "compute-001" - .... - volumes: - - name: "results-volume" - resource: "results-storage" - path: "/results" - mode: "rw" - .... - storage: - - name: "results-storage" - type: "https://www.purl.org/ivoa.net/storage-types/vospace-storage" - spec: - endpoint: "http://...." - path: "/step-a/results" - lifecycle: "managed" - lifetime: - min: "P1D" - max: "P1D" -\end{lstlisting} - -As far as the code inside the \executable{} is concerned, it writes its results to a -filesystem directory at \codeword{/results}. -The code inside the \executable{} does not need to know anything about the -details of where the data is stored. -The \execworkerclass{} service for step-a is responsible for ensuring that anything written to -\codeword{/results} in the local filesystem is transferred to \codeword{/step-a/results} -in the remote \vospace{} service during the \codeword{TEARDONW} phase of step-a. - -As with step-a, step-b is configured to mount the same \vospace{} location -as a filesystem directory. - -\begin{lstlisting}[] -name: "step-b" -executable: - ... -resources: - .... - compute: - - name: "compute-001" - .... - volumes: - - name: "input-volume" - resource: "input-storage" - path: "/inputs" - mode: "r" - .... - storage: - - name: "input-storage" - type: "https://www.purl.org/ivoa.net/storage-types/vospace-storage" - spec: - endpoint: "http://...." - path: "/step-a/results" - lifecycle: "unmanaged" -\end{lstlisting} - -As far as the code inside the \executable{} is concerned, it reads -its input data from a filesystem directory at \codeword{/inputs}. -The code inside the \executable{} does not need to know anything about the -details of where the data is stored. - -As soon as step-a starts, the \job{} status changes to \codeword{RUNNING}, -the \codeword{http-action} on this \execworkerclass{} will call the \codeword{http-trigger} -on the \execworkerclass{} running step-b. -This first call will have no effect, as the \codeword{http-trigger} -on step-b will only perform an action when the \codeword{job.status} -value is \codeword{COMPLETED}, \codeword{FAILED} or \codeword{CANCELLED}. - -When step-a completes and its status is updated to \codeword{COMPLETED}, -the \codeword{http-action} on step-a will make another call to step-b's \codeword{http-trigger}. -This time, the \codeword{job.status} value of \codeword{COMPLETED} will match the -criteria to start step-b. - -At this point, although step-a has completed, the managed space in \vospace{} is not deleted. -The managed resource has a \codeword{lifetime.min} of \codeword{P1D}, -which means that the space will not be deleted until a day after step-a has completed. -This allows sufficient time for step-b to execute, reading its input data -from the results of step-a. -The second \execworkerclass{} service runs step-b to completion as normal, -freeing up any resources it was allocated at the end. - -Finally, the \codeword{managed} \vospace{} storage location is automatically -deleted a day after the completion of step-a by the \execworkerclass{} service -that was assigned to manage its lifecycle. \pagebreak @@ -2207,13 +1772,6 @@ \subsection{Container with compute} An \dockercontainer{} with specific input and output data, and compute resources. ... -\subsection{Container triggering notebook} -\label{container-triggering-notebook} - -An \dockercontainer{} that notifies the user and launches a \jupyternotebook{} session -when the container finishes executing. -... - \subsection{Kubernetes Helm chart} \label{kubernetes-helm} @@ -2901,22 +2459,6 @@ \subsubsection{DateTime} .... \end{lstlisting} -\subsubsection{Triggers} -\label{datamodel-triggers} - -\begin{lstlisting}[] -.... -.... -\end{lstlisting} - -\subsubsection{Callouts} -\label{datamodel-callouts} - -\begin{lstlisting}[] -.... -.... -\end{lstlisting} - \subsection{Response} \label{datamodel-response}