This guide explains how a Toloka client can set up a data exchange process in which no meaningful data is stored on Toloka's side: both task data and results stay on the client's side.
The meaningful data is stored on the client's server, and Toloka stores only references (keys) to it. When a toloker opens the task UI in their web browser, a direct request goes to the client's server, which checks permissions and serves the content. Before a task is submitted, another request goes to the client's server, which stores the task result and returns a reference key. This key is set as the task result and stored in Toloka.
To set up a project, look at the toloka folder inside the repo. Copy the contents of the html.hbs and js.js files into the corresponding fields at the interface setup stage. Set the data specification as follows: input_key is a required string input field, and output_key is a required string output field. Toloka will store the references to the data mentioned above in these fields.
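The data specification above can be sketched as a plain object. This is a minimal illustration, assuming the input_spec/output_spec layout used by the Toloka API for project specifications; the variable name taskSpec is hypothetical and not taken from the repo.

```javascript
// Hypothetical sketch of the data specification described above.
// Assumption: the input_spec/output_spec layout matches the Toloka API
// project format; verify it against the current API reference.
const taskSpec = {
  input_spec: {
    // Reference to the task content stored on the client's server.
    input_key: { type: "string", required: true },
  },
  output_spec: {
    // Reference to the labeling result stored on the client's server.
    output_key: { type: "string", required: true },
  },
};

console.log(JSON.stringify(taskSpec, null, 2));
```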
The server is responsible for serving task content and storing labeling results. In this example, the server is implemented as a simple Node.js application on Express; see server.js. As an example deployment, we used Heroku. For more details, see the official documentation.
For this example, the data consists of three images stored on the server's filesystem and accessible by the corresponding keys: "0950a1", "a38c4e", and "ea9ee6". For testing purposes, one more image is available with the key "undefined".
To run tasks with this data, set up a pool in the Toloka project and upload an example file that contains three lines for three tasks with the given keys; each key will be saved as input_values.input_key.
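A file like this could be generated with a few lines of code. The sketch below assumes that Toloka accepts a TSV upload whose header names an input field as INPUT:input_key; the helper name buildInputTsv is hypothetical, and the actual example file in the repo may differ.

```javascript
// Hypothetical helper that builds the contents of an upload file.
// Assumption: the pool accepts TSV with an "INPUT:<field>" header row
// followed by one task per line.
function buildInputTsv(keys) {
  const header = "INPUT:input_key";
  return [header, ...keys].join("\n") + "\n";
}

// The three keys used in this example.
const exampleKeys = ["0950a1", "a38c4e", "ea9ee6"];
const tsv = buildInputTsv(exampleKeys);

console.log(tsv);
```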
As the task itself, we ask a toloker to type any text into an input field. When the task is submitted, this text is sent to the server. The key returned by the server replaces the initial text and is saved to Toloka as output_values.output_key.
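The exchange of the typed text for a key can be sketched on the client side roughly as follows. This is not the code from js.js: the function names, the JSON request body, and the { key } response shape are all assumptions made for illustration.

```javascript
// Hypothetical helpers; names, the JSON body, and the { key } response
// shape are assumptions, not the repo's actual implementation.

// Describe the request that carries the raw solution to the client's server.
function buildResultRequest(deployUrl, solution) {
  return {
    url: `${deployUrl}/result`,
    options: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(solution),
    },
  };
}

// Return a copy of the solution in which the typed text is replaced by the key.
function replaceOutputWithKey(solution, key) {
  return { ...solution, output_values: { output_key: key } };
}

// Send the raw solution to the server, then keep only the returned key.
// fetchImpl can be swapped out for testing; it defaults to the global fetch.
async function exchangeSolutionForKey(deployUrl, solution, fetchImpl = globalThis.fetch) {
  const { url, options } = buildResultRequest(deployUrl, solution);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Storing the result failed with status ${response.status}`);
  }
  const { key } = await response.json();
  return replaceOutputWithKey(solution, key);
}
```

Keeping the request construction and the output replacement as separate pure functions makes both easy to check without a running server.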
- A toloker comes to Toloka and gets assigned a task from the project, for instance, one with "input_key": "0950a1".
- The Toloka backend sends this data to the toloker's web browser.
- The web browser renders the interface and executes Task::onRender(); the image URL is changed to ${deployUrl}/data/0950a1?assignmentId=${assignmentId}, where ${deployUrl} is replaced with the real host of the server and ${assignmentId} with the real assignmentId, which is associated with this toloker+task pair and unique on the platform; the web browser then sends a request to this URL.
- The server receives the request and calls the handler registered with app.get('/data/:requestedKey', (req, res) => { ... }); in testing mode, when assignmentId=undefined is passed, the content of undefined.jpg is served back; otherwise a verification process starts.
- The server calls the Toloka API and asks for information about the given assignment ID; if the request is successful, the server checks whether the assignment is active, which means that the toloker needs the content to do their task; then the server checks whether the assignment contains a task with the given key; the requested content is served back only if all these checks pass.
- The toloker types any text into the input field and clicks Submit; under the hood, this click invokes Assignment::provideSolutions(); then the web browser sends a request with the raw solution data (including the typed text) to ${deployUrl}/result.
- The server receives the request and calls the handler registered with app.put('/result', (req, res) => { ... }); since the request is cross-domain, the server has CORS configured for this path; the server logs the request and sends back a generated key associated with it; the logging makes sense only for this example, and in a real implementation it should be replaced with a proper storage procedure.
- The client-side code running in the toloker's web browser receives the key and replaces the output in the solutions with it; as a result, the key is stored in the Toloka backend instead of the typed text.
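The URL construction and the server-side checks from the steps above can be sketched as pure functions. This is an illustration rather than the contents of server.js: the function names are hypothetical, and the assignment object shape (a status field equal to "ACTIVE" and a tasks array with input_values) is an assumption about what the Toloka API returns.

```javascript
// Hypothetical helpers; not the actual code from server.js.

// Build the URL that the task interface requests for a given key.
function buildDataUrl(deployUrl, key, assignmentId) {
  const query = `assignmentId=${encodeURIComponent(assignmentId)}`;
  return `${deployUrl}/data/${encodeURIComponent(key)}?${query}`;
}

// Testing mode: the query parameter arrives as the literal string "undefined".
function isTestingMode(assignmentId) {
  return assignmentId === "undefined";
}

// Decide whether the content for requestedKey may be served.
// `assignment` is the object assumed to come back from the Toloka API;
// pass null when the API request was not successful.
function checkAccess(assignment, requestedKey) {
  if (!assignment) {
    return { allowed: false, reason: "assignment lookup failed" };
  }
  if (assignment.status !== "ACTIVE") {
    return { allowed: false, reason: "assignment is not active" };
  }
  const hasKey = (assignment.tasks || []).some(
    (task) => task.input_values && task.input_values.input_key === requestedKey
  );
  if (!hasKey) {
    return { allowed: false, reason: "key is not part of this assignment" };
  }
  return { allowed: true, reason: "ok" };
}

// Example: an active assignment that contains the requested key.
const exampleAssignment = {
  id: "assignment-1",
  status: "ACTIVE",
  tasks: [{ input_values: { input_key: "0950a1" } }],
};
console.log(buildDataUrl("https://example.test", "0950a1", exampleAssignment.id));
console.log(checkAccess(exampleAssignment, "0950a1"));
```

In an Express handler, a check like this would run after the Toloka API call and before the file is sent, so that a failed check can be answered with an error status instead of the content.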
Since Toloka stores only keys (references) to the real data and results, the client has to restore the entire dataset on their side. To do this, download the assignments from Toloka, then use input_key to restore the input data and output_key to restore the labeling results.
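A minimal sketch of this restoration step is shown below. It assumes that each downloaded assignment carries tasks and solutions arrays aligned by index, and that the client can look up stored inputs and results by key; restoreDataset, inputStore, and resultStore are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch: join downloaded assignments with the client's own stores.
// Assumptions: assignment.tasks[i] corresponds to assignment.solutions[i], and
// both stores are Maps from a key to the data kept on the client's side.
function restoreDataset(assignments, inputStore, resultStore) {
  const rows = [];
  for (const assignment of assignments) {
    assignment.tasks.forEach((task, index) => {
      const inputKey = task.input_values.input_key;
      const outputKey = assignment.solutions[index].output_values.output_key;
      rows.push({
        assignmentId: assignment.id,
        inputKey,
        outputKey,
        input: inputStore.get(inputKey),
        result: resultStore.get(outputKey),
      });
    });
  }
  return rows;
}

// Usage with made-up stores and a single downloaded assignment.
const inputStore = new Map([["0950a1", "images/0950a1.jpg"]]);
const resultStore = new Map([["k-1", { text: "a cat" }]]);
const downloaded = [
  {
    id: "assignment-1",
    tasks: [{ input_values: { input_key: "0950a1" } }],
    solutions: [{ output_values: { output_key: "k-1" } }],
  },
];
const restoredRows = restoreDataset(downloaded, inputStore, resultStore);
console.log(restoredRows);
```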