ortemij/toloka-data-exchange-sample

Toloka Data Exchange Sample

This guide explains how a Toloka client can set up a data exchange process in which no meaningful data is stored on Toloka's side: both the task data and the results stay on the client's side.

Basic concept

The meaningful data is stored on the client's server; Toloka stores only references (keys) to it. When a toloker opens the task UI in their web browser, a request goes directly to the client's server, which checks permissions and serves the content. Before the task is submitted, another request goes to the client's server, which stores the task result and returns a reference key. This key is set as the task result and stored in Toloka.
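The exchange of a result for a key can be illustrated with a minimal sketch. It is not the code from server.js: `createResultStore` is a hypothetical in-memory stand-in for whatever storage the client actually uses, and the 16-character hex key format is an assumption made for illustration.

```javascript
const crypto = require('node:crypto');

// Hypothetical in-memory stand-in for the client's own storage.
function createResultStore() {
  const results = new Map();
  return {
    // Store a raw result and return the opaque key that Toloka will keep.
    put(result) {
      const key = crypto.randomBytes(8).toString('hex'); // 16 hex characters
      results.set(key, result);
      return key;
    },
    // Look the result up again by its key (undefined if the key is unknown).
    get(key) {
      return results.get(key);
    },
  };
}

const store = createResultStore();
const resultKey = store.put({ text: 'a cat on a sofa' });
console.log(resultKey.length);     // 16
console.log(store.get(resultKey)); // { text: 'a cat on a sofa' }
```

Only `resultKey` would ever reach Toloka; the typed text stays in the client's store.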

Toloka settings

To set up a project, look at the toloka folder inside the repo. Copy the contents of the html.hbs and js.js files into the corresponding fields at the interface setup stage. Set the data specification as follows: input_key as a required string input field, and output_key as a required string output field. Toloka stores the references described above in these fields.
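For reference, the data specification might look roughly like the object below. The `input_spec`/`output_spec` layout is an assumption modelled on the JSON form of a Toloka project specification, and `findInvalidFields` is a hypothetical helper added only to show what "required string" means in practice.

```javascript
// Assumed shape, modelled on the input_spec/output_spec JSON of a Toloka project.
const taskSpec = {
  input_spec: {
    input_key: { type: 'string', required: true },
  },
  output_spec: {
    output_key: { type: 'string', required: true },
  },
};

// Hypothetical helper: names of required fields whose value is not a string.
function findInvalidFields(spec, values) {
  return Object.entries(spec)
    .filter(([name, field]) => field.required && typeof values[name] !== 'string')
    .map(([name]) => name);
}

console.log(findInvalidFields(taskSpec.input_spec, { input_key: '0950a1' })); // []
console.log(findInvalidFields(taskSpec.output_spec, {}));                     // [ 'output_key' ]
```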

Server

The server is responsible for serving task content and storing labeling results. In this example, the server is a simple Node.js application built on Express; see server.js. We deployed it to Heroku, for instance. For more details, see the official documentation.

Data

For this example, the data consists of three images stored on the server's filesystem and accessible by the corresponding keys: "0950a1", "a38c4e", "ea9ee6". For testing purposes, one more image is available under the key "undefined". To run tasks with this data, set up a pool in the Toloka project and upload an example file containing three lines, one per task, with the given keys; each key is saved as input_values.input_key. To complete the task, a toloker types any text into an input field. When the task is submitted, this text is sent to the server. The key returned by the server replaces the typed text and is saved to Toloka as output_values.output_key.
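An upload file for the three keys could be generated as sketched below. The `INPUT:input_key` header follows Toloka's TSV upload convention as far as we know it; treat the exact header as an assumption and compare it with the example file in the repo. `buildTasksTsv` is a hypothetical helper name.

```javascript
// Hypothetical helper: build a TSV upload file with one task per line.
function buildTasksTsv(keys) {
  const header = 'INPUT:input_key'; // assumed column name for the input_key field
  return [header, ...keys].join('\n') + '\n';
}

const tsv = buildTasksTsv(['0950a1', 'a38c4e', 'ea9ee6']);
console.log(tsv);
// INPUT:input_key
// 0950a1
// a38c4e
// ea9ee6
```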

How it works step-by-step

  1. A toloker comes to Toloka and takes a task from the project, for instance, one with "input_key": "0950a1".
  2. The Toloka backend sends this data to the toloker's web browser.
  3. The web browser renders the interface and executes Task::onRender(); the image URL is changed to ${deployUrl}/data/0950a1?assignmentId=${assignmentId}, where ${deployUrl} is replaced with the real host of the server and ${assignmentId} with the real assignment ID, which is associated with this toloker and task pair and is unique across the platform; the web browser sends a request to this URL.
  4. The server receives the request and calls the handler registered with app.get('/data/:requestedKey', (req, res) => { ... }); in testing mode, when assignmentId=undefined is passed, the content of undefined.jpg is served back; otherwise a verification process starts.
  5. The server calls the Toloka API and asks for information about the given assignment ID; if the request is successful, the server checks whether the assignment is active, which means that the toloker needs the content to do their task; after that, the server checks whether the assignment contains a task with the given key; the requested content is served back only if all these checks pass.
  6. The toloker types any text into the input field and clicks Submit; under the hood, this click invokes Assignment::provideSolutions(); the web browser then sends a request with the raw solution data (including the typed text) to ${deployUrl}/result.
  7. The server receives the request and calls the handler registered with app.put('/result', (req, res) => { ... }); since the request is cross-domain, the server has CORS configured for this path; the server logs the request and sends back a generated key associated with it; the logging makes sense only for this example, and in a real implementation it should be replaced with a proper storage procedure.
  8. The client-side code running in the toloker's web browser receives the key and replaces the output in the solutions with it; as a result, the key is stored in the Toloka backend instead of the typed text.
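The URL from step 3 and the checks from steps 4 and 5 can be sketched as plain functions, independent of Express. This is a sketch under stated assumptions, not the code from server.js: `buildContentUrl` and `checkAccess` are hypothetical names, and the assignment object is assumed to carry a `status` field (with 'ACTIVE' for assignments in progress) and a `tasks` array with `input_values`, as in the assignment JSON returned by the Toloka API.

```javascript
// Step 3: the URL the browser requests for a given key and assignment.
function buildContentUrl(deployUrl, key, assignmentId) {
  return `${deployUrl}/data/${encodeURIComponent(key)}` +
    `?assignmentId=${encodeURIComponent(assignmentId)}`;
}

// Steps 4-5: decide whether the requested content may be served.
// `assignment` is the object fetched from the Toloka API, or null if the fetch failed.
function checkAccess(requestedKey, assignmentId, assignment) {
  // Testing mode: the literal string "undefined" gets the test image.
  if (assignmentId === 'undefined') {
    return { allowed: true, serveKey: 'undefined' };
  }
  if (!assignment) {
    return { allowed: false, reason: 'assignment not found' };
  }
  // Assumed status value for an assignment that is still in progress.
  if (assignment.status !== 'ACTIVE') {
    return { allowed: false, reason: 'assignment is not active' };
  }
  const hasTask = (assignment.tasks || []).some(
    (task) => task.input_values && task.input_values.input_key === requestedKey
  );
  if (!hasTask) {
    return { allowed: false, reason: 'key does not belong to this assignment' };
  }
  return { allowed: true, serveKey: requestedKey };
}

const assignment = {
  status: 'ACTIVE',
  tasks: [{ input_values: { input_key: '0950a1' } }],
};
console.log(buildContentUrl('https://example.com', '0950a1', 'abc'));
// https://example.com/data/0950a1?assignmentId=abc
console.log(checkAccess('0950a1', 'abc', assignment).allowed); // true
console.log(checkAccess('a38c4e', 'abc', assignment).allowed); // false
```

In an Express handler, the result of `checkAccess` would decide between sending the file and responding with an error status; keeping the decision in a pure function makes it easy to test without calling the Toloka API.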

Results

Since Toloka stores only keys (references) to the real data and results, the client has to restore the entire dataset on their side. To do this, download the assignments from Toloka, then use input_key to restore the input data and output_key to restore the labeling results.
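Restoring the dataset amounts to a join by key. The sketch below assumes that each downloaded assignment has `tasks` and `solutions` arrays in matching order, and that the client's input data and results are available through lookups by key; `restoreDataset` and the two `Map` stores are hypothetical stand-ins.

```javascript
// Hypothetical helper: join downloaded assignments with the client's own stores.
function restoreDataset(assignments, inputStore, resultStore) {
  const rows = [];
  for (const assignment of assignments) {
    assignment.tasks.forEach((task, index) => {
      // Assumes solutions[index] answers tasks[index].
      const solution = assignment.solutions[index];
      rows.push({
        assignmentId: assignment.id,
        input: inputStore.get(task.input_values.input_key),
        result: resultStore.get(solution.output_values.output_key),
      });
    });
  }
  return rows;
}

// Example data; the IDs, file name, and result key are made up for illustration.
const inputStore = new Map([['0950a1', 'images/0950a1.jpg']]);
const resultStore = new Map([['k1', { text: 'a cat on a sofa' }]]);
const downloaded = [{
  id: 'assignment-1',
  tasks: [{ input_values: { input_key: '0950a1' } }],
  solutions: [{ output_values: { output_key: 'k1' } }],
}];

console.log(restoreDataset(downloaded, inputStore, resultStore));
```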
