This guide shows how to run scikit-learn jobs on OpenPAI. The following examples are basic scikit-learn benchmarks; other customized scikit-learn code can be run in the same way.
To run the scikit-learn examples on OpenPAI, prepare a job configuration file and submit it through the web portal.
OpenPAI provides a pre-built Docker image containing the environment these jobs need. Refer to DOCKER.md if you want to customize this example image. If you have built a customized image and pushed it to Docker Hub, replace our pre-built image openpai/pai.example.sklearn with your own.
Here are some example configuration files:
```json
{
  "jobName": "sklearn-mnist",
  "image": "openpai/pai.example.sklearn",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "cd scikit-learn/benchmarks && python bench_mnist.py"
    }
  ]
}
```
```json
{
  "jobName": "sklearn-text-vectorizers",
  "image": "openpai/pai.example.sklearn",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "pip install memory_profiler && cd scikit-learn/benchmarks && python bench_text_vectorizers.py"
    }
  ]
}
```
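If you need several similar jobs, it can be convenient to generate the configuration files rather than write each one by hand. The following is a minimal Python sketch, assuming only the fields used in the two examples above; the helper name make_sklearn_job and the output file names are hypothetical. The image parameter is where you would substitute your own Docker Hub image.

```python
import json


# Hypothetical helper: builds a single-task job config using only the
# fields that appear in the example configurations above.
def make_sklearn_job(job_name, command, image="openpai/pai.example.sklearn",
                     cpu_number=4, memory_mb=8192):
    return {
        "jobName": job_name,
        "image": image,
        "taskRoles": [
            {
                "name": "main",
                "taskNumber": 1,
                "cpuNumber": cpu_number,
                "memoryMB": memory_mb,
                "gpuNumber": 0,  # these scikit-learn examples request no GPU
                "command": command,
            }
        ],
    }


jobs = [
    make_sklearn_job(
        "sklearn-mnist",
        "cd scikit-learn/benchmarks && python bench_mnist.py",
    ),
    make_sklearn_job(
        "sklearn-text-vectorizers",
        "pip install memory_profiler && cd scikit-learn/benchmarks"
        " && python bench_text_vectorizers.py",
    ),
]

# Write each config to <jobName>.json so it can be submitted through the web portal.
for job in jobs:
    with open(job["jobName"] + ".json", "w") as f:
        json.dump(job, f, indent=2)
```

The generated files contain the same fields and values as the hand-written examples; passing image="your-dockerhub-user/your-image" (a placeholder) switches every job to a customized image at once.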
For more details on how to write a job configuration file, please refer to the job tutorial.
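Before submitting, a quick local check can catch missing fields or misspelled field names. This sketch checks only the fields and types that appear in the examples in this guide; it is not the complete OpenPAI job schema, which the job tutorial describes. The function name check_job_config is hypothetical.

```python
import json

# Fields used by the examples in this guide; NOT the complete OpenPAI schema.
TOP_LEVEL_FIELDS = {"jobName": str, "image": str, "taskRoles": list}
TASK_ROLE_FIELDS = {
    "name": str,
    "taskNumber": int,
    "cpuNumber": int,
    "memoryMB": int,
    "gpuNumber": int,
    "command": str,
}


def _has_type(value, expected):
    # bool is a subclass of int in Python, so reject it for integer fields.
    return isinstance(value, expected) and not isinstance(value, bool)


def check_job_config(config):
    """Return a list of problems found in a job config dict (empty if none)."""
    problems = []
    for field, expected in TOP_LEVEL_FIELDS.items():
        if not _has_type(config.get(field), expected):
            problems.append(f"{field}: expected {expected.__name__}")
    roles = config.get("taskRoles")
    if isinstance(roles, list):
        for i, role in enumerate(roles):
            if not isinstance(role, dict):
                problems.append(f"taskRoles[{i}]: expected an object")
                continue
            for field, expected in TASK_ROLE_FIELDS.items():
                if not _has_type(role.get(field), expected):
                    problems.append(f"taskRoles[{i}].{field}: expected {expected.__name__}")
    return problems


# Example: "memoryMb" is a typo for "memoryMB".
config = json.loads("""
{
  "jobName": "sklearn-mnist",
  "image": "openpai/pai.example.sklearn",
  "taskRoles": [
    {"name": "main", "taskNumber": 1, "cpuNumber": 4, "memoryMb": 8192,
     "gpuNumber": 0, "command": "cd scikit-learn/benchmarks && python bench_mnist.py"}
  ]
}
""")
print(check_job_config(config))  # → ['taskRoles[0].memoryMB: expected int']
```

An empty list means every field used by these examples is present with the expected type; it does not guarantee the cluster will accept the job.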
Since OpenPAI runs scikit-learn jobs in Docker containers, the training speed on OpenPAI should be similar to the speed of running the same code directly on the host.
We also provide a stable Docker image with the data already added to it. To use it, add the stable tag to the image name: openpai/pai.example.sklearn:stable.
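If you generate job configs programmatically, switching to the stable image only means changing the tag part of the image reference. Below is a small sketch (the helper name with_tag is hypothetical) that relies on the Docker naming convention that a tag follows the last colon in the final path component, so a registry port such as localhost:5000 is not mistaken for a tag. It does not handle digest references of the form name@sha256:digest.

```python
def with_tag(image, tag):
    """Return the image reference with its tag replaced (or added)."""
    # Only the last path component can carry a tag; a colon earlier in the
    # string belongs to a registry host:port, e.g. "localhost:5000/name".
    slash = image.rfind("/")
    colon = image.rfind(":")
    if colon > slash:
        image = image[:colon]  # drop the existing tag
    return f"{image}:{tag}"


print(with_tag("openpai/pai.example.sklearn", "stable"))
# → openpai/pai.example.sklearn:stable
print(with_tag("localhost:5000/pai.example.sklearn:latest", "stable"))
# → localhost:5000/pai.example.sklearn:stable
```

Applied to a config dict, config["image"] = with_tag(config["image"], "stable") selects the stable image without touching any other field.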