scikit-learn on OpenPAI

This guide shows how to run scikit-learn jobs on OpenPAI. The sections below walk through two basic scikit-learn examples; other custom scikit-learn code can be run in the same way.

Contents

  1. scikit-learn MNIST digit recognition example
  2. scikit-learn text-vectorizers example

scikit-learn MNIST digit recognition example

To run the scikit-learn examples on OpenPAI, prepare a job configuration file and submit it through the webportal.

OpenPAI provides a pre-built Docker image containing the environment these jobs require. Refer to DOCKER.md to customize this example Docker environment. If you have built a customized image and pushed it to Docker Hub, replace the pre-built image openpai/pai.example.sklearn with your own.

Here is the configuration file for this example:

{
  "jobName": "sklearn-mnist",
  "image": "openpai/pai.example.sklearn",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "cd scikit-learn/benchmarks && python bench_mnist.py"
    }
  ]
}

scikit-learn text-vectorizers example

{
  "jobName": "sklearn-text-vectorizers",
  "image": "openpai/pai.example.sklearn",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "pip install memory_profiler && cd scikit-learn/benchmarks && python bench_text_vectorizers.py"
    }
  ]
}

For more details on how to write a job configuration file, please refer to the job tutorial.

Note:

Since PAI runs scikit-learn jobs in Docker, the training speed on PAI should be similar to the speed on the host.

We also provide a stable Docker image that bundles the data in the image. To use it, add the stable tag to the image name: openpai/pai.example.sklearn:stable.
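
As an illustration, the MNIST job configuration shown earlier could be pointed at the stable image by changing only the image field. This is a minimal sketch that assumes the command and resource settings can stay the same as in the original example; nothing else here is new.

```json
{
  "jobName": "sklearn-mnist",
  "image": "openpai/pai.example.sklearn:stable",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "cd scikit-learn/benchmarks && python bench_mnist.py"
    }
  ]
}
```

The same one-field change should apply to the text-vectorizers configuration.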