Configuration Reference

Vega is highly modular. The search space, search algorithm, and pipeline are all built from configuration. Running a Vega application means loading a configuration file and completing the AutoML process according to that configuration, as shown in the following code.

import vega


if __name__ == "__main__":
    vega.run("./main.yml")

The following describes the configuration items in the main.yml file in detail.

1. Overall structure

The Vega configuration is divided into two parts:

  1. General configuration, which sets common items such as the output path and log level. The configuration item name is general.
  2. Pipeline configuration, which consists of the following two parts:
    1. Pipeline definition. The configuration item name is pipeline, a list that contains all steps in the pipeline.
    2. Step definitions. Each step is configured under an item whose name is the step name listed in pipeline.

general:
    # general configuration

# Defining a Pipeline.
pipeline: [my_nas, my_hpo, my_data_augmentation, my_fully_train]

# Define each step. Refer to the following sections for details.
my_nas:
    # NAS configuration

my_hpo:
    # HPO configuration

my_data_augmentation:
    # Data augmentation configuration

my_fully_train:
    # fully train configuration
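
The rule above, that every name listed in pipeline needs a section of the same name, can be checked before a run. The following is a minimal sketch, assuming the configuration file has already been loaded into a Python dict. The helper name check_pipeline is hypothetical and is not part of Vega.

```python
def check_pipeline(config):
    """Return the pipeline step names that have no matching top-level section."""
    steps = config.get("pipeline", [])
    return [name for name in steps if name not in config]


# A dict mirroring the structure of the YAML skeleton above.
config = {
    "general": {},
    "pipeline": ["my_nas", "my_hpo", "my_data_augmentation", "my_fully_train"],
    "my_nas": {},
    "my_hpo": {},
    "my_data_augmentation": {},
    "my_fully_train": {},
}

missing = check_pipeline(config)
if missing:
    raise ValueError("steps without a configuration section: %s" % missing)
```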

The following describes each configuration item in detail.

2. Public configuration items

The following public configuration items can be configured:

| Configuration Item | Description |
| :-- | :-- |
| local_base_path | Working path. Each run creates a subfolder in this path named with time information (the task id), so the output of different runs is not overwritten. The task id subfolder contains two subfolders: output and worker. The output folder stores the output data of each step in the pipeline, and the worker folder stores temporary information. In the cluster scenario, this path must be set to an EFS path that every computing node can access, so that the nodes can share data. |
| backup_base_path | Backup path. This parameter is used in the cloud channel environment or cluster environment. The output and task files in the local path are backed up to this path. |
| timeout | Worker timeout, in hours. If the task is not completed within this interval, the worker is forcibly terminated. The default value is 10. |
| devices_per_job | Number of GPUs used by each worker in the search phase. -1 means that one worker uses all GPUs of the node, 1 means that one worker uses one GPU, 2 means that one worker uses two GPUs, and so on. |
| logger.level | Log level, which can be set to debug, info, warn, error, or critical. The default level is info. |
| cluster.master_ip | In the cluster scenario, set this parameter to the IP address of the master node. |
| cluster.listen_port | In the cluster scenario, the listening port. The default is 8000. Change it if port 8000 is already occupied. |
| cluster.slaves | In the cluster scenario, set this parameter to the IP addresses of all nodes other than the master node. |

general:
    task:
        local_base_path: "./tasks"
        backup_base_path: ~
    worker:
        devices_per_job: -1
    logger:
        level: info
    cluster:
        master_ip: ~
        listen_port: 8000
        slaves: []
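
As an illustration of how devices_per_job reads, the sketch below resolves the value against the number of GPUs on a node. The function names are hypothetical, and the assumption that a node runs as many concurrent workers as its GPUs allow is made for this example only. It is not taken from the Vega scheduler.

```python
def gpus_for_worker(devices_per_job, gpus_on_node):
    """Number of GPUs a single worker uses on a node with `gpus_on_node` GPUs."""
    if devices_per_job == -1:
        # -1 means one worker takes every GPU of the node.
        return gpus_on_node
    if devices_per_job < 1 or devices_per_job > gpus_on_node:
        raise ValueError("devices_per_job must be -1 or between 1 and %d" % gpus_on_node)
    return devices_per_job


def workers_per_node(devices_per_job, gpus_on_node):
    """Assumed number of workers that fit on one node at the same time."""
    return gpus_on_node // gpus_for_worker(devices_per_job, gpus_on_node)


for setting in (-1, 1, 2):
    print(setting, gpus_for_worker(setting, 8), workers_per_node(setting, 8))
```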

3. NAS configuration items

NAS configuration items include:

| Configuration Item | Description |
| :-- | :-- |
| pipe_step | Step type. The value is fixed to NasPipeStep. |
| search_algorithm | Search algorithm configuration. For details, see the configuration of each NAS algorithm. |
| search_space | Search space definition. For details, see each NAS algorithm. |
| trainer | Trainer configuration information. For details, see the Trainer Configuration. |
| dataset | Dataset configuration. For details, see the Dataset Configuration. |

The NAS algorithms mentioned above include Prune-EA, Quant-EA, SM-NAS (coming soon), CARS, Segmentation-Adelaide-EA, SR-EA, and ESR-EA.

The following is the configuration of the BackboneNas algorithm:

my_nas:
    pipe_step:
        type: NasPipeStep
    search_algorithm:       # Search algorithm configuration. This item must be configured in steps such as NAS.
        type: BackboneNas   # Search algorithm type.
        codec: BackboneNasCodec # Codec used by the BackboneNas search algorithm.
        policy:                 # Search policy. For the supported parameters, see the related algorithm documents.
            num_mutate: 10
            random_ratio: 0.2
        range:
            max_sample: 100
            min_sample: 10
    search_space:                       # Search space. See the related algorithm document. This item must be configured in steps such as NAS.
        type: SearchSpace
        modules: ['backbone', 'head']   # Modules describe how the network is assembled.
        backbone:                       # Each module has its own configuration item.
            ResNetVariant:              # For details, see the description of each algorithm.
                base_depth: [18, 34, 50, 101]
                base_channel: [32, 48, 56, 64]
                doublechannel: [3, 4]
                downsample: [3, 4]
        head:
            LinearClassificationHead:
                num_classes: [10]
    trainer:
        type: Trainer
    dataset:
        type: Cifar10
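
To get a feel for the size of the search_space above, the following sketch enumerates every combination of the listed ResNetVariant values. It assumes, for illustration only, that each list is a set of independent candidate values. BackboneNas may interpret some items differently (for example, doublechannel and downsample may describe how many positions are chosen rather than a single value), so the real search space can be larger.

```python
from itertools import product

# Candidate values copied from the ResNetVariant item of the example.
resnet_variant = {
    "base_depth": [18, 34, 50, 101],
    "base_channel": [32, 48, 56, 64],
    "doublechannel": [3, 4],
    "downsample": [3, 4],
}


def enumerate_candidates(space):
    """Yield one dict per combination of the candidate values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


candidates = list(enumerate_candidates(resnet_variant))
print(len(candidates))  # 4 * 4 * 2 * 2 = 64 combinations
print(candidates[0])
```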

The modules available for the search_space configuration item are as follows:

| Module | Option | Description | Algorithm Reference |
| :-- | :-- | :-- | :-- |
| backbone | PruneResNet | ResNet variant network that supports the prune operation. | ref |
| backbone | QuantResNet | ResNet variant network that supports quantization operations. | ref |
| backbone | ResNetVariant | ResNet variant network that supports architecture adjustments such as changing the down-sampling points. | ref |
| head | LinearClassificationHead | Network classification layer for classification tasks, which can be concatenated with ResNetVariant. | |
| head | CurveLaneHead | Detection head used to detect lanes. | |
| neck | FeatureFusionModule | Feature fusion network in the lane detection task. | |
| detector | AutoLaneDetector | AutoLaneDetector detection network in the lane detection task. | |
| super_network | DartsNetwork | Super network structure in the Darts algorithm. | ref |
| super_network | CARSDartsNetwork | Super network structure in the CARS algorithm. | ref |
| custom | AdelaideFastNAS | User-defined network structure in the AdelaideFastNAS algorithm. | ref |
| custom | MtMSR | User-defined network structure in the MtMSR algorithm. | ref |

4. HPO configuration items

HPO optimizes the hyperparameters used to train a model. It does not involve network architecture parameters. The searchable items are as follows:

  1. Batch size of the dataset.
  2. Optimization method and related parameters.
  3. Learning rate.
  4. Momentum.

The HPO configuration items are as follows:

| Configuration Item | Description |
| :-- | :-- |
| pipe_step | The value is fixed at NasPipeStep. |
| hpo | Configure the type and hyperparameter_space parameters. The former defines the HPO algorithm to be used. For details, see the HPO. The latter defines the hyperparameters to be searched. |
| trainer | Trainer configuration information. For details, see the Trainer Configuration. |
| dataset | Dataset configuration. For details, see the Dataset Configuration. |
| evaluator | Evaluator information. Refer to each HPO algorithm example or the Benchmark configuration. |

The HPO configuration of the ASHA algorithm is as follows for reference:

my_hpo:
    pipe_step:
        type: NasPipeStep
    hpo:
        type: AshaHpo
        policy:
            total_epochs: 81
            config_count: 40
        hyperparameter_space:
            hyperparameters:
                -   key: dataset.batch_size
                    type: INT_CAT
                    range: [8, 16, 32, 64, 128, 256]
                -   key: trainer.optim.lr
                    type: FLOAT_EXP
                    range: [0.00001, 0.1]
                -   key: trainer.optim.type
                    type: STRING
                    range: ['Adam', 'SGD']
                -   key: trainer.optim.momentum
                    type: FLOAT
                    range: [0.0, 0.99]
            condition:
                -   key: condition_for_sgd_momentum
                    child: trainer.optim.momentum
                    parent: trainer.optim.type
                    type: EQUAL
                    range: ["SGD"]
    model:
        model_desc:
            modules: ["backbone", "head"]
            backbone:
                base_channel: 64
                downsample: [0, 0, 1, 0, 1, 0, 1, 0]
                base_depth: 18
                doublechannel: [0, 0, 1, 0, 1, 0, 1, 0]
                name: ResNetVariant
            head:
                num_classes: 10
                name: LinearClassificationHead
                base_channel: 512
    dataset:
        type: Cifar10
    trainer:
        type: Trainer
    evaluator:
        type: Evaluator
        gpu_evaluator:
            type: GpuEvaluator
            metric:
                type: accuracy
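
The hyperparameter_space above combines typed ranges with a condition: momentum is only meaningful when the optimizer type is SGD. The sketch below draws random configurations from that space. It is not Vega's sampler. In particular, treating FLOAT_EXP as log-uniform sampling, and INT_CAT and STRING as a uniform choice from the list, are assumptions made for this example.

```python
import math
import random

# The hyperparameters and the condition from the example, as Python data.
space = [
    {"key": "dataset.batch_size", "type": "INT_CAT", "range": [8, 16, 32, 64, 128, 256]},
    {"key": "trainer.optim.lr", "type": "FLOAT_EXP", "range": [0.00001, 0.1]},
    {"key": "trainer.optim.type", "type": "STRING", "range": ["Adam", "SGD"]},
    {"key": "trainer.optim.momentum", "type": "FLOAT", "range": [0.0, 0.99]},
]
conditions = [
    {"child": "trainer.optim.momentum", "parent": "trainer.optim.type",
     "type": "EQUAL", "range": ["SGD"]},
]


def sample(space, conditions, rng):
    """Draw one configuration, dropping children whose condition is not met."""
    values = {}
    for hp in space:
        if hp["type"] in ("INT_CAT", "STRING"):
            values[hp["key"]] = rng.choice(hp["range"])
        elif hp["type"] == "FLOAT_EXP":
            low, high = hp["range"]
            drawn = math.exp(rng.uniform(math.log(low), math.log(high)))
            # Clamp to guard against floating-point rounding at the bounds.
            values[hp["key"]] = min(max(drawn, low), high)
        elif hp["type"] == "FLOAT":
            low, high = hp["range"]
            values[hp["key"]] = rng.uniform(low, high)
        else:
            raise ValueError("unsupported type: %s" % hp["type"])
    for cond in conditions:
        if cond["type"] == "EQUAL" and values.get(cond["parent"]) not in cond["range"]:
            values.pop(cond["child"], None)
    return values


print(sample(space, conditions, random.Random(0)))
```

With this condition, a sampled configuration contains trainer.optim.momentum only when trainer.optim.type is SGD.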

5. Data Augmentation configuration items

The configuration of data augmentation includes:

| Configuration Item | Description |
| :-- | :-- |
| pipe_step | The value is fixed at NasPipeStep. |
| hpo | Currently, only the PBA algorithm is supported. The value is fixed to PBAHpo. For details, see the PBA document (../algorithms/pba.md). |
| trainer | Trainer configuration information. For details, see the Trainer Configuration. |
| dataset | Dataset configuration. For details, see the Dataset Configuration. |

The following shows the configuration of the PBA algorithm for reference:

my_data_augmentation:
    pipe_step:
        type: NasPipeStep
    dataset:
        type: Cifar10
    hpo:
        type: PBAHpo
        each_epochs: 3
        config_count: 16
        total_rungs: 200
        transformers:
            Cutout: True
            Rotate: True
            Translate_X: True
            Translate_Y: True
            Brightness: True
            Color: True
            Invert: True
            Sharpness: True
            Posterize: True
            Shear_X: True
            Solarize: True
            Shear_Y: True
            Equalize: True
            AutoContrast: True
            Contrast: True
    trainer:
        type: Trainer
    evaluator:
        type: Evaluator
        gpu_evaluator:
            type: GpuEvaluator
            metric:
                type: accuracy
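
The sketch below works out a rough training budget from the hpo values in the PBA example and lists the enabled transformers. It assumes that each_epochs is the number of epochs trained per rung, that total_rungs is the number of rungs, and that config_count configurations are trained. These readings are assumptions for illustration; see the PBA document for the authoritative meaning of each item.

```python
# The hpo section of the example, as Python data.
pba_hpo = {
    "each_epochs": 3,
    "config_count": 16,
    "total_rungs": 200,
    "transformers": {
        "Cutout": True, "Rotate": True, "Translate_X": True, "Translate_Y": True,
        "Brightness": True, "Color": True, "Invert": True, "Sharpness": True,
        "Posterize": True, "Shear_X": True, "Solarize": True, "Shear_Y": True,
        "Equalize": True, "AutoContrast": True, "Contrast": True,
    },
}


def pba_budget(hpo):
    """Summarize the assumed epoch budget and the enabled transformers."""
    epochs_per_config = hpo["each_epochs"] * hpo["total_rungs"]
    return {
        "epochs_per_config": epochs_per_config,
        "total_epochs": epochs_per_config * hpo["config_count"],
        "enabled": sorted(name for name, on in hpo["transformers"].items() if on),
    }


budget = pba_budget(pba_hpo)
print(budget["epochs_per_config"], budget["total_epochs"], len(budget["enabled"]))
```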

6. Fully Train Configuration

Full training is used to train network models. The configuration items are as follows:

| Configuration Item | Description |
| :-- | :-- |
| pipe_step | The value is fixed at FullyTrainPipeStep. |
| models_folder | Directory that contains the model description files to be trained. Files in this directory are named model_desc_<ID>.json, where ID is a number, and these models are trained in parallel. This option is mutually exclusive with the parameter "model" and takes priority over "model". |
| trainer | Trainer configuration information. For details, see the Trainer Configuration. |
| dataset | Dataset configuration. For details, see the Dataset Configuration. |
| model | Model information. For details, see the Trainer Configuration. |
| model_desc_file | The model description file. |

my_fully_train:
    pipe_step:
        type: FullyTrainPipeStep
        # models_folder: ~
    trainer:
        type: Trainer
    model:
        model_desc_file: "/models/model_desc.json"
    dataset:
        type: Cifar10
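
The models_folder naming rule (model_desc_<ID>.json, with a numeric ID) can be illustrated with a short sketch that collects the matching files in ID order. The helper find_model_descs is hypothetical and only mirrors the naming rule described above. It is not how Vega itself scans the folder.

```python
import re
import tempfile
from pathlib import Path

# Matches model_desc_<ID>.json, where <ID> is a number.
MODEL_DESC_PATTERN = re.compile(r"^model_desc_(\d+)\.json$")


def find_model_descs(models_folder):
    """Return the model description files in the folder, sorted by numeric ID."""
    found = []
    for path in Path(models_folder).iterdir():
        match = MODEL_DESC_PATTERN.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


# Demonstration on a temporary folder with two valid files and one unrelated file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("model_desc_10.json", "model_desc_2.json", "notes.txt"):
        Path(folder, name).write_text("{}")
    names = [path.name for path in find_model_descs(folder)]

print(names)  # ['model_desc_2.json', 'model_desc_10.json']
```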

7. Trainer configuration item

Each of the preceding pipeline steps has a trainer configuration item, which can be set to the basic trainer or to an algorithm-specific extended trainer. The basic configuration information of the trainer is as follows:

| Configuration Item | Description |
| :-- | :-- |
| type | Trainer, or an algorithm extension trainer. For details, see the related algorithm document. |
| epochs | Total number of epochs. |
| optim | Optimizer and its parameters. |
| lr_scheduler | Learning rate scheduler and its parameters. |
| loss | Loss function and its parameters. |
| metric | Metrics and their parameters. |
| distributed | Whether to enable Horovod for fully train. When Horovod is enabled, the trainer uses all computing resources in the Horovod cluster to train the specified network model. To start Horovod, you must set the model option. |
| model_desc | Model description, which is mutually exclusive with model_desc_file. model_desc_file takes precedence over model_desc. In addition, the shuffle parameter of the dataset must be set to False. |
| model_desc_file | File that contains the model description. This parameter is mutually exclusive with model_desc, and model_desc_file takes priority over model_desc. |
| hps_file | Hyperparameter file. |
| pretrained_model_file | Pre-trained model file. |

The following is an example of a trainer configuration, together with the dataset and model description it trains:

    trainer:
        type: Trainer
        epochs: 160
        optim:
            type: Adam
            params:
                lr: 0.1
        lr_scheduler:
            type: MultiStepLR
            params:
                milestones: [75, 150]
                gamma: 0.5
        metric:
            type: accuracy
        loss:
            type: CrossEntropyLoss
        distributed: False
    dataset:
        type: Imagenet
    model:
        model_desc:
            modules: ['backbone', 'head']
            backbone:
                ResNetVariant:
                    base_depth: [18, 34, 50, 101]
                    base_channel: [32, 48, 56, 64]
                    doublechannel: [3, 4]
                    downsample: [3, 4]
            head:
                LinearClassificationHead:
                    num_classes: [10]
        # model_desc_file: ~
        # hps_file: ~
        # pretrained_model_file: ~
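
To show what the lr_scheduler settings in the example imply, the sketch below computes the learning rate at a given epoch. It assumes that the MultiStepLR type follows the PyTorch scheduler of the same name, where the learning rate is multiplied by gamma each time a milestone is reached. The function multistep_lr is a hypothetical helper, not a Vega or PyTorch API.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at `epoch`: base_lr scaled by gamma once per milestone reached."""
    reached = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** reached


# Values from the example: lr 0.1, milestones [75, 150], gamma 0.5, 160 epochs.
for epoch in (0, 74, 75, 149, 150, 159):
    print(epoch, multistep_lr(0.1, [75, 150], 0.5, epoch))
```

Under this assumption the rate stays at 0.1 up to epoch 74, drops to 0.05 at epoch 75, and to 0.025 at epoch 150.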

In addition to the models defined by Vega, you can also load TorchVision models. The following models are supported. For details, see the official description.

| Module | Options |
| :-- | :-- |
| torch_vision_model | vgg11, vgg13, vgg16, vgg19, vgg11_bn, vgg13_bn, vgg16_bn, vgg19_bn, squeezenet1_0, squeezenet1_1, shufflenetv2_x0.5, shufflenetv2_x1.0, resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d, wide_resnet50_2, wide_resnet101_2, mobilenet_v2, mnasnet0_5, mnasnet1_0, inception_v3_google, googlenet, densenet121, densenet169, densenet201, densenet161, alexnet, fasterrcnn_resnet50_fpn, fasterrcnn_resnet50_fpn_coco, keypointrcnn_resnet50_fpn_coco, maskrcnn_resnet50_fpn_coco, fcn_resnet101_coco, deeplabv3_resnet101_coco, r3d_18, mc3_18, r2plus1d_18 |

8. Dataset Reference

Each pipeline step involves dataset configuration. Vega classifies datasets into three types: train, val, and test. The three types can be configured independently, and transforms can be configured for each of them. The following is a configuration example of the Cifar10 dataset:

    dataset:
        type: Cifar10
        common:
            data_path: ~            # Path where the dataset is located.
            batch_size: 256
            num_workers: 4
            imgs_per_gpu: 1
            train_portion: 0.5
            shuffle: false
            distributed: false
        train:
            transforms:
                - type: RandomCrop
                  size: 32
                  padding: 4
                - type: RandomHorizontalFlip
                - type: ToTensor
                - type: Normalize
                  mean:
                      - 0.49139968
                      - 0.48215827
                      - 0.44653124
                  std:
                      - 0.24703233
                      - 0.24348505
                      - 0.26158768
        val:
            transforms:
                - type: ToTensor
                - type: Normalize
                  mean:
                      - 0.49139968
                      - 0.48215827
                      - 0.44653124
                  std:
                      - 0.24703233
                      - 0.24348505
                      - 0.26158768
        test:
            transforms:
                - type: ToTensor
                - type: Normalize
                  mean:
                      - 0.49139968
                      - 0.48215827
                      - 0.44653124
                  std:
                      - 0.24703233
                      - 0.24348505
                      - 0.26158768
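
The Normalize entries in the example carry one mean and one std value per colour channel. The sketch below shows the arithmetic on a single RGB pixel, assuming that Normalize follows the usual per-channel rule (value - mean) / std applied after ToTensor has scaled pixel values to the range 0 to 1. This behaviour is an assumption based on the transform names, and normalize_pixel is a hypothetical helper.

```python
# Per-channel statistics copied from the Cifar10 example above.
MEAN = (0.49139968, 0.48215827, 0.44653124)
STD = (0.24703233, 0.24348505, 0.26158768)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(MEAN))
# A mid-grey pixel ends up slightly above zero in each channel.
print(normalize_pixel((0.5, 0.5, 0.5)))
```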

8.1 Built-in Datasets

Vega provides the following common datasets:

| Name | Description | Data Source |
| :-- | :-- | :-- |
| Cifar10 | The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. | download |
| Cifar100 | CIFAR-100 is just like CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. | download |
| Mnist | The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. | download |
| COCO | COCO is a large-scale object detection, segmentation, and captioning dataset with about 123K images and 886K instances. | download |
| Div2K | Div2K is a super-resolution architecture search database, containing 800 training images and 100 validation images. | download |
| Imagenet | ImageNet is an image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds and thousands of images. | download |
| Fmnist | Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. | download |
| Cityscapes | Cityscapes is a large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames. | download |
| Cifar10TF | The CIFAR-10-bin dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. | download |
| Div2kUnpair | DIV2K dataset: DIVerse 2K resolution high quality images as used for the challenges @ NTIRE (CVPR 2017 and CVPR 2018) and @ PIRM (ECCV 2018). | download |

  1. Cifar10 Default Configuration

    data_path: ~            # the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/cifar10
    batch_size: 256         # batch size
    num_workers: 4          # the worker number to load the data
    shuffle: false          # if True, will shuffle, defaults to False
    distributed: false      # whether to use distributed train
    train_portion: 0.5      # the ratio of the train data split from the initial train data
  2. Cifar100 Default Configuration

    data_path: ~            # the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/cifar100
    batch_size: 1           # batch size
    num_workers: 4          # the worker number to load the data
    shuffle: true           # if True, will shuffle, defaults to False
    distributed: false      # whether to use distributed train
  3. Cityscapes Default Configuration

    root_path: ~            # the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/Cityscapes
    list_path: 'img_gt_train.txt'   # the name of the txt file
    batch_size: 1           # batch size
    mean: 128               # the parameter mean for transform
    ignore_label: 255       # the label to ignore
    scale: True             # if true, the aspect ratio is kept during the transform
    mirrow: True            # whether to mirror the image in the transform
    rotation: 90            # the rotation value
    crop: 321               # the crop size
    num_workers: 4          # the worker number to load the data
    shuffle: False          # if True, will shuffle, defaults to False
    distributed: True       # whether to use distributed train
    id_to_trainid: False    # map the original ids to continuous train ids; if true, a mapping dict must be provided
  4. DIV2K Default Configuration

    root_HR: ~              # the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/DIV2K/div2k_train/hr
    root_LR: ~              # the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/DIV2k/div2k_train/lr
    batch_size: 1           # batch size
    num_workers: 4          # the worker number to load the data
    upscale: 2              # the upscale for super resolution
    subfile: !!null         # whether to use a subfile, None by default
    crop: !!null            # the crop size, None by default
    shuffle: false          # if True, will shuffle, defaults to False
    hflip: false            # whether to use horizontal flip
    vflip: false            # whether to use vertical flip
    rot90: false            # whether to use rotation
    distributed: True       # whether to use distributed train
  5. FashionMnist Default Configuration

    data_path: ~            # the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/fmnist
    batch_size: 1           # batch size
    num_workers: 4          # the worker number to load the data
    shuffle: true           # if True, will shuffle, defaults to False
    distributed: false      # whether to use distributed train
  6. Imagenet Default Configuration

    data_path: ~            #  the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/ImageNet
    batch_size: 1           #  batch size
    num_workers: 4          #  the worker number to load the data
    shuffle: true           #  if True, will shuffle, defaults to False
    distributed: false      #  whether to use distributed train
  7. Mnist Default Configuration

    data_path: ~            # the path of the dataset, default is None, MUST be set to a correct dataset PATH, such as /datasets/mnist
    batch_size: 1           # batch size
    num_workers: 4          # the worker number to load the data
    shuffle: true           # if True, will shuffle, defaults to False
    distributed: false      # whether to use distributed train
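
The train_portion item in the Cifar10 defaults is described as the ratio of the train data split from the initial train data. The sketch below shows the arithmetic for the 50000 CIFAR-10 training images. Taking the first part of the index range for training and holding out the remainder is an assumption for illustration, and split_train is a hypothetical helper.

```python
def split_train(num_samples, train_portion):
    """Split sample indices into a training part and a held-out remainder."""
    if not 0.0 < train_portion <= 1.0:
        raise ValueError("train_portion must be in (0, 1]")
    cut = int(num_samples * train_portion)
    return range(0, cut), range(cut, num_samples)


# CIFAR-10 has 50000 training images; train_portion: 0.5 keeps half for training.
train_idx, held_out_idx = split_train(50000, 0.5)
print(len(train_idx), len(held_out_idx))  # 25000 25000
```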

8.2 Built-in Transforms

Currently, the following transforms are supported:

| Transform | Input | Output |
| :-- | :-- | :-- |
| AutoContrast | level, img | img |
| BboxTransform | bboxes, img_shape, scale_factor | bboxes |
| Brightness | level, img | img |
| Color | level, img | img |
| Contrast | level, img | img |
| Cutout | length, img | img |
| Equalize | level, img | img |
| ImageTransform | scale, img | img, img_shape, pad_shape, scale_factor |
| Invert | level, img | img |
| MaskTransform | masks, pad_shape, scale_factor | padded_masks |
| Numpy2Tensor | numpy | tensor |
| Posterize | level, img | img |
| RandomCrop_pair | crop, upscale, img, label | img, label |
| RandomHorizontalFlip_pair | img, label | img, label |
| RandomMirrow_pair | img, label | img, label |
| RandomRotate90_pair | img, label | img, label |
| RandomVerticallFlip_pair | img, label | img, label |
| Rotate | level, img | img |
| SegMapTransform | scale, img | img |
| Sharpness | level, img | img |
| Shear_X | level, img | img |
| Shear_Y | level, img | img |
| Solarize | level, img | img |
| ToPILImage_pair | img1, img2 | img1, img2 |
| ToTensor_pair | img1, img2 | tensor1, tensor2 |
| Translate_X | level, img | img |
| Translate_Y | level, img | img |
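
The transforms with the _pair suffix take an image and a label and return both. The sketch below illustrates why that matters for a random transform: the same random decision must be applied to the image and to its label so that they stay aligned. It uses nested lists in place of real images and is an illustration of the idea only, not Vega's implementation of RandomHorizontalFlip_pair.

```python
import random


def random_horizontal_flip_pair(img, label, rng, p=0.5):
    """Flip img and label together with probability p, so they stay aligned."""
    if rng.random() < p:
        img = [row[::-1] for row in img]
        label = [row[::-1] for row in label]
    return img, label


# Tiny 2x3 "image" and "label" stand-ins.
img = [[1, 2, 3], [4, 5, 6]]
label = [[0, 0, 1], [1, 0, 0]]

flipped_img, flipped_label = random_horizontal_flip_pair(img, label, random.Random(0), p=1.0)
print(flipped_img)    # [[3, 2, 1], [6, 5, 4]]
print(flipped_label)  # [[1, 0, 0], [0, 0, 1]]
```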