A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, model training, and so on. A component is analogous to a function, in that it has a name, parameters, return values, and a body.
Components are made up of two sets of code: client code, which talks to API endpoints to submit the job, and runtime code, which performs the actual work the component is responsible for.
Components must also include a specification file in YAML format. The file includes information for Kubeflow to run the component, such as metadata and input/output specifications.
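As a sketch, a minimal component specification might look like the following; the component name, container image, and file paths are illustrative, not taken from an actual catalog entry:

```yaml
name: Add two numbers
description: Returns the sum of two integer inputs
inputs:
  - {name: a, type: Integer}
  - {name: b, type: Integer}
outputs:
  - {name: sum, type: Integer}
implementation:
  container:
    # Hypothetical image containing the runtime code
    image: registry.example.com/add:latest
    command: [python, /app/add.py]
    args: [--a, {inputValue: a}, --b, {inputValue: b}, --sum, {outputPath: sum}]
```

The `implementation.container` section is what ties the metadata and input/output declarations to the runtime code packaged in the image.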
The last step is to dockerize the component code.
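Dockerizing typically means writing a Dockerfile that packages the runtime code and its dependencies into the image referenced by the specification. A minimal sketch, assuming a Python runtime script named `add.py` (both file names are illustrative):

```dockerfile
FROM python:3.9-slim
WORKDIR /app
# Install the runtime code's dependencies first so they cache well
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy in the runtime code itself
COPY add.py .
ENTRYPOINT ["python", "/app/add.py"]
```

The resulting image is then built and pushed to a registry that the Kubeflow cluster can pull from.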
For an in-depth guide, take a look at the Kubeflow Pipelines component specification.
- Click on the "Components" link in the left-hand navigation panel
- Click on "Upload a Component"
- Select a file to upload (must be in .tar.gz or .tgz format)
- This will be the compressed .yaml component specification
- Enter a name for the component; otherwise a default will be assigned
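For example, the specification file can be packaged into the required archive format with `tar` before uploading (the spec contents and file name `component.yaml` here are placeholders):

```shell
# Create a placeholder specification for illustration
printf 'name: My component\n' > component.yaml
# Package the YAML specification as a gzipped tarball for upload
tar -czf component.tar.gz component.yaml
```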
Components are composed into a pipeline using the Kubeflow Pipelines SDK. Refer to the pipeline documentation for usage.
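As a rough sketch, composing components with the Kubeflow Pipelines SDK (v1-style API) might look like the following; the specification paths, parameter names, and output key are illustrative and would depend on your actual component specs:

```python
import kfp
from kfp import dsl

# Load components from their YAML specifications (paths are illustrative)
preprocess_op = kfp.components.load_component_from_file("preprocess/component.yaml")
train_op = kfp.components.load_component_from_file("train/component.yaml")

@dsl.pipeline(name="example-pipeline", description="Preprocess data, then train a model")
def example_pipeline(data_path: str):
    # Each component invocation becomes one step in the pipeline;
    # the output of one step is wired into the input of the next
    preprocess_task = preprocess_op(input_path=data_path)
    train_op(training_data=preprocess_task.outputs["output_path"])

# Compile the pipeline into a package that can be uploaded to Kubeflow
kfp.compiler.Compiler().compile(example_pipeline, "example_pipeline.tar.gz")
```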
You can find sample components in the Machine Learning Exchange catalog.