Commit
* Add XPU accelerator
* Update changelog
* precommit
* Add documentation

Signed-off-by: Ashwin Vaidya <[email protected]>
1 parent 3e64f09, commit b37284e. Showing 9 changed files with 218 additions and 2 deletions.
docs/source/markdown/guides/how_to/training_on_intel_gpus/index.md (52 additions, 0 deletions)

@@ -0,0 +1,52 @@
# Training on Intel GPUs

This tutorial demonstrates how to train a model on Intel GPUs using anomalib. Anomalib ships with an XPU accelerator and strategy for PyTorch Lightning, which allow you to train your models on Intel GPUs.

> [!NOTE]
> Currently, only single-GPU training is supported on Intel GPUs.
> These commands were tested on an Arc 750 and an Arc 770.
## Installing Drivers

First, check that you have the correct drivers installed. If you are on Ubuntu, you can refer to the [following guide](https://dgpu-docs.intel.com/driver/client/overview.html).

Another recommended tool is `xpu-smi`, which can be installed from the [releases](https://github.com/intel/xpumanager) page.

If everything is installed correctly, you should be able to see your card using the following command:

```bash
xpu-smi discovery
```
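If `xpu-smi` is not yet installed, a quick sanity check (assuming a standard Linux DRM setup) is to list the render nodes, which appear once the GPU driver is loaded:

```bash
# Each detected GPU typically exposes a /dev/dri/renderD* node.
ls -l /dev/dri/
```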
## Installing PyTorch

Next, ensure that you have PyTorch with XPU support installed. For more information, please refer to the [PyTorch XPU documentation](https://pytorch.org/docs/stable/notes/get_start_xpu.html). A typical installation is sketched below.
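As a sketch only (the wheel index URL and supported package set may change between releases, so verify it against the linked documentation first), the install usually amounts to pointing `pip` at the XPU wheel index:

```bash
# Install a PyTorch build with XPU support from the dedicated wheel index.
# Confirm the URL in the PyTorch "Getting Started on Intel GPU" notes first.
pip install torch --index-url https://download.pytorch.org/whl/xpu
```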
To verify that your PyTorch installation supports XPU, run the following command:

```bash
python -c "import torch; print(torch.xpu.is_available())"
```

If the command prints `True`, your PyTorch installation supports XPU.
## 🔌 API

```python
from anomalib.data import MVTec
from anomalib.engine import Engine, SingleXPUStrategy, XPUAccelerator
from anomalib.models import Stfpm

engine = Engine(
    strategy=SingleXPUStrategy(),
    accelerator=XPUAccelerator(),
)
engine.train(Stfpm(), datamodule=MVTec())
```
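To guard against running on a machine without an XPU, you can check availability up front using the accelerator class itself (a small sketch built on the classes shown above):

```python
from anomalib.engine import XPUAccelerator

# Fall back to Lightning's default accelerator when no XPU device is present.
accelerator_name = "xpu" if XPUAccelerator.is_available() else "auto"
print(f"Selected accelerator: {accelerator_name}")
```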
## ⌨️ CLI

```bash
anomalib train --model Padim --data MVTec --trainer.accelerator xpu --trainer.strategy xpu_single
```
@@ -0,0 +1,8 @@

```python
"""Accelerator for Lightning Trainer.""" | ||
|
||
# Copyright (C) 2025 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from .xpu import XPUAccelerator | ||
|
||
__all__ = ["XPUAccelerator"] |
@@ -0,0 +1,65 @@

```python
"""XPU Accelerator.""" | ||
|
||
# Copyright (C) 2025 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from typing import Any | ||
|
||
import torch | ||
from lightning.pytorch.accelerators import Accelerator, AcceleratorRegistry | ||
|
||
|
||
class XPUAccelerator(Accelerator): | ||
"""Support for a XPU, optimized for large-scale machine learning.""" | ||
|
||
accelerator_name = "xpu" | ||
|
||
@staticmethod | ||
def setup_device(device: torch.device) -> None: | ||
"""Sets up the specified device.""" | ||
if device.type != "xpu": | ||
msg = f"Device should be xpu, got {device} instead" | ||
raise RuntimeError(msg) | ||
|
||
torch.xpu.set_device(device) | ||
|
||
@staticmethod | ||
def parse_devices(devices: str | list | torch.device) -> list: | ||
"""Parses devices for multi-GPU training.""" | ||
if isinstance(devices, list): | ||
return devices | ||
return [devices] | ||
|
||
@staticmethod | ||
def get_parallel_devices(devices: list) -> list[torch.device]: | ||
"""Generates a list of parrallel devices.""" | ||
return [torch.device("xpu", idx) for idx in devices] | ||
|
||
@staticmethod | ||
def auto_device_count() -> int: | ||
"""Returns number of XPU devices available.""" | ||
return torch.xpu.device_count() | ||
|
||
@staticmethod | ||
def is_available() -> bool: | ||
"""Checks if XPU available.""" | ||
return hasattr(torch, "xpu") and torch.xpu.is_available() | ||
|
||
@staticmethod | ||
def get_device_stats(device: str | torch.device) -> dict[str, Any]: | ||
"""Returns XPU devices stats.""" | ||
del device # Unused | ||
return {} | ||
|
||
def teardown(self) -> None: | ||
"""Teardown the XPU accelerator. | ||
This method is empty as it needs to be overridden otherwise the base class will throw an error. | ||
""" | ||
|
||
|
||
AcceleratorRegistry.register( | ||
XPUAccelerator.accelerator_name, | ||
XPUAccelerator, | ||
description="Accelerator supports XPU devices", | ||
) |
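The static helpers on `XPUAccelerator` can also be called directly, which is handy for diagnostics. A minimal sketch using only the methods defined above (assuming an XPU-enabled PyTorch build):

```python
from anomalib.engine import XPUAccelerator

# Enumerate the available XPU devices through the accelerator's helpers.
if XPUAccelerator.is_available():
    count = XPUAccelerator.auto_device_count()
    devices = XPUAccelerator.get_parallel_devices(list(range(count)))
    print(f"Found {count} XPU device(s): {devices}")
else:
    print("No XPU devices available.")
```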
@@ -0,0 +1,8 @@

```python
"""Strategy for Lightning Trainer.""" | ||
|
||
# Copyright (C) 2025 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from .xpu_single import SingleXPUStrategy | ||
|
||
__all__ = ["SingleXPUStrategy"] |
@@ -0,0 +1,43 @@

```python
"""Lightning strategy for single XPU device.""" | ||
|
||
# Copyright (C) 2025 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
import lightning.pytorch as pl | ||
import torch | ||
from lightning.pytorch.strategies import SingleDeviceStrategy, StrategyRegistry | ||
from lightning.pytorch.utilities.exceptions import MisconfigurationException | ||
from lightning_fabric.plugins import CheckpointIO | ||
from lightning_fabric.plugins.precision import Precision | ||
from lightning_fabric.utilities.types import _DEVICE | ||
|
||
|
||
class SingleXPUStrategy(SingleDeviceStrategy): | ||
"""Strategy for training on single XPU device.""" | ||
|
||
strategy_name = "xpu_single" | ||
|
||
def __init__( | ||
self, | ||
device: _DEVICE = "xpu:0", | ||
accelerator: pl.accelerators.Accelerator | None = None, | ||
checkpoint_io: CheckpointIO | None = None, | ||
precision_plugin: Precision | None = None, | ||
) -> None: | ||
if not (hasattr(torch, "xpu") and torch.xpu.is_available()): | ||
msg = "`SingleXPUStrategy` requires XPU devices to run" | ||
raise MisconfigurationException(msg) | ||
|
||
super().__init__( | ||
accelerator=accelerator, | ||
device=device, | ||
checkpoint_io=checkpoint_io, | ||
precision_plugin=precision_plugin, | ||
) | ||
|
||
|
||
StrategyRegistry.register( | ||
SingleXPUStrategy.strategy_name, | ||
SingleXPUStrategy, | ||
description="Strategy that enables training on single XPU", | ||
) |
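Because both classes register themselves with Lightning's `AcceleratorRegistry` and `StrategyRegistry` at import time, a plain Lightning `Trainer` can refer to them by name. A sketch under the assumption that importing `anomalib.engine` triggers the registrations and an XPU is present:

```python
from lightning.pytorch import Trainer

import anomalib.engine  # noqa: F401  # importing runs the registry registrations

# "xpu" and "xpu_single" resolve through Lightning's accelerator and strategy registries.
trainer = Trainer(accelerator="xpu", strategy="xpu_single", devices=1)
```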