Enabling non-hardcoded run of our test infrastructure #145

ajakovljevicTT · 2025-01-03T11:14:27Z

As described in issue #9, our test infra had a problem when it detected two TT devices, as happens on the two-chip N300. The workaround was to hardcode which devices to use. This PR removes that hardcoding and instead caps the number of devices the backend uses at 1. Additionally, a bug in our old infra ran the tests on CPU instead of a TT device; this PR fixes that as well.

Fixes #9.
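
For readers unfamiliar with the approach, here is a minimal sketch of the capping idea from the description, written in plain Python rather than the backend's C++. `cap_devices` and `MAX_DEVICES` are hypothetical names used only for illustration and are not part of this PR; the actual change lives in the PJRT implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constant: the description says the backend is capped to 1 device.
MAX_DEVICES = 1


def cap_devices(detected: Sequence[T], limit: int = MAX_DEVICES) -> List[T]:
    """Return at most `limit` devices, preserving detection order."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return list(detected[:limit])
```

On a two-chip board, a call such as `cap_devices(["tt0", "tt1"])` keeps only the first detected device, so nothing has to be hardcoded per board.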

kmitrovicTT · 2025-01-10T10:14:29Z

@ajakovljevicTT please rebase and ping me again to take a look. That bug should be gone now that old infra is gone.

inc/common/pjrt_implementation/device_description.h

src/common/module_builder.h

src/common/pjrt_implementation/loaded_executable_instance.cc

kmitrovicTT · 2025-01-10T13:22:05Z

src/common/pjrt_implementation/loaded_executable_instance.cc

        LoadedExecutableInstance::Unwrap(args->executable)
            ->addressable_devices();
+    int num_addressable_devices =
+        LoadedExecutableInstance::Unwrap(args->executable)
+            ->image_->get_num_addresible_devices();


Typo: addressable.

Maybe add assert num_addressable_devices == addressable_devices.size().

src/common/pjrt_implementation/loaded_executable_instance.cc

tests/infra/device_connector.py

kmitrovicTT · 2025-01-10T13:33:32Z

tests/infra/device_runner.py

        device = device_connector.connect_device(device_type)

        with jax.default_device(device):
            return device_workload.execute()

    @staticmethod
-    def _put_on_device(device_type: DeviceType, workload: Workload) -> Workload:
+    def _put_on_device(
+        device_type: DeviceType, workload: Workload, num_device: int = 0


Maybe rearrange a bit to keep device_type and device_num next to each other.
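
One possible ordering that follows this suggestion is sketched below. `DeviceType` and `Workload` here are simplified, hypothetical stand-ins for the classes in `tests/infra`, and the returned target string is an assumption made only so the example is self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class DeviceType(Enum):
    """Hypothetical stand-in for the infra's DeviceType."""

    CPU = "cpu"
    TT = "tt"


@dataclass
class Workload:
    """Hypothetical stand-in: wraps a zero-argument callable."""

    executable: Callable[[], str]

    def execute(self) -> str:
        return self.executable()


def run_on_device(
    workload: Workload, device_type: DeviceType, device_num: int = 0
) -> Tuple[str, str]:
    # device_type and device_num sit next to each other, so the pair that
    # identifies the target device reads as one unit at the call site.
    target = f"{device_type.value}:{device_num}"
    return target, workload.execute()
```

Keeping the defaulted `device_num` last means existing callers that pass only a workload and a device type continue to work.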

kmitrovicTT · 2025-01-10T13:33:41Z

tests/infra/device_runner.py

@@ -64,18 +64,22 @@ def put_tensors_on_gpu(*tensors: Tensor) -> Sequence[Tensor]:
        raise NotImplementedError("Support for GPUs not implemented")

    @staticmethod
-    def _run_on_device(device_type: DeviceType, workload: Workload) -> Tensor:
+    def _run_on_device(
+        device_type: DeviceType, workload: Workload, num_device: int = 0


tests/infra/device_runner.py

ajakovljevicTT force-pushed the ajakovljevic/solving_the_twochip_issue branch 3 times, most recently from 2fae952 to 7f648a4 Compare January 3, 2025 11:19

ajakovljevicTT marked this pull request as ready for review January 3, 2025 11:26

ajakovljevicTT requested review from kmitrovicTT and mrakitaTT as code owners January 3, 2025 11:26

ajakovljevicTT requested a review from AleksKnezevic January 3, 2025 11:26

ajakovljevicTT force-pushed the ajakovljevic/solving_the_twochip_issue branch 2 times, most recently from 895a20e to e004717 Compare January 10, 2025 10:08

Fixed hardcoding of x2 chips

2e5c6d5

ajakovljevicTT force-pushed the ajakovljevic/solving_the_twochip_issue branch from e004717 to 2e5c6d5 Compare January 10, 2025 10:47

Changed test api

418f9d4

ajakovljevicTT force-pushed the ajakovljevic/solving_the_twochip_issue branch from ccd0cf2 to 418f9d4 Compare January 10, 2025 12:27

kmitrovicTT reviewed Jan 10, 2025


Addressed comments

680a3ea

ajakovljevicTT force-pushed the ajakovljevic/solving_the_twochip_issue branch from 112e32a to 680a3ea Compare January 10, 2025 15:35

mrakitaTT mentioned this pull request Jan 13, 2025

Tests running on CPU #152

Open
