TT-SMI shows devices zero-indexed regardless of the device ID under /dev/tenstorrent. This is slightly confusing, since users wouldn't know the PCI index of a card unless they run ls /dev/tenstorrent in a single-card container. I don't think there's any way to figure out that mapping for multi-card devices.
I think we should add a PCI ID / Device ID field to TT-SMI, separate from the chip index. The reset command already accepts the PCI index, so this would just make it easier for users to figure out which card they want to reset.
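To illustrate the mapping the new field would expose, here is a minimal sketch that pairs a zero-based index with each numeric entry under /dev/tenstorrent. It is not TT-SMI code: the helper names `list_device_ids` and `index_to_device_id` are hypothetical, and it assumes TT-SMI's chip index follows ascending device ID order, which I have not verified.

```python
import os


def list_device_ids(dev_dir="/dev/tenstorrent"):
    """Return the numeric device IDs found under dev_dir, sorted ascending.

    Hypothetical helper: only purely numeric entry names are treated as devices.
    """
    try:
        entries = os.listdir(dev_dir)
    except FileNotFoundError:
        # No driver directory visible (e.g. container without any card mapped in).
        return []
    return sorted(int(name) for name in entries if name.isascii() and name.isdigit())


def index_to_device_id(dev_dir="/dev/tenstorrent"):
    """Map a zero-based index to a device ID.

    Assumption: the zero-based index follows ascending device ID order.
    """
    return dict(enumerate(list_device_ids(dev_dir)))


if __name__ == "__main__":
    for index, device_id in index_to_device_id().items():
        print(f"index {index} -> /dev/tenstorrent/{device_id}")
```

In a container where only /dev/tenstorrent/2 is mapped in, this would pair index 0 with device ID 2, which is the value the reset command expects.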
Screenshots
(python_env) user@asrinivasan-test-822bbc87-deployment-64b7856dd4-r2gd2:~$ tt-smi -r 0
thread '<unnamed>' panicked at crates/pyluwen/src/lib.rs:521:70:
called `Result::unwrap()` on an `Err` value: DeviceOpenFailed { id: 0, source: Os { code: 2, kind: NotFound, message: "No such file or directory" } }
stack backtrace:
0: 0x7fa57b289f5b - std::backtrace_rs::backtrace::libunwind::trace::h3926e05c1d1f3b6d
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
1: 0x7fa57b289f5b - std::backtrace_rs::backtrace::trace_unsynchronized::h9f5691494ac25ae6
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x7fa57b289f5b - std::sys_common::backtrace::_print_fmt::h7e6bb7b81bf214f4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:67:5
3: 0x7fa57b289f5b - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hcf688c88e28c91b4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:44:22
4: 0x7fa57b2bdab0 - core::fmt::rt::Argument::fmt::h59a542682908b618
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/fmt/rt.rs:142:9
5: 0x7fa57b2bdab0 - core::fmt::write::hce91e70849a27dee
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/fmt/mod.rs:1120:17
6: 0x7fa57b2802bd - std::io::Write::write_fmt::h0bba58d3b1b495e9
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/io/mod.rs:1762:15
7: 0x7fa57b289d44 - std::sys_common::backtrace::_print::hf3a4f110a22f16df
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:47:5
8: 0x7fa57b289d44 - std::sys_common::backtrace::print::h0450d1fd5fc83f73
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:34:9
9: 0x7fa57b2a6a6a - std::panicking::default_hook::{{closure}}::hee7ec73fab21a529
10: 0x7fa57b2a670d - std::panicking::default_hook::he65be6b11b67d1e4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:292:9
11: 0x7fa57b2a6da8 - std::panicking::rust_panic_with_hook::h9e4f07a5a69c9caf
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:779:13
12: 0x7fa57b28a33e - std::panicking::begin_panic_handler::{{closure}}::h69a9732dd2e7007d
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:657:13
13: 0x7fa57b28a176 - std::sys_common::backtrace::__rust_end_short_backtrace::hf159dc40d4738bc4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:170:18
14: 0x7fa57b2a6ad2 - rust_begin_unwind
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:645:5
15: 0x7fa57b2009f5 - core::panicking::panic_fmt::hf38ef33e65607e17
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/panicking.rs:72:14
16: 0x7fa57b201103 - core::result::unwrap_failed::h93afb55b612add5a
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/result.rs:1653:5
17: 0x7fa57b20821a - pyluwen::PciChip::new::h6568fd9db638c898
18: 0x7fa57b226c7f - pyluwen::_::_::__INVENTORY::trampoline::h05d38c449707a19e
19: 0x5d5f53 - _PyObject_MakeTpCall
20: 0x54d44a - _PyEval_EvalFrameDefault
21: 0x54552a - _PyEval_EvalCodeWithName
22: 0x5d5a23 - _PyFunction_Vectorcall
23: 0x5483b6 - _PyEval_EvalFrameDefault
24: 0x5d5846 - _PyFunction_Vectorcall
25: 0x547265 - _PyEval_EvalFrameDefault
26: 0x54552a - _PyEval_EvalCodeWithName
27: 0x684327 - PyEval_EvalCode
28: 0x673a41 - <unknown>
29: 0x673abb - <unknown>
30: 0x673b61 - <unknown>
31: 0x6747e7 - PyRun_SimpleFileExFlags
32: 0x6b4072 - Py_RunMain
33: 0x6b43fd - Py_BytesMain
34: 0x7fa57d81e083 - __libc_start_main
35: 0x5da67e - _start
36: 0x0 - <unknown>
Traceback (most recent call last):
File "/usr/local/bin/tt-smi", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/tt_smi/tt_smi.py", line 733, in main
pci_board_reset(args.reset, reinit=True)
File "/usr/local/lib/python3.8/dist-packages/tt_smi/tt_smi_backend.py", line 523, in pci_board_reset
chip = PciChip(pci_interface=pci_idx)
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: DeviceOpenFailed { id: 0, source: Os { code: 2, kind: NotFound, message: "No such file or directory" } }
Also, if the wrong device ID is given, we should fail more gracefully instead of panicking with the backtrace shown above.
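One possible shape for a more graceful failure is to check that the device node exists before trying to open it. The sketch below is only an illustration: `check_device_id` is a hypothetical helper, not part of tt-smi or pyluwen, and it assumes device nodes appear as numeric entries under /dev/tenstorrent.

```python
import os


def check_device_id(device_id, dev_dir="/dev/tenstorrent"):
    """Return None if dev_dir/<device_id> exists, else a readable error message.

    Hypothetical pre-flight check; it does not open or touch the device.
    """
    path = os.path.join(dev_dir, str(device_id))
    if os.path.exists(path):
        return None
    try:
        # Collect the numeric entries so the message can suggest valid IDs.
        present = sorted(
            (n for n in os.listdir(dev_dir) if n.isascii() and n.isdigit()),
            key=int,
        )
    except FileNotFoundError:
        return f"{dev_dir} does not exist; no devices are visible."
    listing = ", ".join(present) if present else "none"
    return f"No device with ID {device_id} at {path}. Available device IDs: {listing}."


if __name__ == "__main__":
    error = check_device_id(0)
    if error is not None:
        raise SystemExit(error)
```

A caller could run a check like this before the `PciChip(pci_interface=pci_idx)` call seen in the traceback, and exit with the message rather than letting the panic propagate.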