Key features include:
• Automatic Device Redirection: Redirects torch.device requests to the available hardware (CUDA, MPS, or CPU).
• Mocked CUDA Functions: Mocks CUDA functions so that CUDA-based code can run on MPS.
• Unified Memory Handling: Smooth transition between memory management systems.
• Performance Boost: The README includes a recipe for recompiling NumPy with Accelerate on macOS, yielding 8x performance gains.
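To illustrate the device redirection idea, here is a minimal sketch of how a requested device could be mapped to whatever the machine supports. This is not the library's actual code: resolve_device and detect_hardware are hypothetical names, and the fallback order (CUDA to MPS to CPU) is an assumption. Only torch.cuda.is_available() and torch.backends.mps.is_available() are real PyTorch calls.

```python
from typing import Tuple


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Map a requested device string to one the hardware supports.

    Hypothetical helper for illustration; the fallback order is an assumption.
    """
    kind, _, index = requested.partition(":")
    if kind == "cuda" and not cuda_available:
        # CUDA was requested but is missing: fall back to MPS, then CPU.
        kind = "mps" if mps_available else "cpu"
    elif kind == "mps" and not mps_available:
        # MPS was requested but is missing: fall back to CUDA, then CPU.
        kind = "cuda" if cuda_available else "cpu"
    # Keep the device index only when the final device is still CUDA.
    if kind == "cuda" and index:
        return f"cuda:{index}"
    return kind


def detect_hardware() -> Tuple[bool, bool]:
    """Return (cuda_available, mps_available); both False without PyTorch."""
    try:
        import torch
    except ImportError:
        return False, False
    cuda = torch.cuda.is_available()
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return cuda, mps


if __name__ == "__main__":
    cuda_ok, mps_ok = detect_hardware()
    print(resolve_device("cuda:0", cuda_ok, mps_ok))
```

Passing the availability flags in as arguments keeps the mapping logic testable on any machine, regardless of which accelerator is actually present.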
Feedback is always welcome.