Commit

Fix title position
saadrahim authored Dec 10, 2024
1 parent db84137 commit aef40a0
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions blogs/ecosystems-and-partners/zyphra/README.md
@@ -17,14 +17,14 @@ myst:

# Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

This blog is contributed by [Zyphra](https://www.zyphra.com/), a Palo Alto-based AI research lab and AMD Instinct Partner.

Zyphra is designing MaiaOS, a multimodal agent system that combines next-gen neural network architectures (SSM hybrids), long-term memory, and reinforcement learning.

In this blog we motivate our vision of training transformers and hybrid models at a lower cost using AMD technology. We explain how Zyphra harnessed the hardware advantages of the MI300X for training both dense transformers and Zyphra's hybrid models. Specifically, the model blocks of interest are Mamba2 and Flash Attention v2. We conclude by sharing benchmark results showing the speedups we achieved on the MI300X using ROCm, compared to the competition.

## Harnessing the MI300 Hardware Specs

On paper, the AMD Instinct MI300X GPU accelerators offer some of the best hardware specifications on the market. The key specs where the MI300X surpasses its main competitor, the NVIDIA H100 GPU, are High Bandwidth Memory (HBM) capacity and bandwidth.

The MI300X also has more compute hardware at its disposal, with a significantly greater number of compute units (CUs), AMD's counterpart to NVIDIA's streaming multiprocessors (SMs), than the H100. While this leads to very high theoretical BFLOAT16 throughput, there are some caveats in practice that we discuss below.
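
As a rough illustration of these on-paper advantages, the sketch below computes the MI300X-to-H100 ratio for each headline spec, plus the arithmetic intensity (FLOPs per byte of HBM traffic) at which a kernel would stop being memory bound at peak rates. The numbers are approximate peak values taken from public vendor datasheets; they are assumptions added for illustration and do not come from this post. The names `SPECS`, `ratio`, and `ridge_point` are hypothetical.

```python
# Approximate peak figures from public vendor datasheets.
# These are illustrative assumptions, not measurements from this post.
SPECS = {
    "MI300X": {
        "hbm_gb": 192,          # HBM3 capacity in GB
        "hbm_tbps": 5.3,        # peak HBM bandwidth in TB/s
        "bf16_tflops": 1307.4,  # peak dense BF16 throughput in TFLOP/s
        "units": 304,           # compute units (CUs)
    },
    "H100 SXM": {
        "hbm_gb": 80,
        "hbm_tbps": 3.35,
        "bf16_tflops": 989.4,
        "units": 132,           # streaming multiprocessors (SMs)
    },
}


def ratio(key: str) -> float:
    """MI300X value divided by H100 value for one spec."""
    return SPECS["MI300X"][key] / SPECS["H100 SXM"][key]


def ridge_point(name: str) -> float:
    """FLOPs per byte needed to become compute bound at peak rates."""
    s = SPECS[name]
    return s["bf16_tflops"] / s["hbm_tbps"]  # (TFLOP/s) / (TB/s) = FLOP/byte


if __name__ == "__main__":
    for key in ("hbm_gb", "hbm_tbps", "bf16_tflops", "units"):
        print(f"{key}: MI300X / H100 = {ratio(key):.2f}x")
    for name in SPECS:
        print(f"{name}: ridge point ~{ridge_point(name):.0f} FLOPs/byte")
```

Peak ratios like these only bound what is achievable; delivered throughput depends on kernel and software maturity, which is where the practical caveats come in.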