v0.0.1.post2
What's Changed
- [misc] fix setup.py by @DefTruth in #22
- [misc] update L20, 4090, A30, 3080 bench by @DefTruth in #23
- [FFPA] support L1 multi-stages 3/4 by @DefTruth in #24
- [Misc] find best tflops across multi-stages by @DefTruth in #25
- [FFPA] rename pyffpa -> ffpa_attn by @DefTruth in #26
- [README] Update README.md by @DefTruth in #27
- [FFPA] L1 support prefetch QKV g2s by @DefTruth in #28
- [Bugfix] fix d < 256 accuracy errors by @DefTruth in #29
- [Feature] support L1 QKV smem separation by @DefTruth in #30
- [Feature] Add ENABLE_FFPA_SMEM_SWIZZLE_V flag by @DefTruth in #31
- [bench] Add RTX 3080 Laptop perf plots by @DefTruth in #32
- [bench] Add more bench perf plots by @DefTruth in #34
- [misc] fix bench link typos by @DefTruth in #35
Full Changelog: v0.0.1.post1...v0.0.1.post2