Releases: cli99/llm-analysis
Bug fixes
v0.2.1
Bug fixes and MoE training analysis support
This release fixes a few bugs in memory usage calculations (e.g., activation and optimizer states) and adds support for analyzing MoE training.
Bug fixes and Llama 2 inference support
This release:
- adds group query attention (GQA) support
- changes the inference activation memory calculation to assume the maximum tensor buffer
- fixes the KV cache size calculation
- adds a GPU cost analysis for inference
- adds a Llama 2 inference case study
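The GQA and KV cache changes above interact: with grouped-query attention, the KV cache is sized by the number of key/value heads rather than the number of query heads. A minimal sketch of that calculation (the function name and signature are illustrative, not llm-analysis's actual API, and the formula assumes a standard transformer with an fp16 cache):

```python
def kv_cache_bytes(batch_size: int, seq_len: int, num_layers: int,
                   num_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Estimate KV cache memory in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    With GQA, num_kv_heads is smaller than the number of query heads,
    which shrinks the cache proportionally.
    """
    return 2 * batch_size * seq_len * num_layers * num_kv_heads * head_dim * bytes_per_elem


# Example with Llama 2 70B-style GQA settings (80 layers, 8 KV heads,
# head_dim 128) at a 4096-token context, fp16:
print(kv_cache_bytes(1, 4096, 80, 8, 128))  # ~1.34e9 bytes per sequence
```

With full multi-head attention (64 KV heads instead of 8), the same call would report 8x the memory, which is why GQA support matters for the inference memory analysis.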