DefTruth

🎯

#pragma unroll

🤖LLM/VLM | Diffusion | CUDA | AI Infra

1.3k followers · 102 following

Statistics Department of JNU
Guangzhou, China
UTC +08:00
https://github.com/DefTruth
https://www.zhihu.com/people/qyjdef

DefTruth/README.md

Pinned

lite.ai.toolkit (Public)

🛠 A lightweight C++ toolkit of 100+ awesome AI models, with support for ORT, MNN, NCNN, TNN and TensorRT. 🎉🎉

C++ · 3.7k stars · 706 forks

vllm-project/vllm (Public)

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 33.4k stars · 5.1k forks

Awesome-LLM-Inference (Public)

📖 A curated list of awesome LLM/VLM inference papers with code, covering FlashAttention, PagedAttention, parallelism, etc. 🎉🎉

3.1k stars · 211 forks

CUDA-Learn-Notes (Public)

📚 150+ Tensor Core/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% of the TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda · 1.9k stars · 202 forks

Awesome-Diffusion-Inference (Public)

📖 A curated list of awesome diffusion inference papers with code, covering sampling, caching, multi-GPU inference, etc. 🎉🎉

141 stars · 8 forks

cuffpa-py (Public)

📚 [WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1) 🎉 GPU SRAM complexity for headdim > 256, ~1.5x 🎉 faster than SDPA EA.

Cuda · 35 stars · 1 fork