High-efficiency floating-point neural network inference operators for mobile, server, and Web
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
The Tensor Algebra SuperOptimizer for Deep Learning
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Batch normalization fusion for PyTorch (a conv + BN folding sketch follows the list below)
Optimize the layer structure of Keras models to reduce computation time
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Blog posts, reading reports, and code examples for AGI/LLM-related knowledge
Cross-platform modular neural network inference library, small and efficient
Faster YOLOv8 inference: optimize and export YOLOv8 models for faster inference using OpenVINO and NumPy 🔢
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Predibase's LoRAX inference server (a minimal KV-cache sketch follows the list below).
[WIP] A template for getting started writing code using GGML
LLM-Rank: a graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.
Your AI Catalyst: inference backend to maximize your model's inference performance
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers (a simplified speculative-decoding sketch follows the list below).
A constrained expectation-maximization algorithm for feasible graph inference.
Modified inference engine for quantized convolution using product quantization
Batch Partitioning for Multi-PE Inference with TVM (2020)
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
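
Several of the entries above implement a single, well-defined trick. To make the batch-normalization-fusion entry concrete, here is a minimal sketch of conv + BN folding written against PyTorch's public `nn.Conv2d`/`nn.BatchNorm2d` API. It is an illustrative assumption of how such folding works, not that project's actual code.

```python
# Fold an eval-mode BatchNorm2d into the preceding Conv2d (illustrative sketch).
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent to conv followed by bn in eval mode."""
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding,
        dilation=conv.dilation, groups=conv.groups, bias=True,
    )
    # Per-output-channel scale: gamma / sqrt(running_var + eps)
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Usage: swap conv followed by bn for the single fused conv at inference time.
conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
conv.eval(); bn.eval()
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5)
```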
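
Likewise, a minimal NumPy sketch of the KV-caching idea mentioned in the LLM-serving course entry: keys and values for already-generated tokens are stored and reused, so each decode step only projects the newest token. The single-head shapes and toy weights below are assumptions for illustration, not LoRAX's API.

```python
# KV caching for autoregressive decoding: one attention head, toy weights.
import numpy as np

d = 64                                   # head dimension (illustrative)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention of one query against cached keys/values."""
    scores = q @ K.T / np.sqrt(d)        # (1, t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                   # (1, d)

# Decode loop: instead of recomputing K/V for the whole prefix each step,
# append the new token's key/value to the cache and reuse the rest.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
x = rng.standard_normal((1, d))          # embedding of the first token
for step in range(8):
    q, k, v = x @ Wq, x @ Wk, x @ Wv     # O(1) new projections per step
    K_cache = np.vstack([K_cache, k])    # cache grows to O(t) memory
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)
    x = out                              # toy stand-in for the next token's embedding
```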
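
And a simplified sketch of speculative decoding, referenced by the LLM-inference-acceleration entry: a cheap draft model proposes a block of tokens and the expensive target model verifies them in one pass. This greedy-verification variant, along with the toy `draft_logits`/`target_logits` stand-ins, is an assumption for illustration; the published method samples and verifies using the two models' full distributions.

```python
# Greedy speculative decoding sketch: draft K tokens cheaply, verify in one target pass.
import numpy as np

VOCAB, K = 50, 4                          # vocabulary size, draft block length (toy values)
rng = np.random.default_rng(0)

def target_logits(tokens):
    """Expensive model (stand-in): logits for every position in one pass."""
    return rng.standard_normal((len(tokens), VOCAB))

def draft_logits(tokens):
    """Cheap model (stand-in): logits for the last position only."""
    return rng.standard_normal(VOCAB)

def speculative_step(prefix):
    # 1. Draft K tokens autoregressively with the cheap model (greedy).
    draft = list(prefix)
    for _ in range(K):
        draft.append(int(np.argmax(draft_logits(draft))))
    proposed = draft[len(prefix):]
    # 2. Score the whole drafted sequence with ONE target-model pass.
    logits = target_logits(draft)
    # 3. Accept the longest run of proposals the target model agrees with,
    #    correcting the first disagreement with the target's own choice.
    accepted = []
    for i, tok in enumerate(proposed):
        target_choice = int(np.argmax(logits[len(prefix) + i - 1]))
        if target_choice != tok:
            accepted.append(target_choice)
            break
        accepted.append(tok)
    return prefix + accepted

print(speculative_step([1, 2, 3]))
```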