This repository contains materials for the Efficient Deep Learning Systems course taught at the Faculty of Computer Science of HSE University and Yandex School of Data Analysis.
This branch corresponds to the ongoing 2025 course. For the full materials of past years, see the "Past versions" section.
- Week 1: Introduction
  - Lecture: Course overview and organizational details. Core concepts of GPU architecture and the CUDA API.
  - Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
- Week 2: Experiment tracking, model and data versioning, testing DL code in Python
- Week 3: Training optimizations, profiling DL code
- Week 4: Data-parallel training and All-Reduce
- Week 5: Sharded data-parallel training, distributed training optimizations
- Week 6: Training large models
- Week 7: Python web application deployment
- Week 8: LLM inference optimizations and software
- Week 9: Efficient model inference
- Week 10: Guest lecture
There will be several home assignments (spread over multiple weeks) on the following topics:
- Training pipelines and code profiling
- Distributed and memory-efficient training
- Deploying and optimizing models for production
The final grade is a weighted sum of per-assignment grades. Please refer to the course page of your institution for details.