System for Artificial Intelligence (人工智能系统)
Course Description
This course focuses on the system design and implementation that underpin modern artificial intelligence applications. Students will learn the design principles behind state-of-the-art machine learning systems and the techniques of systematic performance optimization. Core topics cover the key full-stack technologies, from modern AI computing hardware architectures and programming paradigms, through deep learning frameworks and compilers, to distributed training and inference on clusters. Through theoretical study and a series of hands-on projects, students will master a systematic methodology for transforming AI models into production-grade services.
Learning Objectives
- Understand Core Principles: Gain a deep understanding of the system design principles that support modern AI applications (especially Large Language Models) and systematically master the key full-stack technologies, from the underlying hardware architecture and programming paradigms to the upper-level deep learning frameworks and compilers.
- Master Optimization Techniques: Learn and master key performance optimization methods for machine learning systems, including how to effectively scale computation, reduce memory footprint, and perform efficient task offloading and scheduling on heterogeneous computing resources (such as CPUs, GPUs, and NPUs).
- Develop Practical Skills: Through theoretical study and a series of hands-on projects, master the systematic methodology for transforming AI models into stable and efficient production-grade services, and possess the ability to design, implement, and deploy modern machine learning systems.
- Connect with Cutting-Edge Fields: Through frontier case studies such as the training and serving of Large Language Models (LLMs), become familiar with the latest technologies and challenges in the industry, laying a solid foundation of knowledge and skills for future application and research in the field of machine learning systems.
Syllabus
I. Foundations (3 lectures)
II. Hardware Acceleration & Programming (4 lectures)
III. Machine Learning Compilation (1 lecture)
- 07 Machine Learning Compilation
IV. LLM Fundamentals & Distributed Computing (3 lectures)
V. LLM Parallelization & Training Techniques (3 lectures)
VI. LLM Serving Techniques (2 lectures)
VII. Post-Training & Project (1 lecture)
- 16 Post-Training: Reinforcement Learning for LLMs
Assessment
- Class Participation (10%)
  - 1st unexcused absence: Warning
  - 2nd unexcused absence: Grade halved
  - 3rd unexcused absence: Grade becomes 0
- Assignments (45%): Three programming assignments, each worth 15%.
- Course Project (45%), completed in groups of 2-3 students:
  - Proposal: 5%
  - Technical Report: 20%
  - Presentation: 20%
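Putting the weights above together, a minimal sketch of how a final score could be computed. The function name is hypothetical, and it assumes the absence penalty applies to the participation component and that every component is graded on a 0-100 scale; neither assumption is stated course policy.

```python
def final_score(participation, assignment_scores, proposal, report,
                presentation, unexcused_absences=0):
    """Combine components (each on a 0-100 scale) using the listed weights.

    Assumed reading of the policy: the absence penalty applies to the
    participation component only.
    """
    if unexcused_absences >= 3:
        participation = 0.0      # 3rd unexcused absence: grade becomes 0
    elif unexcused_absences == 2:
        participation /= 2       # 2nd unexcused absence: grade halved

    # Three programming assignments, 15% each (45% total), averaged here.
    assignments = sum(assignment_scores) / len(assignment_scores)
    # Course project: proposal 5% + report 20% + presentation 20% = 45%.
    project = 0.05 * proposal + 0.20 * report + 0.20 * presentation
    return 0.10 * participation + 0.45 * assignments + project
```

For example, perfect scores everywhere yield 100, while a second unexcused absence with otherwise perfect scores costs 5 points (half of the 10% participation weight).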