- Instructor: Jeehoon Kang
- Time: Tue/Thu 9:00-10:30
- Place: Rm. 1101, Bldg. E3-1. YOUR PHYSICAL ATTENDANCE IS REQUIRED unless announced otherwise.
- Websites: https://github.com/kaist-cp/cs420, https://gg.kaist.ac.kr/course/15/
- Announcements: in the issue tracker
- We assume you read each announcement within 24 hours.
- We strongly recommend that you watch the repository.
- TA: Jaewoo Kim (Head TA), Janggun Lee
- Office Hour: Fri 9:00am-10:00am, Room 4432, E3-1. It is not required, but if you want to come, do so by 9:15am. See below for office hour policy.
Compilers bridge the gap between humans and machines. Humans want to express complex ideas easily. Machines, on the other hand, understand only a few words (instructions) so that they can be implemented efficiently in silicon. Compilers transform programs from a form in which humans can easily express complex ideas into a form that machines can execute efficiently. Since the gap between humans and machines is fundamentally wide, compilers have been constructed and widely used since the beginning of the history of computing. In fact, the first practical compiler predates the first practical operating systems (according to Wikipedia)!
In response to industry shifts, new compilers must be written again and again. First, humans want to express more and more complex ideas, especially in the era of artificial intelligence and big data. Second, machines change in response to physics (e.g., the end of Dennard scaling and Moore's law) and industrial needs (e.g., the Internet of Things and distributed systems). New compilers must be constructed to close the new gap between changing humans and changing machines. For this reason, industrial demand for (and the salaries of) compiler engineers has been consistently high.
In this class, we will learn how to construct a compiler by actually building one. You will benefit from the provided skeleton code of a clean-slate educational compiler, dubbed KECC: KAIST Educational C Compiler (think: KENS for networking, or Pintos and xv6 for operating systems). We will discuss parsing only briefly, because the topic is assumed to be covered in CS322: Formal Languages and Automata. (You don't need to know parsing to take this course, though.) We will focus on translation from a human-friendly form to a machine-friendly form, and on compiler optimizations. Specifically, we will discuss (1) how to transform a C program into an SSA-based intermediate representation (IR); (2) how to perform register promotion, static single assignment, global value numbering, and register allocation optimizations on the IR; and (3) how to transform an IR program into a RISC-V assembly program. KECC will provide a significant amount of skeleton code so that you can focus on the topics of this course.
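To give a flavor of step (1), here is a minimal sketch in Rust of lowering an arithmetic expression into a three-address IR in which every temporary is defined exactly once, which is the defining property of SSA. The names `Expr`, `Operand`, `Inst`, and `lower` are hypothetical and chosen for illustration only; they are not KECC's actual data structures, and real C programs additionally need control flow and memory.

```rust
/// A toy expression AST (hypothetical; not KECC's actual AST).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Const(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// An instruction operand: a constant, a source variable, or a temporary.
#[derive(Debug, Clone, PartialEq)]
pub enum Operand {
    Const(i64),
    Var(String),
    Temp(usize),
}

/// A three-address instruction that defines the temporary `dest`.
#[derive(Debug, Clone, PartialEq)]
pub enum Inst {
    Add { dest: usize, lhs: Operand, rhs: Operand },
    Mul { dest: usize, lhs: Operand, rhs: Operand },
}

/// Lowers `expr`, appending instructions to `code`, and returns the operand
/// that holds the value of `expr`.
pub fn lower(expr: &Expr, code: &mut Vec<Inst>) -> Operand {
    match expr {
        Expr::Const(c) => Operand::Const(*c),
        Expr::Var(name) => Operand::Var(name.clone()),
        Expr::Add(l, r) => {
            let lhs = lower(l, code);
            let rhs = lower(r, code);
            // The instruction index doubles as a fresh temporary number,
            // so each temporary is defined exactly once.
            let dest = code.len();
            code.push(Inst::Add { dest, lhs, rhs });
            Operand::Temp(dest)
        }
        Expr::Mul(l, r) => {
            let lhs = lower(l, code);
            let rhs = lower(r, code);
            let dest = code.len();
            code.push(Inst::Mul { dest, lhs, rhs });
            Operand::Temp(dest)
        }
    }
}
```

For the expression `a + 2 * b`, this sketch emits `t0 = 2 * b` followed by `t1 = a + t0` and returns `t1` as the result operand.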
We will also briefly discuss recent trends in compiler construction. I see two crucial recent trends: scripting languages and parallelism. (1) Scripting languages like JavaScript and Python, unlike C, must be compiled (or interpreted) at run time, and therefore there is no clear distinction between compile time and run time. This is a challenge in that compile time must also be optimized, but it is also an opportunity in that the compiler may gather and benefit from run-time information. (2) It is crucial to exploit the massive parallelism of modern applications like deep learning and high-performance computing (HPC), because they require an enormous amount of computation. Due to the complexity of these workloads, their parallelism should be automatically discovered and exploited by compilers, which is a big challenge.
We will also briefly study the theory of compilers, focusing on compiler correctness. In general, in what sense is a compiler correct, and how can we prove it? Specifically, how can we prove the correctness of KECC's transformations and optimizations? As it will turn out, this compiler correctness theory will greatly help you build your own compiler efficiently.
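One common way to state correctness is semantic preservation: the transformed program must compute the same result as the original on every input. The following is a minimal sketch in Rust for a toy expression language and a constant-folding pass. All names (`Expr`, `eval`, `fold`, `preserves_semantics`) are hypothetical, and this is not necessarily the notion of correctness used for KECC. Note that `preserves_semantics` only tests the property on sample inputs; a proof would have to cover all inputs.

```rust
/// A toy expression language (hypothetical; far simpler than C).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Const(i64),
    /// A variable, identified by its index in the environment.
    Var(usize),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Reference semantics: evaluates `e` under `env` with wrapping arithmetic.
pub fn eval(e: &Expr, env: &[i64]) -> i64 {
    match e {
        Expr::Const(c) => *c,
        Expr::Var(i) => env[*i],
        Expr::Add(l, r) => eval(l, env).wrapping_add(eval(r, env)),
        Expr::Mul(l, r) => eval(l, env).wrapping_mul(eval(r, env)),
    }
}

/// Constant folding: replaces operations on two constants by their result,
/// using the same wrapping arithmetic as `eval`.
pub fn fold(e: &Expr) -> Expr {
    match e {
        Expr::Const(_) | Expr::Var(_) => e.clone(),
        Expr::Add(l, r) => match (fold(l), fold(r)) {
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a.wrapping_add(b)),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(l), fold(r)) {
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a.wrapping_mul(b)),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
    }
}

/// Checks `eval(e) == eval(fold(e))` on the given sample environments.
/// This is testing, not a proof of correctness.
pub fn preserves_semantics(e: &Expr, envs: &[Vec<i64>]) -> bool {
    envs.iter().all(|env| eval(e, env) == eval(&fold(e), env))
}
```

For example, `(2 * 3) + x` folds to `6 + x`, and both expressions evaluate to the same value for every value of `x`. The sketch deliberately uses the same wrapping arithmetic in `eval` and `fold`; if the optimizer computed constants with different overflow behavior than the reference semantics, the property would fail.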
- It is strongly recommended that students have already taken courses on:
- Mathematics (MAS101): proposition statement and proof
- Data structures (CS206): linked list, stack, queue
- Systems programming (CS230) or Computer Organization (CS311): memory layout, stack and heap, assembly language
- Programming languages (CS320): lambda calculus, interpreter
Without a proper understanding of these topics, you will likely struggle in this course.
- Other recommendations which would help you in this course:
Make sure you're capable of using the following development tools:
- Git: for downloading KECC and version-controlling your development. If you're not familiar with Git, walk through this tutorial.
- IMPORTANT: you should not expose your work to others. In particular, you should not fork the upstream and push there. Please follow these steps:
- Directly clone the upstream without forking it.
    $ git clone --origin upstream git@github.com:kaist-cp/kecc-public.git
    $ cd kecc-public
    $ git remote -v
    upstream git@github.com:kaist-cp/kecc-public.git (fetch)
    upstream git@github.com:kaist-cp/kecc-public.git (push)
- To get updates from the upstream, fetch and merge upstream/main.
    $ git fetch upstream
    $ git merge upstream/main
- If you want to manage your development on a Git server, please create your own private repository.
- You may upgrade your GitHub account to "PRO", which is free of charge. Refer to the documentation.
- Set up your repository as a remote.
    $ git remote add origin git@github.com:<github-id>/kecc-public.git
    $ git remote -v
    origin git@github.com:<github-id>/kecc-public.git (fetch)
    origin git@github.com:<github-id>/kecc-public.git (push)
    upstream git@github.com:kaist-cp/kecc-public.git (fetch)
    upstream git@github.com:kaist-cp/kecc-public.git (push)
- Push to your repository.
    $ git push -u origin main
- Rust: as the language of homework implementation. We chose Rust because its ownership type system greatly simplifies the development of large-scale system software. We recommend that you read this page, which describes how to study Rust.
- Visual Studio Code (optional): for developing your homework. If you prefer other editors, you're good to go.
- Single Sign On (SSO): Use the following SSO credentials to access gg and the development server:
- id: KAIST student id (8-digit number)
- email: KAIST email address (@kaist.ac.kr)
- password: Reset it here: https://auth.fearless.systems/if/flow/default-recovery-flow/
- Log in to gg using the "kaist-cp-class" option, and to the development server using the "OpenID Connect" option.
- IMPORTANT: Do not attempt to hack or overload the server. Please use it responsibly.
- Create and connect to a workspace to use the terminal or VSCode (after installation).
- We recommend using VSCode with the "Rust Analyzer" and "CodeLLDB" plugins.
- Install necessary dependencies for KECC
    sudo apt update
    sudo apt install \
        git man-db locales \
        vim neovim emacs \
        zsh bash-completion tmux \
        build-essential gcc clang make cmake python3 csmith libcsmith-dev creduce \
        gcc-riscv64-linux-gnu g++-riscv64-linux-gnu qemu-user-static \
        graphviz curl \
        zip python3-pip
    pip install tqdm
- NOTE: If a permission denied error occurs when trying to install the CodeLLDB extension on the remote server, please follow these steps:
- Download this file on the remote server.
- Follow the instructions to install it.
- NOTE: If you cannot connect to the remote server via VSCode and see a "fail to create hard link" error message, please follow these steps:
- Close the VSCode window and try to connect to the remote server via a terminal (or cmd). If you encounter a "Connection timed out" error message, try again after a few minutes.
- Delete all the files in ~/.vscode-server/bin/.
- IMPORTANT: PAY CLOSE ATTENTION. VERY SERIOUS.
- Sign the KAIST CS Honor Code for this semester. Failure to do so may lead to expulsion from the course.
- We will employ sophisticated tools to detect code plagiarism.
- Search for "code plagiarism detector" on Google Images to understand how these tools can identify advanced forms of plagiarism. Do not attempt plagiarism in any form.
- You will implement translations and optimizations inside KECC.
- All homework submissions will be automatically graded online so that you can immediately see your score.
- Since compiler construction is a nontrivial undertaking, you're encouraged to ask questions about the homework in the issue tracker early in the semester.
- You are permitted to use ChatGPT or other LLMs.
- Dates & Times: April 15th (Tue), June 10th (Tue), 9:00-11:00
- The exams will evaluate your understanding of compiler theory.
- Your physical attendance is required.
- A quiz must be completed on the Course Management website for each session (if any). Quizzes should be completed by the end of the day.
- Failing to complete a significant number of quizzes will result in an automatic F.
- Make sure you can log into the lab submission website.
- Use your kaist-cp-class account for login.
- Your ID is your @kaist.ac.kr email address.
- Reset your password here: https://auth.fearless.systems/if/flow/default-recovery-flow/
- Contact the instructor if login issues arise.
- Course-related announcements and information will be posted on the course website and the GitHub issue tracker. It is expected that you read all announcements within 24 hours of their posting. Watching the repository is highly recommended for automatic email notifications of new announcements.
- Questions about course materials and assignments should be posted in the course repository's issue tracker.
- Avoid emailing the instructor or TAs regarding course materials and assignments.
- Search your question using Google and Stack Overflow before posting.
- Describe your question in detail, including:
- Environment (OS, gcc, g++ version, and other relevant program information).
- Used commands and their results, with logs formatted in code. See this guide.
- Any changes made to directories or files. For solution files, describe the modified code sections.
- Your Google search results, including search terms and learned information.
- Use a clear and descriptive title for your issue.
- The reason for requiring questions to be asked online first is twofold: it ensures clarity in your query, and it allows everyone to benefit from shared questions and answers.
- Email inquiries should be reserved for confidential or personal matters. Questions not adhering to this guideline (e.g., course material queries via email) will not be addressed.
- Office hours will not cover new questions. Check the issue tracker for similar questions before attending. If your question is not listed, post it as a new issue for discussion. Office hour discussions will focus on unresolved issues.
- Emails to the instructor or head TA should start with "cs420:" in the subject line, followed by a brief description. Include your name and student number in the email. Emails lacking this information (e.g., those without a student number) will not receive a response.
- If you join the session remotely via Zoom (https://kaist.zoom.us/my/jeehoon.kang), your Zoom name should be <your student number> <your name> (e.g., 20071163 강지훈). Change your name by referring to this.
- The course is conducted in English. However, you may ask questions in Korean, which will be translated into English.