Programming lesson
Building a RISC-V Code Generator for ChocoPy: A Step-by-Step Guide
Learn how to implement a code generator for the ChocoPy compiler in CS164 assignment 3, from parsing to assembly generation, with practical tips and trend-inspired analogies.
Introduction to Code Generation in Compilers
Code generation is the final phase of a compiler, where the intermediate representation (IR) is translated into target machine code. In the context of the CS164 assignment 3 (pa3), you are tasked with implementing a code generator for ChocoPy, a subset of Python that compiles to RISC-V assembly. This assignment bridges the gap between high-level programming and low-level hardware, a skill essential for understanding how modern apps, AI frameworks, and even game engines execute code efficiently.
Think of this like a video game where the player's high-level actions (like jumping or shooting) are translated into precise button presses and memory operations. Similarly, your code generator takes typed AST JSON and produces RISC-V instructions that the Venus simulator can execute. With the rise of RISC-V in hardware startups and embedded systems (e.g., AI accelerators), understanding this process is more relevant than ever.
Understanding the Assignment Pipeline
Before diving into code, it's crucial to understand the pipeline. The assignment provides a reference parser and reference analysis that produce a typed AST JSON. Your job is to implement the third step: generating RISC-V assembly from that JSON. The starter code passes only one test; your goal is to pass all provided tests by correctly handling variables, expressions, control flow, functions, and object-oriented features.
Step-by-Step Workflow
- Build the compiler: Run
mvn clean packageto compile the Java project. - Run the code generator on a single file: Use the three-step command chain: parse to AST JSON, analyze to typed AST JSON, then generate assembly. For sample tests, you can skip the first two steps because typed AST JSONs are already provided in
src/tests/data/pa3/sample/. - Execute the assembly: Use the Venus simulator with
--runto verify output. - Test all at once: Use the chained command
java -cp ... chocopy.ChocoPy --pass=rrs --run --dir ... --testto run all sample programs.
If you're on Windows, remember to replace colons with semicolons in classpath arguments. This small detail can save debugging time.
Core Concepts in Code Generation
Your code generator must traverse the typed AST and emit RISC-V instructions. Key concepts include:
- Register allocation: Mapping ChocoPy variables to RISC-V registers (e.g.,
t0,t1,a0). The reference implementation uses a simple stack-based approach. - Expression evaluation: Handling arithmetic, boolean, and comparison operations by emitting
add,sub,mul,div,and,or,slt, etc. - Control flow: Implementing
if,while, andforloops using branch instructions (beq,bne,blt) and labels. - Function calls: Managing the call stack, passing arguments via registers
a0-a7, and handling return values. - Object-oriented features: ChocoPy supports classes, methods, and inheritance. You'll need to generate code for method dispatch, attribute access, and object creation.
To make this concrete, consider a social media app like Instagram. Each user's profile is an object with attributes (username, followers) and methods (post photo, like). Your code generator must translate high-level Python-style code into RISC-V instructions that manipulate memory and registers to simulate these objects.
Common Pitfalls and How to Avoid Them
Based on student experiences, here are frequent issues:
- Incorrect classpath: Double-check colons vs semicolons. Use absolute paths if relative paths cause trouble.
- Misunderstanding the typed AST: The JSON structure includes nodes for
Assign,Call,If, etc. Study the provided sample files to understand the format. - Register spilling: When you run out of registers, you must spill values to the stack. The reference implementation uses a simple frame pointer (
fp) to manage local variables. - Function prologue/epilogue: Every function must save and restore callee-saved registers (
s0-s11) and set up the stack frame. - Object layout: Objects are stored as heap-allocated structs. The first word typically points to a vtable for dynamic dispatch. Ensure you align with the ChocoPy Implementation Guide.
Testing incrementally is key. Start with simple constant expressions, then variables, then arithmetic, then control flow, functions, and finally classes. Use the Venus simulator's --print option to see the generated assembly and debug.
Trend-Inspired Example: Simulating a Gaming Leaderboard
Imagine you're building a code generator for a gaming leaderboard in ChocoPy. The high-level code might look like:
class Leaderboard:
def __init__(self):
self.scores = []
def add_score(self, player: str, score: int):
self.scores.append((player, score))
def top_three(self) -> list:
# sort and return top three
...Your code generator must translate this into RISC-V. For example, creating a Leaderboard object involves allocating memory on the heap (using malloc or sbrk), initializing the vtable pointer, and storing the scores list (which itself is a dynamic array). Method calls like lb.add_score("Alice", 100) require setting up arguments in a0 (self), a1 (pointer to string), and a2 (integer), then jumping to the method's code via the vtable.
This example mirrors real-world game development where leaderboards are updated in real-time, and efficient code generation ensures smooth performance even with thousands of players.
Tips for Efficient Development
- Use the reference implementation: Run with
--pass=rrrto see correct assembly output. Compare with your own to spot differences. - Leverage the Venus simulator: It can step through instructions, print register values, and breakpoint. Use
--helpfor options. - Write small test programs: Create minimal ChocoPy files (e.g.,
print(1+2)) and test each feature in isolation. - Understand the starter code: The provided
CodeGenclass likely has methods likevisitAssign,visitCall, etc. Extend them systematically.
Remember, the assignment is not about writing the most optimized code, but about correctly implementing the specification. Focus on correctness first, then consider efficiency.
Conclusion
Implementing a code generator for ChocoPy is a challenging but rewarding experience that deepens your understanding of compilers, computer architecture, and low-level programming. By following this guide and leveraging the provided resources, you'll be able to pass all tests and gain skills applicable to AI hardware, embedded systems, and game development. Good luck with your CS164 assignment!