Building a RISC-V Simulator: Step-by-Step Tutorial for CDA 4102/5155
Learn how to build a RISC-V simulator and disassembler from scratch in Python. This tutorial covers instruction formats, opcodes, and step-by-step simulation for the CDA 4102/5155 project.
Introduction to RISC-V Simulator Project
In this tutorial, you will learn how to build a simple RISC-V simulator and disassembler as required for Project 1 in CDA 4102 and CDA 5155 (Fall 2025). The project involves reading a RISC-V binary file (a text file of 0s and 1s) and performing two main tasks: disassembly into assembly instructions and cycle-accurate simulation with register/memory dumps. We will focus on the core concepts and an implementation strategy in Python. By the end, you'll have a solid foundation to complete your assignment.
Understanding RISC-V Instruction Formats
The RISC-V ISA defines several instruction formats (R, I, S, B, U, and J). This project uses a simplified 5-bit opcode instead of the standard 7-bit one and groups instructions into four categories, identified by the last two bits of each word:
- Category-1 (last bits 00): beq, bne, blt, sw (S-type format)
- Category-2 (last bits 01): add, sub, and, or (R-type format)
- Category-3 (last bits 10): addi, andi, ori, sll, sra, lw (I-type format)
- Category-4 (last bits 11): jal, break (U-type format)
Each category has a specific bit layout for opcode, registers, and immediate values. For example, Category-1 uses the S-type format: imm[11:5] | rs2 | rs1 | func3 | imm[4:0] | opcode[4:0] | 00. The opcode bits are the five bits before the last two. Refer to the project document for exact opcode mappings (e.g., beq = 00000, bne = 00001, etc.).
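The S-type layout above maps directly onto shifts and masks. The following is a minimal sketch, assuming the fields sit at the bit positions the layout implies (imm[11:5] in bits 31-25, rs2 in 24-20, rs1 in 19-15, func3 in 14-12, imm[4:0] in 11-7, opcode in 6-2, category in 1-0). The helper names bits and decode_s_type are illustrative and do not come from the project document.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive, bit 0 = least significant) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_s_type(word):
    """Split a Category-1 word into its raw (not yet sign-extended) fields."""
    return {
        "category": bits(word, 1, 0),
        "opcode": bits(word, 6, 2),
        "imm_4_0": bits(word, 11, 7),
        "func3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "imm_11_5": bits(word, 31, 25),
    }


# Hand-built example word: imm[11:5]=0, rs2=2, rs1=1, func3=0, imm[4:0]=4,
# opcode=00001 (bne in the mapping quoted above), category bits 00.
word = int("0000000" "00010" "00001" "000" "00100" "00001" "00", 2)
fields = decode_s_type(word)
print(fields["rs1"], fields["rs2"], fields["imm_4_0"], fields["opcode"])  # 1 2 4 1
```

Writing one generic bits helper keeps every field extraction in the same hi..lo notation that the format description uses, which makes the decoder easier to check against the project document.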
Setting Up Your Python Environment
We'll use Python 3 for this tutorial. Create a single source file named riscv_sim.py. Python's built-in integer operators (>>, &, |) are enough for the bit manipulation, and the sys module gives access to the input filename on the command line. Our simulator will read a text file containing 32-bit binary strings, one per line. The first instruction starts at address 256, and the last instruction is always break. After the break instruction, the file contains 32-bit signed integers for data memory.
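The file layout just described can be sketched as follows. This is an illustration under two assumptions that should be checked against the project document: data words occupy the addresses immediately after break, and the position of break is already known (finding it needs the break opcode, which is not reproduced here). The names to_signed32, split_program, and break_index are hypothetical.

```python
BASE_ADDRESS = 256  # address of the first instruction, as stated above


def to_signed32(bit_string):
    """Interpret a string of 32 binary digits as a two's-complement integer."""
    value = int(bit_string, 2)
    return value - (1 << 32) if value >= (1 << 31) else value


def split_program(lines, break_index):
    """Return ({address: instruction word}, {address: signed data word}).

    break_index is the position of the break instruction within lines.
    Assumption: data words continue at the addresses right after break.
    """
    instructions, data = {}, {}
    for i, line in enumerate(lines):
        address = BASE_ADDRESS + 4 * i
        if i <= break_index:
            instructions[address] = int(line, 2)
        else:
            data[address] = to_signed32(line)
    return instructions, data


# Placeholder bit patterns, not real encodings: two "instructions", two data words.
lines = ["0" * 32, "0" * 31 + "1", "1" * 32, "0" * 29 + "101"]
instructions, data = split_program(lines, break_index=1)
print(sorted(instructions))  # [256, 260]
print(data)                  # {264: -1, 268: 5}
```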
import sys

def read_input(filename):
    with open(filename, 'r') as f:
        lines = f.read().strip().splitlines()
    # Convert binary strings to integers
    instructions = []
    for line in lines:
        if line.strip() == '':
            continue
        instr = int(line.strip(), 2)
        instructions.append(instr)
    return instructions
Disassembler Implementation
The disassembler decodes each 32-bit instruction into its assembly mnemonic and operands. First identify the category from the last two bits, then extract the opcode, register, and immediate fields for that category. For a Category-1 branch such as beq: extract rs1 and rs2, rebuild the 12-bit immediate by concatenating imm[11:5] and imm[4:0], sign-extend it, and shift it left by 1 to get the byte offset. The output has the form beq x[rs1], x[rs2], offset.
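Sign extension is the step most likely to go wrong in Python, so here is a small sketch of it in isolation. The helper names sign_extend and branch_offset are illustrative, not taken from the project document.

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def branch_offset(imm_11_5, imm_4_0):
    """Concatenate the split immediate, sign-extend it, then shift left by 1."""
    imm12 = (imm_11_5 << 5) | imm_4_0
    return sign_extend(imm12, 12) << 1


print(branch_offset(0b0000000, 0b00100))  # 8
print(branch_offset(0b1111111, 0b11110))  # -4
```

Python integers are unbounded, so setting extra high bits with OR (a common idiom in C) produces a large positive number rather than a negative one. Subtracting 2**width when the sign bit is set gives the signed value directly.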
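def disassemble(instr, address):
    last_two = instr & 0x3
    opcode = (instr >> 2) & 0x1F
    if last_two == 0:  # Category-1
        if opcode == 0: mnemonic = 'beq'
        elif opcode == 1: mnemonic = 'bne'
        elif opcode == 2: mnemonic = 'blt'
        elif opcode == 3: mnemonic = 'sw'
        else: mnemonic = 'unknown'
        # Extract fields
        imm_11_5 = (instr >> 25) & 0x7F
        rs2 = (instr >> 20) & 0x1F
        rs1 = (instr >> 15) & 0x1F
        imm_4_0 = (instr >> 7) & 0x1F
        imm = (imm_11_5 << 5) | imm_4_0
        # Sign extend 12-bit immediate. Python integers are unbounded, so
        # subtract 2**12 rather than OR-ing in high bits.
        if imm & 0x800:
            imm -= 0x1000
        if mnemonic == 'sw':
            # A store's immediate is a byte offset from rs1, not a branch offset.
            # Standard RISC-V syntax; confirm operand order in the project document.
            return f'sw x{rs2}, {imm}(x{rs1})'
        offset = imm << 1
        target = address + offset
        # Prints the absolute target; print offset instead if the expected
        # output shows the raw branch offset.
        return f'{mnemonic} x{rs1}, x{rs2}, {target}'
    # ... similar for other categories
Simulator Core: Register File and Memory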
We maintain 32 registers (x0-x31), with x0 hardwired to 0. Data memory is a list of 32-bit words. Initialize all to 0. The program counter (PC) starts at 256. After each instruction, we update the PC and print register and memory state.
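One way to guarantee that x0 stays at 0 is to route every register access through a small wrapper. The sketch below is one possible design, not the required one; the class name RegisterFile and its methods are hypothetical.

```python
class RegisterFile:
    """32 registers stored as unsigned 32-bit values; x0 always reads 0."""

    def __init__(self):
        self._regs = [0] * 32

    def write(self, index, value):
        if index != 0:  # writes to x0 are silently discarded
            self._regs[index] = value & 0xFFFFFFFF

    def read(self, index):
        """Return the raw 32-bit pattern (0 .. 2**32 - 1)."""
        return self._regs[index]

    def read_signed(self, index):
        """Return the value as a two's-complement signed integer."""
        value = self._regs[index]
        return value - (1 << 32) if value & 0x80000000 else value


rf = RegisterFile()
rf.write(0, 123)   # ignored
rf.write(5, -1)    # stored as 0xFFFFFFFF
print(rf.read(0), rf.read(5), rf.read_signed(5))  # 0 4294967295 -1
```

Storing the raw bit pattern and converting on read keeps the logical instructions simple, while read_signed is what a register dump or a signed comparison would use.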
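regs = [0] * 32
mem = [0] * 1024  # enough for demo
pc = 256

def simulate(instructions):
    global pc, regs, mem
    while True:
        # Index by PC so that branches and jumps fetch the right word
        instr = instructions[(pc - 256) // 4]
        address = pc
        # Assumes disassemble() has been completed for every category
        mnemonic = disassemble(instr, address).split()[0]
        # For break, stop simulation
        if mnemonic == 'break':
            break
        # decode and execute... (see full code)
        # A taken branch or jal assigns pc itself; otherwise fall through.
        # (This check treats a jump to its own address as fall-through.)
        if pc == address:
            pc += 4
        regs[0] = 0  # x0 is hardwired to 0
        # Print state after each instruction
        print(f'After instruction at {address}: PC={pc}')
        print('Registers:', regs[:8])  # first 8 for brevity
        print('Memory[0:8]:', mem[:8])
Executing Each Instruction Type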
We'll implement execution for each category. For example, add (Category-2) computes rd = rs1 + rs2, wrapping to 32 bits. lw (Category-3) loads the word at address rs1 + immediate into rd. jal (Category-4) stores PC + 4 in rd, then jumps to PC + offset, where the offset is the sign-extended immediate shifted left by 1. Registers hold 32-bit two's-complement values: add and sub produce the same bit pattern whether the operands are read as signed or unsigned, but blt must compare signed values and sra must replicate the sign bit, while and, or, and sll operate on the raw bits.
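The signed cases deserve a closer look, because a register stored as an unsigned 32-bit pattern must be converted before comparing or shifting arithmetically. The sketch below assumes registers are kept as values in 0 .. 2**32 - 1 and that shift amounts use only their low 5 bits, as in standard RV32; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def exec_add(a, b):
    """Addition wraps modulo 2**32."""
    return (a + b) & MASK32


def exec_sll(value, shamt):
    """Logical left shift; bits shifted past bit 31 are dropped."""
    return (value << (shamt & 0x1F)) & MASK32


def exec_sra(value, shamt):
    """Arithmetic right shift; the sign bit is replicated."""
    return (to_signed(value) >> (shamt & 0x1F)) & MASK32


def blt_taken(a, b):
    """blt compares its operands as signed numbers."""
    return to_signed(a) < to_signed(b)


print(hex(exec_add(0x7FFFFFFF, 1)))    # 0x80000000
print(hex(exec_sra(0xFFFFFFF0, 2)))    # 0xfffffffc
print(hex(exec_sll(0x80000001, 1)))    # 0x2
print(blt_taken(0xFFFFFFFF, 1))        # True, because -1 < 1
```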
if mnemonic == 'add':
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF
elif mnemonic == 'sub':
    regs[rd] = (regs[rs1] - regs[rs2]) & 0xFFFFFFFF
elif mnemonic == 'lw':
    addr = (regs[rs1] + imm) & 0xFFFFFFFF
    regs[rd] = mem[addr//4]
elif mnemonic == 'jal':
    regs[rd] = pc + 4
    target = pc + (imm << 1)
    # pc now holds the jump target, so the main loop must not add 4 again
    pc = target
Testing with Sample Input
Use the provided sample.txt file. The first instruction should be at address 256. After disassembling, you should see assembly output matching the expected format. Then run simulation and compare register/memory dumps. Debug by printing intermediate values. Common pitfalls: sign extension, immediate concatenation for branches, and offset calculation.
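Comparing your output with the expected output line by line is easier with a diff than by eye. Here is a minimal sketch using the standard difflib module; the function name diff_lines and the sample lines are placeholders and do not reflect the project's actual output format.

```python
import difflib


def diff_lines(actual, expected):
    """Return unified-diff lines between two lists of lines (empty if equal)."""
    return list(difflib.unified_diff(
        expected, actual,
        fromfile="expected", tofile="actual", lineterm=""))


# Placeholder lines for illustration only.
expected = ["256 beq x1, x2, 8", "260 break"]
actual = ["256 beq x1, x2, 4", "260 break"]

for line in diff_lines(actual, expected):
    print(line)
```

Lines prefixed with - come from the expected output and lines prefixed with + from yours, so a wrong offset or a missing sign extension shows up as a paired -/+ entry at the first instruction where the outputs diverge.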
Optimizing for Performance
For larger programs, consider using dictionaries for opcode mapping and precomputing bit masks. Use bitwise operations for speed. In Python, you can also use numpy arrays for memory, but a list is fine for this project.
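The dictionary approach could look like the sketch below. Only the Category-1 table is filled in, using the opcode values from the disassembler code in this tutorial; the other categories are left out because their opcode values must come from the project document. The names MNEMONICS and lookup_mnemonic are hypothetical.

```python
# Category-1 opcodes as used in the disassembler above.
CATEGORY_1 = {0b00000: "beq", 0b00001: "bne", 0b00010: "blt", 0b00011: "sw"}

# Outer key: the last two bits of the word. Add the other three categories
# here once their opcode values are taken from the project document.
MNEMONICS = {0b00: CATEGORY_1}


def lookup_mnemonic(instr):
    """Map a 32-bit word to its mnemonic, or 'unknown' if it is not in the tables."""
    category = instr & 0x3
    opcode = (instr >> 2) & 0x1F
    return MNEMONICS.get(category, {}).get(opcode, "unknown")


print(lookup_mnemonic(0b0000100))  # bne: opcode 00001, category 00
print(lookup_mnemonic(0b0000001))  # unknown: category 01 has no table yet
```

A table lookup replaces the if/elif chain with a single expression and keeps all opcode values in one place, which makes them easier to audit against the handout.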
Trending Context: AI and RISC-V
RISC-V is gaining traction in AI accelerators due to its open-source nature. For example, startups are designing RISC-V cores for edge AI inference. Understanding instruction-level simulation is crucial for developing compilers and hardware verification tools. This project gives you hands-on experience with a real ISA used in modern chip design.
Conclusion
You now have a roadmap to build your RISC-V simulator. Focus on correct instruction decoding and execution semantics. Test with small programs first. Good luck with your CDA 4102/5155 project!