Building a RISC-V Simulator: Step-by-Step Tutorial for CDA 4102/5155

Learn how to build a RISC-V simulator and disassembler from scratch in Python. This tutorial covers instruction formats, opcodes, and step-by-step simulation for the CDA 4102/5155 project.

Introduction to RISC-V Simulator Project

In this tutorial, you will learn how to build a simple RISC-V simulator and disassembler as required for Project 1 in CDA 4102 and CDA 5155 (Fall 2025). The project involves reading a RISC-V binary file (a text file of 0s and 1s) and performing two main tasks: disassembly into assembly instructions and cycle-accurate simulation with register/memory dumps. We will focus on the core concepts and an implementation strategy in Python. By the end, you'll have a solid foundation to complete your assignment.

Understanding RISC-V Instruction Formats

The standard RISC-V ISA encodes instructions in a small set of formats (R, I, S, U, and others) selected by a 7-bit opcode. This project uses a simplified encoding instead: a 5-bit opcode followed by two category bits. Instructions are grouped into four categories based on those last two bits:

  • Category-1 (last bits 00): beq, bne, blt, sw (S-type format)
  • Category-2 (last bits 01): add, sub, and, or (R-type format)
  • Category-3 (last bits 10): addi, andi, ori, sll, sra, lw (I-type format)
  • Category-4 (last bits 11): jal, break (U-type format)

Each category has a specific bit layout for opcode, registers, and immediate values. For example, Category-1 uses the S-type format: imm[11:5] | rs2 | rs1 | func3 | imm[4:0] | opcode[4:0] | 00. The opcode bits are the five bits before the last two. Refer to the project document for exact opcode mappings (e.g., beq = 00000, bne = 00001, etc.).
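
The field boundaries above translate directly into shifts and masks. The following is a minimal sketch, assuming the Category-1 layout exactly as written in this section (category in bits 1:0, opcode in bits 6:2). The helper names bits and split_category_1 are illustrative and do not come from the project document.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned int."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def split_category_1(word):
    """Split a Category-1 (S-type layout) word into its raw fields."""
    return {
        "category": bits(word, 1, 0),    # last two bits
        "opcode": bits(word, 6, 2),      # five bits before the last two
        "imm_4_0": bits(word, 11, 7),
        "func3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "imm_11_5": bits(word, 31, 25),
    }


# Hand-built example word: imm[11:5]=0, rs2=2, rs1=1, func3=0, imm[4:0]=4,
# opcode=00001 (bne in the mapping quoted above), category=00.
word = (0 << 25) | (2 << 20) | (1 << 15) | (0 << 12) | (4 << 7) | (1 << 2) | 0
fields = split_category_1(word)
print(fields["rs1"], fields["rs2"], fields["opcode"])
```

Keeping the raw fields separate from their interpretation makes it easier to check each one against the project document before any sign extension is applied.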

Setting Up Your Python Environment

We'll use Python 3 for this tutorial. Create a single source file named riscv_sim.py. No third-party packages are needed: plain integer shifts and masks handle the bit manipulation, and sys gives access to the input filename on the command line (sys.argv). Our simulator will read a text file containing 32-bit binary strings, one per line. The first instruction starts at address 256, and the last instruction is always break. After the break instruction, the file contains 32-bit signed integers for data memory.

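import sys

def read_input(filename):
    with open(filename, 'r') as f:
        lines = f.read().strip().splitlines()
    # Convert each binary string to an unsigned integer.
    # The list holds the instructions followed by the data words after break;
    # data words still need two's-complement conversion to be read as signed.
    instructions = []
    for line in lines:
        if line.strip() == '':
            continue
        instr = int(line.strip(), 2)
        instructions.append(instr)
    return instructions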
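
Because int(line, 2) always yields a non-negative value, the data words that follow break need one more step before they can be treated as signed integers. A minimal sketch of that conversion follows; the function name parse_signed_word is an assumption made for illustration.

```python
def parse_signed_word(bit_string):
    """Interpret a 32-character string of 0s and 1s as a two's-complement integer."""
    bit_string = bit_string.strip()
    if len(bit_string) != 32 or set(bit_string) - {"0", "1"}:
        raise ValueError(f"expected 32 binary digits, got {bit_string!r}")
    value = int(bit_string, 2)
    # If the sign bit (bit 31) is set, the value is negative in two's complement.
    if value & 0x80000000:
        value -= 1 << 32
    return value


print(parse_signed_word("1" * 32))           # -1
print(parse_signed_word("0" * 29 + "101"))   # 5
```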

Disassembler Implementation

The disassembler decodes each 32-bit instruction into its assembly mnemonic and operands. First identify the category from the last two bits, then extract the opcode, register, and immediate fields for that category. For example, for beq in Category-1: extract rs1 and rs2, rebuild the 12-bit immediate by concatenating imm[11:5] and imm[4:0], sign-extend it, and shift it left by 1 to get the branch offset in bytes. The output line is the mnemonic followed by x[rs1], x[rs2], and the branch operand; check the project document for whether that operand is printed as the offset or as the resolved target address.

def disassemble(instr, address):
    last_two = instr & 0x3          # category bits
    opcode = (instr >> 2) & 0x1F    # 5-bit opcode
    if last_two == 0:  # Category-1
        if opcode == 0: mnemonic = 'beq'
        elif opcode == 1: mnemonic = 'bne'
        elif opcode == 2: mnemonic = 'blt'
        elif opcode == 3: mnemonic = 'sw'
        else: mnemonic = 'unknown'
        # Extract fields
        imm_11_5 = (instr >> 25) & 0x7F
        rs2 = (instr >> 20) & 0x1F
        rs1 = (instr >> 15) & 0x1F
        imm_4_0 = (instr >> 7) & 0x1F
        imm = (imm_11_5 << 5) | imm_4_0
        # Sign extend 12-bit immediate. Python ints are unbounded, so
        # subtract 2**12 rather than OR-ing in high bits.
        if imm & 0x800:
            imm -= 0x1000
        if mnemonic == 'sw':
            # A store has no branch target: imm is a byte offset from rs1.
            # Standard RISC-V syntax shown; match the project's expected output.
            return f'sw x{rs2}, {imm}(x{rs1})'
        offset = imm << 1
        target = address + offset
        return f'{mnemonic} x{rs1}, x{rs2}, {target}'
    # ... similar for other categories
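
Sign extension is the step most likely to go wrong in Python, because integers are unbounded and setting high bits never makes a value negative. The sketch below shows a width-generic helper and uses it to compute a branch target from the two immediate pieces. The names sign_extend and branch_target are illustrative, and the shift-left-by-1 rule is taken from the description above.

```python
def sign_extend(value, width):
    """Sign-extend the low `width` bits of value to a Python int."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def branch_target(address, imm_11_5, imm_4_0):
    """Join the two immediate pieces, sign-extend, shift left by 1, add to address."""
    imm = sign_extend((imm_11_5 << 5) | imm_4_0, 12)
    return address + (imm << 1)


# imm = 0b111111111110 is -2, so the branch goes back 4 bytes.
print(branch_target(264, 0b1111111, 0b11110))   # 260
# imm = 4, so the branch goes forward 8 bytes.
print(branch_target(256, 0, 4))                 # 264
```

The same sign_extend helper can be reused for the wider immediate in Category-4 by passing a different width.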

Simulator Core: Register File and Memory

We maintain 32 registers (x0-x31), with x0 hardwired to 0. All registers start at 0. Data memory is a list of 32-bit words, zero-filled here and then loaded with the data words that follow break in the input file. The program counter (PC) starts at 256. After each instruction, we update the PC and print register and memory state.

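regs = [0] * 32
mem = [0] * 1024  # enough for demo
pc = 256

def simulate(instructions):
    global pc, regs, mem
    while True:
        # Fetch by address so branches and jumps land on the right word
        instr_index = (pc - 256) // 4
        instr = instructions[instr_index]
        address = pc
        # decode and execute... (see full code)
        # That step sets `mnemonic`; a taken branch or jal also assigns pc.
        # For break, stop simulation
        if mnemonic == 'break':
            break
        regs[0] = 0  # x0 is hardwired to 0
        # Fall through to the next instruction unless pc was redirected
        if pc == address:
            pc += 4
        # Print state after each instruction
        print(f'After instruction at {address}: PC={pc}')
        print('Registers:', regs[:8])  # first 8 for brevity
        print('Memory[0:8]:', mem[:8])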
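
One way to keep x0 at zero without remembering it in every instruction handler is to route all register writes through a small helper. This is a sketch under the assumption that registers are stored as unsigned 32-bit values; write_reg, MASK32, and demo_regs are hypothetical names, not part of the project specification.

```python
MASK32 = 0xFFFFFFFF


def write_reg(regs, rd, value):
    """Store value into register rd, keeping x0 hardwired to 0."""
    if rd != 0:
        regs[rd] = value & MASK32  # keep only the low 32 bits


demo_regs = [0] * 32
write_reg(demo_regs, 0, 123)   # ignored: x0 stays 0
write_reg(demo_regs, 5, -1)    # stored as 0xFFFFFFFF
print(demo_regs[0], hex(demo_regs[5]))
```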

Executing Each Instruction Type

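We'll implement execution for each category. For example, add (Category-2) computes rd = rs1 + rs2, lw (Category-3) loads the word at address rs1 + immediate into rd, and jal (Category-4) stores PC+4 into rd and then jumps to PC+offset (the immediate sign-extended and shifted left by 1). Signedness needs care: add and sub give the same 32-bit result whether the operands are viewed as signed or unsigned, but blt compares signed values and sra copies the sign bit into the vacated positions, so registers stored as unsigned 32-bit values must be converted to signed before those operations.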

if mnemonic == 'add':
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF  # wrap to 32 bits
elif mnemonic == 'sub':
    regs[rd] = (regs[rs1] - regs[rs2]) & 0xFFFFFFFF
elif mnemonic == 'lw':
    addr = (regs[rs1] + imm) & 0xFFFFFFFF  # byte address
    regs[rd] = mem[addr//4]                # mem is indexed by word
elif mnemonic == 'jal':
    regs[rd] = pc + 4                      # return address
    target = pc + (imm << 1)               # imm is already sign-extended
    pc = target
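
The instructions where signedness changes the result are blt and sra. A minimal sketch of both follows, assuming registers hold unsigned 32-bit values as in the masking code above. The helper names are illustrative, and limiting the shift amount to its low 5 bits is an assumption borrowed from standard RV32, so confirm it against the project document.

```python
def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def blt_taken(a, b):
    """Signed less-than on two 32-bit register values."""
    return to_signed32(a) < to_signed32(b)


def sra32(value, shamt):
    """Arithmetic right shift: the sign bit is copied into the vacated bits."""
    # Python's >> on a negative int is already arithmetic, so convert first.
    return (to_signed32(value) >> (shamt & 0x1F)) & 0xFFFFFFFF


print(blt_taken(0xFFFFFFFF, 1))      # True, because -1 < 1
print(hex(sra32(0x80000000, 4)))     # 0xf8000000
```

Comparing the raw unsigned values would get the first case wrong, since 0xFFFFFFFF is larger than 1 as an unsigned number.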

Testing with Sample Input

Use the provided sample.txt file. The first instruction should be at address 256. After disassembling, you should see assembly output matching the expected format. Then run simulation and compare register/memory dumps. Debug by printing intermediate values. Common pitfalls: sign extension, immediate concatenation for branches, and offset calculation.
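
For small hand-made test cases it helps to generate input lines from known field values instead of typing 32 digits by hand. The sketch below packs a Category-1 word using the layout described earlier and formats it as a 32-character binary line. The function encode_category_1 is a hypothetical helper, and func3 is left as 000 purely as an assumption for the demo.

```python
def encode_category_1(opcode, rs1, rs2, imm):
    """Pack a Category-1 word: imm[11:5] | rs2 | rs1 | func3 | imm[4:0] | opcode | 00."""
    imm &= 0xFFF                       # 12-bit two's-complement immediate
    word = ((imm >> 5) & 0x7F) << 25   # imm[11:5]
    word |= (rs2 & 0x1F) << 20
    word |= (rs1 & 0x1F) << 15
    # func3 (bits 14:12) is left as 000 in this sketch
    word |= (imm & 0x1F) << 7          # imm[4:0]
    word |= (opcode & 0x1F) << 2
    return word                        # category bits 1:0 stay 00


# beq (opcode 00000) with rs1=1, rs2=2 and a negative immediate of -2
line = format(encode_category_1(opcode=0, rs1=1, rs2=2, imm=-2), "032b")
print(line)
```

Feeding such a line through your disassembler and checking that the same field values come back is a quick round-trip test for the immediate handling.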

Optimizing for Performance

For larger programs, consider using dictionaries for opcode mapping and precomputing bit masks. Use bitwise operations for speed. In Python, you can also use numpy arrays for memory, but a list is fine for this project.
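
A dictionary keyed by (category, opcode) replaces the chain of if/elif tests with a single lookup. The sketch below fills in only the Category-1 opcodes that appear in this tutorial; entries for the other categories must come from the project document. MNEMONICS and lookup_mnemonic are illustrative names.

```python
# (category, opcode) -> mnemonic. Only the Category-1 entries given in this
# tutorial are filled in; the rest must come from the project document.
MNEMONICS = {
    (0b00, 0b00000): "beq",
    (0b00, 0b00001): "bne",
    (0b00, 0b00010): "blt",
    (0b00, 0b00011): "sw",
}


def lookup_mnemonic(word):
    """Return the mnemonic for a 32-bit word, or 'unknown' if unmapped."""
    category = word & 0x3
    opcode = (word >> 2) & 0x1F
    return MNEMONICS.get((category, opcode), "unknown")


print(lookup_mnemonic(0b0000100))   # bne
print(lookup_mnemonic(0b1111111))   # unknown
```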

Trending Context: AI and RISC-V

RISC-V is gaining traction in AI accelerators because the ISA is an open, royalty-free standard that designers can extend. For example, startups are designing RISC-V cores for edge AI inference. Understanding instruction-level simulation is crucial for developing compilers and hardware verification tools. This project gives you hands-on experience with a real ISA used in modern chip design.

Conclusion

You now have a roadmap to build your RISC-V simulator. Focus on correct instruction decoding and execution semantics. Test with small programs first. Good luck with your CDA 4102/5155 project!