Building an E20 Assembler in Java: A Step-by-Step Guide for ATOM Assignment 2

Learn how to write an E20 assembler in Java for ATOM Assignment 2. This tutorial covers parsing assembly instructions, converting to machine code, handling labels, and generating Verilog-style output with practical examples.

Introduction to the E20 Assembler Project

Welcome to this tutorial on building an E20 assembler in Java, designed to help you tackle ATOM Assignment 2. The project is a substantive programming exercise: you convert E20 assembly language into 16-bit machine code. By the end of this guide, you'll understand how to parse assembly instructions, handle the different instruction formats, resolve labels, and produce output in the required Verilog syntax. The machine code your assembler produces will eventually run on a simulated E20 processor, so getting the encoding right is an important step in understanding computer architecture.

An assembler is the translator between human-readable assembly and machine language. There is no room for approximation in that translation: your assembler must map each assembly instruction to exactly one 16-bit binary encoding.

Understanding the E20 Instruction Set

The E20 processor uses a 16-bit instruction set with multiple formats. Unlike its predecessor E15, E20 instructions have varying field layouts. The key is to identify the opcode (first 3 bits), which determines how the remaining bits are interpreted. Common instructions include:

  • addi (opcode 001): Add immediate – format: opcode(3) | rs(3) | rt(3) | immediate(7)
  • movi (opcode 001): Move immediate – same as addi but with rs=0
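  • jeq (opcode 110): Jump if equal – format: opcode(3) | rs(3) | rt(3) | offset(7). Check the E20 manual here: it defines the 7-bit field as a signed offset relative to the next instruction (target - pc - 1), not as an absolute address.
  • j (opcode 010): Jump – format: opcode(3) | address(13)
  • halt (opcode 010): Halt – special case of j whose target is the halt instruction's own address, so the processor jumps to itself. The field equals 4 only when halt happens to sit at address 4.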

For example, the instruction addi $1, $2, 3 translates to binary: opcode=001, rs=010 (register 2), rt=001 (register 1), immediate=0000011 (3) => 0010100010000011. This maps directly to the fields shown in the assignment.
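The worked example above can be reproduced with a few shifts and masks. This is a minimal sketch rather than the assignment's reference code; the class and method names (FieldPacking, encodeTwoReg, toBits) are hypothetical and chosen only for illustration.

```java
class FieldPacking {
    // Packs opcode(3) | rs(3) | rt(3) | imm(7) into one 16-bit word.
    // Each field is masked so a stray high bit cannot spill into its neighbour.
    static int encodeTwoReg(int opcode, int rs, int rt, int imm) {
        return ((opcode & 0x7) << 13) | ((rs & 0x7) << 10) | ((rt & 0x7) << 7) | (imm & 0x7F);
    }

    // Renders the low 16 bits of a word as a zero-padded binary string.
    static String toBits(int word) {
        String s = Integer.toBinaryString(word & 0xFFFF);
        return String.format("%16s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        // addi $1, $2, 3  ->  rs = 2, rt = 1, imm = 3
        System.out.println(toBits(encodeTwoReg(0b001, 2, 1, 3))); // prints 0010100010000011
    }
}
```

Keeping the packing in one small function makes it easy to check a single instruction by hand before wiring it into the full assembler.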

Designing Your Assembler in Java

Your assembler will read an assembly file (.s), parse each line, and output machine code to stdout. The output format is: ram[address] = 16'bxxxxxxxxxxxxxxxx; // original instruction. The address increments by 1 for each instruction, starting at 0. Labels are resolved to addresses during a two-pass process.

Step 1: Read and Preprocess the Input

Use a BufferedReader to read the file line by line. Remove comments (anything after // or #) and trim whitespace. Skip empty lines. Store each instruction in a list for processing.

import java.io.*;
import java.util.*;

public class Assembler {
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java Assembler <file.s>");
            System.exit(1);
        }
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Remove comments (// or #), then trim
                int commentIndex = line.indexOf("//");
                if (commentIndex != -1) line = line.substring(0, commentIndex);
                commentIndex = line.indexOf("#");
                if (commentIndex != -1) line = line.substring(0, commentIndex);
                line = line.trim();
                if (!line.isEmpty()) lines.add(line);
            }
        }
        // Steps 2 and 3 continue from here, using the cleaned `lines` list.
    }
}
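Comment removal is easier to test when it lives in a pure function instead of inside the file-reading loop. The sketch below assumes the same two comment markers as above (// and #); the names Preprocess, stripComment, and clean are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Preprocess {
    // Cuts a line at the first "//" or "#", whichever comes first, then trims it.
    static String stripComment(String line) {
        int cut = line.length();
        int slash = line.indexOf("//");
        int hash = line.indexOf('#');
        if (slash != -1) cut = Math.min(cut, slash);
        if (hash != -1) cut = Math.min(cut, hash);
        return line.substring(0, cut).trim();
    }

    // Keeps only the lines that are non-empty after comment removal.
    static List<String> clean(List<String> raw) {
        List<String> out = new ArrayList<>();
        for (String line : raw) {
            String s = stripComment(line);
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("movi $1, 10   # counter", "", "  // note only", "halt");
        for (String s : clean(raw)) {
            System.out.println(s); // prints "movi $1, 10" and then "halt"
        }
    }
}
```

Because the helper takes strings rather than a file, you can exercise inline comments and blank lines without creating test files.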

Step 2: First Pass – Collect Labels

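Iterate through the preprocessed lines and identify labels (a name followed by :). A label may sit on its own line or in front of an instruction on the same line, so strip every leading label before storing the instruction. Store label-to-address mappings; only real instructions advance the address, which is why a label always maps to the address of the next instruction.

Map<String, Integer> labelMap = new HashMap<>();
List<String> instructions = new ArrayList<>();
int address = 0;
for (String instr : lines) {
    // Peel off every "label:" prefix; each one names the current address
    int colon;
    while ((colon = instr.indexOf(':')) != -1) {
        labelMap.put(instr.substring(0, colon).trim(), address);
        instr = instr.substring(colon + 1).trim();
    }
    // Whatever remains (if anything) is an instruction for the second pass
    if (!instr.isEmpty()) {
        instructions.add(instr);
        address++;
    }
}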
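Looking a label up with labelMap.get and unboxing the result throws a NullPointerException when the label was never defined, which is an unhelpful error for the user. One possible guard is sketched below. It assumes, as an illustration only, that a jump operand may also be written as a plain number; the names LabelResolver and resolve are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class LabelResolver {
    // Returns the address bound to a label, or the numeric value when the
    // operand is a decimal literal. Anything else is reported as undefined.
    static int resolve(String operand, Map<String, Integer> labelMap) {
        Integer address = labelMap.get(operand);
        if (address != null) return address;
        try {
            return Integer.parseInt(operand);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Undefined label: " + operand);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> labelMap = new HashMap<>();
        labelMap.put("done", 4);
        System.out.println(resolve("done", labelMap)); // prints 4
        System.out.println(resolve("12", labelMap));   // prints 12
    }
}
```

In the second pass, a call such as resolve(parts[1], labelMap) could then stand in for the direct map lookup.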

Step 3: Second Pass – Assemble Each Instruction

For each instruction, parse the opcode, registers, and immediates. Use a switch-case on the opcode mnemonic to determine the format and compute the 16-bit binary value. Then output in the required format.

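int currentAddress = 0;
for (String instr : instructions) {
    String[] parts = instr.split("[\\s,]+");
    String opcode = parts[0];
    int machineCode;
    switch (opcode) {
        case "addi":
        case "movi": {
            // addi $rt, $rs, imm   |   movi $rt, imm  (addi with rs = $0)
            boolean isMovi = opcode.equals("movi");
            int rt = Integer.parseInt(parts[1].substring(1));
            int rs = isMovi ? 0 : Integer.parseInt(parts[2].substring(1));
            // movi has one operand fewer, so its immediate is parts[2], not parts[3]
            int imm = Integer.parseInt(parts[isMovi ? 2 : 3]);
            machineCode = (0b001 << 13) | (rs << 10) | (rt << 7) | (imm & 0x7F);
            break;
        }
        case "jeq": {
            int rs = Integer.parseInt(parts[1].substring(1));
            int rt = Integer.parseInt(parts[2].substring(1));
            int target = labelMap.get(parts[3]);
            // Per the E20 manual, jeq stores an offset relative to the next
            // instruction rather than an absolute address; verify against your spec
            int rel = target - currentAddress - 1;
            machineCode = (0b110 << 13) | (rs << 10) | (rt << 7) | (rel & 0x7F);
            break;
        }
        case "j": {
            int target = labelMap.get(parts[1]);
            machineCode = (0b010 << 13) | (target & 0x1FFF);
            break;
        }
        case "halt":
            // halt is a jump to its own address
            machineCode = (0b010 << 13) | (currentAddress & 0x1FFF);
            break;
        default:
            throw new IllegalArgumentException("Unknown opcode: " + opcode);
    }
    // Output: ram[address] = 16'b...; // original instruction
    // (parsing the binary string back into an int would overflow, so pad the string instead)
    String bits = String.format("%16s", Integer.toBinaryString(machineCode)).replace(' ', '0');
    System.out.printf("ram[%d] = 16'b%s; // %s\n", currentAddress, bits, instr);
    currentAddress++;
}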

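Note: The above code is simplified. Masking with & 0x7F already yields the two's complement bit pattern for negative immediates, but it also silently truncates values outside the 7-bit signed range (-64 to 63), so add a range check. Also make sure the binary string is zero-padded to exactly 16 bits, for example with String.format, and report undefined labels with a clear error instead of letting the map lookup fail.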
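The range check mentioned in the note can be sketched as follows. It assumes the 7-bit signed range of -64 to 63 described in this tutorial; the names Immediates and encodeSigned7 are hypothetical.

```java
class Immediates {
    // Validates a signed 7-bit immediate and returns its two's complement
    // bit pattern in the low 7 bits of the result.
    static int encodeSigned7(int value) {
        if (value < -64 || value > 63) {
            throw new IllegalArgumentException("Immediate out of 7-bit signed range: " + value);
        }
        // For a negative int, the low 7 bits are already the two's complement pattern.
        return value & 0x7F;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(encodeSigned7(-1)));  // prints 1111111
        System.out.println(Integer.toBinaryString(encodeSigned7(-64))); // prints 1000000
    }
}
```

Rejecting out-of-range values up front turns a silently wrong encoding into a visible error.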

Handling Different Instruction Formats

The E20 has three main formats:

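  1. R-type (e.g., add, sub): Not used in this assignment but may appear in extensions. Format: opcode(3) | rs(3) | rt(3) | rd(3) | func(4). The last four bits are a function code that selects the operation, not a shift amount; confirm the exact values in the E20 manual.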
  2. I-type (e.g., addi, movi, jeq): opcode(3) | rs(3) | rt(3) | immediate(7).
  3. J-type (e.g., j, halt): opcode(3) | address(13).

Your switch-case must correctly identify the format based on the opcode mnemonic. Refer to the E20 manual for the complete list.
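One way to keep the format decision separate from the encoding logic is a small lookup table. The sketch below covers only the mnemonics named in this tutorial; the names Formats, Format, and formatOf are hypothetical, and the full mnemonic list has to come from the E20 manual.

```java
import java.util.HashMap;
import java.util.Map;

class Formats {
    enum Format { R_TYPE, I_TYPE, J_TYPE }

    // Mnemonic-to-format table; extend it with the rest of the instruction set.
    static final Map<String, Format> FORMAT_OF = new HashMap<>();
    static {
        FORMAT_OF.put("add", Format.R_TYPE);
        FORMAT_OF.put("sub", Format.R_TYPE);
        FORMAT_OF.put("addi", Format.I_TYPE);
        FORMAT_OF.put("movi", Format.I_TYPE);
        FORMAT_OF.put("jeq", Format.I_TYPE);
        FORMAT_OF.put("j", Format.J_TYPE);
        FORMAT_OF.put("halt", Format.J_TYPE);
    }

    // Looks up the format and fails loudly on an unknown mnemonic.
    static Format formatOf(String mnemonic) {
        Format f = FORMAT_OF.get(mnemonic);
        if (f == null) throw new IllegalArgumentException("Unknown mnemonic: " + mnemonic);
        return f;
    }

    public static void main(String[] args) {
        System.out.println(formatOf("jeq"));  // prints I_TYPE
        System.out.println(formatOf("halt")); // prints J_TYPE
    }
}
```

With a table like this, the switch only needs one branch per format plus the few per-mnemonic special cases such as movi and halt.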

Testing and Debugging

Use the provided sample files (e.g., loop2.s) to verify your output. The expected output is given in the assignment. For example, the instruction movi $1, 10 at address 0 should produce ram[0] = 16'b0010000010001010;. Check each bit carefully. You can also write your own test files covering edge cases like negative immediates, multiple labels, and jumps.
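A quick self-check against the expected line quoted above might look like the sketch below. It only covers the movi $1, 10 case from this section; the names OutputCheck and formatLine are hypothetical, and the trailing source comment that the assignment's output carries is left out to keep the comparison exact.

```java
class OutputCheck {
    // Formats one output line without the trailing source comment.
    static String formatLine(int address, int word) {
        String bits = String.format("%16s", Integer.toBinaryString(word & 0xFFFF)).replace(' ', '0');
        return "ram[" + address + "] = 16'b" + bits + ";";
    }

    public static void main(String[] args) {
        // movi $1, 10 is addi with rs = 0: opcode 001, rs 000, rt 001, imm 0001010
        int word = (0b001 << 13) | (0 << 10) | (1 << 7) | 10;
        String expected = "ram[0] = 16'b0010000010001010;";
        String actual = formatLine(0, word);
        System.out.println(actual.equals(expected) ? "PASS" : "FAIL: " + actual); // prints PASS
    }
}
```

The same pattern extends to whole files: assemble a sample, then compare each produced line with the expected output from the assignment.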

Common Pitfalls

  • Off-by-one errors in address calculation: Remember that labels point to the instruction after them. In the first pass, increment address only for non-label lines.
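  • Incorrect immediate handling: Immediates are 7-bit signed values, so the valid range is -64 to 63. In Java, imm & 0x7F already produces the 7-bit two's complement pattern for a negative int (for example, -1 becomes 1111111), but it also silently truncates out-of-range values, so check the range before masking.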
  • Output formatting: The binary string must be exactly 16 bits with leading zeros. Use String.format("%16s", Integer.toBinaryString(machineCode)).replace(' ', '0').
  • Comment removal: Be careful with inline comments. A line like addi $1, $2, 3 // comment should be parsed correctly.

Conclusion

Building an assembler is a rewarding project that deepens your understanding of computer architecture. By following this guide, you'll be able to complete ATOM Assignment 2 with confidence. Remember to test thoroughly and refer to the E20 manual for any ambiguous instructions. Good luck!