CS188 Project 5 Tutorial: Binary Perceptron to GPT

Introduction to CS188 Project 5: Machine Learning Foundations

In this tutorial, we dive into CS188 Project 5, where you'll implement core machine learning models from scratch. From a simple binary perceptron to a character-level GPT, you'll gain hands-on experience with gradient-based learning, neural networks, and attention mechanisms. Whether you're preparing for AI job interviews or building your own AI app, these skills are essential. Let's break down each problem with clear explanations and timely examples.

Problem 1: Binary Perceptron (6 points)

The first task is to implement a binary perceptron for binary classification. You'll create a PerceptronModel class with weight parameters as Parameter objects. The forward method computes the dot product between weights and input, and get_prediction returns +1 or -1 based on the sign. The train_perceptron function loops through the dataset, updating weights only on misclassified samples, and stops when one full pass has zero errors.

Trend analogy: Think of the perceptron like a social media feed algorithm deciding whether to show you a post (like +1) or hide it (dislike -1). It adjusts based on your engagement signals until it gets it right.

Implementation Steps

Initialize the weight vector as a Parameter of shape 1 x dimensions.
In forward, compute weights.dot(x).
In get_prediction, return 1 if dot product >= 0 else -1.
In train_perceptron, for each sample, if prediction != label, update weights: weights += label * x.
Repeat until no errors in one epoch.
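
The steps above can be sketched without any framework. This is a minimal plain-Python illustration, not the project's Parameter-based API: the list-based weights, the toy dataset, and the max_epochs safety cap are all assumptions added for the example.

```python
def dot(weights, x):
    """Dot product of two equal-length lists."""
    return sum(w * xi for w, xi in zip(weights, x))

def get_prediction(weights, x):
    """Return +1 if the score is non-negative, otherwise -1."""
    return 1 if dot(weights, x) >= 0 else -1

def train_perceptron(samples, dimensions, max_epochs=1000):
    """Train until one full pass makes no mistakes (or max_epochs is hit)."""
    weights = [0.0] * dimensions
    for _ in range(max_epochs):
        errors = 0
        for x, label in samples:
            if get_prediction(weights, x) != label:
                # Misclassified: move the weights toward the correct side.
                weights = [w + label * xi for w, xi in zip(weights, x)]
                errors += 1
        if errors == 0:
            break
    return weights

# Toy linearly separable data; the last feature is a constant bias term.
samples = [
    ([2.0, 1.0, 1.0], 1),
    ([1.0, 3.0, 1.0], 1),
    ([-1.0, -2.0, 1.0], -1),
    ([-2.0, -1.0, 1.0], -1),
]
weights = train_perceptron(samples, dimensions=3)
```

The loop only terminates on its own when the data is linearly separable, which is why the sketch carries an epoch cap.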

Run python autograder.py -q q1 to test. Ensure it completes within 30 seconds.

Problem 2: Nonlinear Regression (6 points)

Now we move to nonlinear regression using a neural network. Implement RegressionModel as a hidden linear layer, a ReLU activation, and a linear output layer. Use mean squared error loss and train with a gradient-based optimizer such as Adam. The goal is to achieve average loss ≤ 0.02 on the test set.

Trend analogy: Imagine predicting the next-day price of a cryptocurrency like Bitcoin. The model learns nonlinear patterns from historical data (volume, sentiment) to output a continuous value.

Key Code

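# Assumes PyTorch, which provides Linear, Module, and mse_loss
from torch.nn import Linear, Module
from torch.nn.functional import mse_loss

class RegressionModel(Module):
    def __init__(self):
        super().__init__()  # lets PyTorch register the layers' parameters
        self.layer1 = Linear(1, 128)  # input -> hidden
        self.layer2 = Linear(128, 1)  # hidden -> scalar output

    def forward(self, x):
        h = self.layer1(x).relu()
        return self.layer2(h)

def regression_loss(pred, target):
    # Mean squared error between predictions and targets
    return mse_loss(pred, target)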

Train with mini-batches and monitor the validation loss (regression has no accuracy to track).
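
To make the loss and the batching concrete, here is a small plain-Python sketch. The helper names minibatches and linear_guess, the toy data, and the batch size are hypothetical and chosen only for illustration; the project itself would use its own dataset and tensor operations.

```python
def mse_loss(preds, targets):
    """Mean of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def minibatches(xs, ys, batch_size):
    """Yield consecutive (inputs, targets) slices of at most batch_size items."""
    for start in range(0, len(xs), batch_size):
        yield xs[start:start + batch_size], ys[start:start + batch_size]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]  # y = x^2, a simple nonlinear target

def linear_guess(x):
    """A deliberately poor stand-in model, y = 2x, used only to produce a loss."""
    return 2.0 * x

# One loss value per mini-batch: [0.5, 4.5, 64.0]
batch_losses = [
    mse_loss([linear_guess(x) for x in bx], by)
    for bx, by in minibatches(xs, ys, batch_size=2)
]
```

The growing loss on later batches shows why a purely linear model cannot fit this target and a hidden layer with a nonlinearity is needed.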

Problem 3: Digit Classification (6 points)

For handwritten digit classification (MNIST-style), build DigitClassificationModel that outputs scores for 10 classes. Use cross-entropy loss and avoid ReLU in the final layer. Achieve ≥97% test accuracy.

Trend analogy: Similar to how Snapchat's AI recognizes your face to apply filters, your model learns to identify digits from pixel patterns.

Network Architecture

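# Assumes PyTorch and inputs already shaped (batch, 1, 28, 28)
from torch.nn import Conv2d, Linear, Module
from torch.nn.functional import max_pool2d

class DigitClassificationModel(Module):
    def __init__(self):
        super().__init__()  # lets PyTorch register the layers' parameters
        self.conv1 = Conv2d(1, 32, 5)     # 28x28 -> 24x24, 32 channels
        self.fc1 = Linear(32*12*12, 128)  # 12x12 after 2x2 max pooling
        self.fc2 = Linear(128, 10)        # one score per digit

    def forward(self, x):
        x = max_pool2d(self.conv1(x).relu(), 2)
        x = x.flatten(1)  # flatten everything except the batch dimension
        x = self.fc1(x).relu()
        return self.fc2(x)  # no ReLU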
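
The 32*12*12 input size of fc1 follows from the layer shapes. The arithmetic can be checked with a short sketch, assuming 28x28 input images, no padding, and stride 1 (the helper names are hypothetical):

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Side length of a square feature map after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, window):
    """Side length after non-overlapping pooling with the given window."""
    return size // window

side = conv_output_size(28, kernel=5)     # 24
side = pool_output_size(side, window=2)   # 12
flat_features = 32 * side * side          # 4608 inputs to the first linear layer
```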

Use dataset.get_validation_accuracy() to tune hyperparameters.
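
Cross-entropy works on raw class scores, which is why the final layer has no ReLU: clipping negative scores to zero would discard information the loss relies on. The following plain-Python sketch shows the computation for one example, plus a simple accuracy measure. Both functions are illustrative stand-ins, not the project's own helpers.

```python
import math

def cross_entropy(scores, label):
    """Negative log-probability of the true class, computed from raw scores."""
    top = max(scores)  # subtract the max for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - scores[label]

def accuracy(batch_scores, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = sum(
        1 for scores, label in zip(batch_scores, labels)
        if max(range(len(scores)), key=scores.__getitem__) == label
    )
    return correct / len(labels)

uniform_loss = cross_entropy([0.0] * 10, 3)            # equals log(10)
half_right = accuracy([[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]], [1, 1])  # 0.5
```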

Problem 4: Language Identification (7 points)

Implement language identification using a neural network that classifies text into languages (e.g., English, Spanish, French). Use character-level features and cross-entropy loss.

Trend analogy: Google Translate's language detection works similarly, but your model will be a simplified version trained on character n-grams.

Model Structure

Embedding layer for characters
Recurrent or convolutional layers
Output layer with one score per language (softmax is typically folded into the cross-entropy loss)

Train until validation accuracy plateaus.
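
Character-level features require turning each character into numbers before the network sees it. A minimal sketch, assuming a hypothetical lowercase ASCII alphabet (the project's real character set is likely larger and is not specified here):

```python
import string

ALPHABET = string.ascii_lowercase  # hypothetical vocabulary for illustration

def encode(word):
    """Map each known character to its index; unknown characters are skipped."""
    return [ALPHABET.index(ch) for ch in word.lower() if ch in ALPHABET]

def one_hot(index, size=len(ALPHABET)):
    """Vector of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

indices = encode("Cat")                      # [2, 0, 19]
features = [one_hot(i) for i in indices]     # one 26-dimensional vector per character
```

An embedding layer plays the same role as one_hot followed by a linear layer, but learns a dense vector per character.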

Problem 6: Attention Mechanism (2 points)

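Implement scaled dot-product attention as described in the Transformer paper: softmax(Q K^T / sqrt(d_k)) V, where the products are matrix multiplications and d_k is the dimension of the key vectors. Apply causal masking for autoregressive tasks so that each position attends only to itself and earlier positions.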

Trend analogy: ChatGPT uses attention to focus on relevant parts of the input when generating responses. Your implementation is a building block for such models.

Code Snippet

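# Assumes PyTorch tensors for Q, K, V, and mask
import math
from torch import softmax

class AttentionBlock:
    def forward(self, Q, K, V, mask=None):
        d_k = Q.size(-1)  # query/key dimension used for scaling
        scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
        if mask is not None:
            # Masked positions get a large negative score, so softmax gives them ~0 weight
            scores = scores.masked_fill(mask == 0, -1e9)
        weights = softmax(scores, dim=-1)
        return weights @ V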
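
To see what the formula does on actual numbers, here is a dependency-free sketch using nested lists in place of tensors. It is an illustration only, not the implementation the project expects; the causal flag and the tiny matrices are assumptions made for the example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention for 2-D lists of shape (positions, dim)."""
    d_k = len(K[0])
    output = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Position i may not look at later positions j > i.
            scores = [s if j <= i else -1e9 for j, s in enumerate(scores)]
        weights = softmax(scores)
        output.append(
            [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        )
    return output

Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
masked = attention(Q, K, V, causal=True)
full = attention(Q, K, V)
```

With the causal mask, the first position can only see itself, so its output is exactly the first row of V; without the mask it becomes a blend of both rows.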

Problem 7: Character GPT (0 points but interesting)

Build a character-level GPT trained on Shakespeare's text. Implement Transformer_Block and GPT forward functions. After training, run python chargpt.py to generate new text.

Trend analogy: This is a miniature version of the GPT models powering chatbots like ChatGPT, but trained on a tiny dataset to generate Shakespeare-like prose.

Architecture

Token embedding + positional encoding
Stack of Transformer blocks with self-attention and feed-forward layers
Output projection to vocabulary size
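
Once a model with this architecture produces scores over the vocabulary, generating text is a loop that repeatedly feeds the recent context back in. The sketch below uses greedy decoding and a toy stand-in for the model; generate, toy_next_scores, and block_size are hypothetical names, and chargpt.py may well sample differently.

```python
def generate(next_scores, prompt, steps, block_size):
    """Append `steps` tokens, each chosen greedily from the model's scores."""
    tokens = list(prompt)
    for _ in range(steps):
        context = tokens[-block_size:]  # crop to the model's context window
        scores = next_scores(context)
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens

def toy_next_scores(context):
    """Stand-in for a trained GPT: always favors (last token + 1) mod 4."""
    vocab_size = 4
    favored = (context[-1] + 1) % vocab_size
    return [1.0 if i == favored else 0.0 for i in range(vocab_size)]

sequence = generate(toy_next_scores, prompt=[0], steps=5, block_size=8)
```

Replacing the greedy choice with sampling from the softmax of the scores gives more varied output.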

Experiment with network size and training text to see creative outputs.

Conclusion

CS188 Project 5 covers the spectrum from simple perceptrons to modern transformers. By completing these problems, you'll have a solid foundation in machine learning and deep learning. Remember to submit only one edited Python file per group to Gradescope. The staff solution runs in about 12 minutes; optimize your code if it's slower. Good luck!