Mastering CS188 Project 5: From Binary Perceptron to Character GPT – A Machine Learning Tutorial
Learn to implement binary perceptron, nonlinear regression, digit classification, language ID, and attention mechanisms in CS188 Project 5. Step-by-step guide with code examples and trend-inspired analogies.
Introduction to CS188 Project 5: Machine Learning Foundations
In this tutorial, we dive into CS188 Project 5, where you'll implement core machine learning models from scratch. From a simple binary perceptron to a character-level GPT, you'll gain hands-on experience with gradient-based learning, neural networks, and attention mechanisms. Whether you're preparing for AI job interviews or building your own AI app, these skills are essential. Let's break down each problem with clear explanations and timely examples.
Problem 1: Binary Perceptron (6 points)
The first task is to implement a perceptron for binary classification. You'll create a PerceptronModel class whose weights are stored as a Parameter object. The forward method computes the dot product between the weights and the input, and get_prediction returns +1 or -1 based on its sign. The train_perceptron function loops through the dataset, updates the weights only on misclassified samples, and stops after one full pass with zero errors.
Trend analogy: Think of the perceptron like a social media feed algorithm deciding whether to show you a post (like +1) or hide it (dislike -1). It adjusts based on your engagement signals until it gets it right.
Implementation Steps
- Initialize the weight vector with shape 1 x dimensions.
- In forward, compute weights.dot(x).
- In get_prediction, return 1 if the dot product >= 0 else -1.
- In train_perceptron, for each sample, if prediction != label, update the weights: weights += label * x.
- Repeat until there are no errors in one epoch.
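The steps above can be sketched in plain Python, without the project's Parameter class. The helper names, the toy dataset, and the max_epochs safety cap are assumptions made for illustration, not the project's API.

```python
def dot(w, x):
    """Dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def get_prediction(w, x):
    """Return +1 if the score is non-negative, otherwise -1."""
    return 1 if dot(w, x) >= 0 else -1

def train_perceptron(samples, dimensions, max_epochs=1000):
    """samples: list of (x, label) pairs with label in {+1, -1}."""
    w = [0.0] * dimensions
    for _ in range(max_epochs):          # hypothetical cap to avoid looping forever
        errors = 0
        for x, label in samples:
            if get_prediction(w, x) != label:
                # Perceptron update: move the weights toward the correct side.
                w = [wi + label * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:                  # one clean pass: training is done
            break
    return w

# Linearly separable toy data; the last feature is a constant 1 acting as a bias.
data = [([2.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], 1),
        ([-1.0, -2.0, 1.0], -1), ([-2.0, -1.0, 1.0], -1)]
weights = train_perceptron(data, dimensions=3)
```

On this toy data a single update is enough, and the second pass finishes with zero errors. On data that is not linearly separable the loop would never reach a clean pass, which is why the sketch caps the number of epochs.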
Run python autograder.py -q q1 to test. Ensure it completes within 30 seconds.
Problem 2: Nonlinear Regression (6 points)
Now we move to nonlinear regression using a neural network. Implement RegressionModel with a hidden linear layer, a ReLU activation, and a linear output layer. Use mean squared error loss and train with the Adam optimizer, a gradient-based method. The goal is to achieve an average loss ≤ 0.02 on the test set.
Trend analogy: Imagine predicting the next-day price of a cryptocurrency like Bitcoin. The model learns nonlinear patterns from historical data (volume, sentiment) to output a continuous value.
Key Code
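from torch.nn import Linear, Module
from torch.nn.functional import mse_loss

class RegressionModel(Module):
    def __init__(self):
        super().__init__()  # lets Module register the layers as parameters
        self.layer1 = Linear(1, 128)  # input -> hidden
        self.layer2 = Linear(128, 1)  # hidden -> scalar output

    def forward(self, x):
        h = self.layer1(x).relu()
        return self.layer2(h)  # no activation on the regression output

def regression_loss(pred, target):
    return mse_loss(pred, target)

Train with mini-batches and monitor the loss.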
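The mini-batch training mentioned above could look like the following sketch. It assumes PyTorch and uses a toy sin(x) dataset in place of the project's dataset object; the learning rate, batch size, and epoch count are illustrative guesses rather than tuned values.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)  # reproducible toy run

# Toy data standing in for the project's dataset (an assumption): y = sin(x).
x = torch.linspace(-2 * math.pi, 2 * math.pi, 200).unsqueeze(1)
y = torch.sin(x)

model = nn.Sequential(nn.Linear(1, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def average_loss():
    """Mean squared error over the whole toy dataset, without tracking gradients."""
    with torch.no_grad():
        return loss_fn(model(x), y).item()

initial_loss = average_loss()
batch_size = 40
for epoch in range(50):
    perm = torch.randperm(x.size(0))            # reshuffle every epoch
    for start in range(0, x.size(0), batch_size):
        batch = perm[start:start + batch_size]
        optimizer.zero_grad()                   # clear gradients from the last step
        loss = loss_fn(model(x[batch]), y[batch])
        loss.backward()                         # backpropagate
        optimizer.step()                        # Adam parameter update
final_loss = average_loss()
```

Comparing initial_loss with final_loss after each run is a quick way to check that the learning rate and batch size are reasonable before spending time on the autograder.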
Problem 3: Digit Classification (6 points)
For handwritten digit classification (MNIST-style), build DigitClassificationModel that outputs scores for 10 classes. Use cross-entropy loss and avoid ReLU in the final layer. Achieve ≥97% test accuracy.
Trend analogy: Similar to how Snapchat's AI recognizes your face to apply filters, your model learns to identify digits from pixel patterns.
Network Architecture
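from torch.nn import Conv2d, Linear, Module
from torch.nn.functional import max_pool2d

class DigitClassificationModel(Module):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2d(1, 32, 5)         # 28x28 image -> 24x24 feature maps
        self.fc1 = Linear(32 * 12 * 12, 128)  # 12x12 after 2x2 max pooling
        self.fc2 = Linear(128, 10)

    def forward(self, x):
        # Assumes x has shape (batch, 1, 28, 28); reshape flattened inputs first.
        x = max_pool2d(self.conv1(x).relu(), 2)
        x = x.flatten(1)  # flatten everything except the batch dimension
        x = self.fc1(x).relu()
        return self.fc2(x)  # no ReLU on the output scores

Use dataset.get_validation_accuracy() to tune hyperparameters.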
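To see why the final layer should not apply ReLU, it helps to compute cross-entropy by hand on a small example. The sketch below uses plain Python and three classes instead of ten; the function names and the scores are made up for illustration.

```python
import math

def cross_entropy(scores, label):
    """Cross-entropy of one example, computed from raw class scores (logits)."""
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

def relu(scores):
    return [max(0.0, s) for s in scores]

scores = [2.0, -3.0, -1.0]  # raw scores; class 0 is the correct label
loss_raw = cross_entropy(scores, label=0)
loss_clamped = cross_entropy(relu(scores), label=0)  # ReLU lifts the negatives to 0
```

With ReLU on the output, the wrong classes can never score below zero, so the network loses one way of expressing confidence and the loss on this example is higher than with raw scores.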
Problem 4: Language Identification (7 points)
Implement language identification using a neural network that classifies text into languages (e.g., English, Spanish, French). Use character-level features and cross-entropy loss.
Trend analogy: Google Translate's language detection works similarly, but your model will be a simplified version trained on character n-grams.
Model Structure
- Embedding layer for characters
- Recurrent or convolutional layers
- Output layer with softmax
Train until validation accuracy plateaus.
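One way to wire up the structure listed above is sketched here, assuming PyTorch. The class name, layer sizes, the choice of a GRU as the recurrent layer, and the lowercase-only alphabet are all hypothetical; the project's required interface may differ. The model returns raw scores because PyTorch's cross_entropy applies log-softmax internally.

```python
import torch
from torch import nn

class CharLanguageClassifier(nn.Module):
    """Hypothetical sketch: character embedding -> GRU -> per-language scores."""
    def __init__(self, num_chars, num_languages, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_chars, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_languages)

    def forward(self, char_ids):
        # char_ids: (batch, word_length) tensor of integer character indices
        embedded = self.embed(char_ids)        # (batch, word_length, embed_dim)
        _, last_hidden = self.rnn(embedded)    # (1, batch, hidden_dim)
        return self.out(last_hidden[-1])       # raw scores, one per language

alphabet = "abcdefghijklmnopqrstuvwxyz"        # toy alphabet (an assumption)

def encode(word):
    """Map a lowercase word to a (1, len(word)) tensor of character indices."""
    return torch.tensor([[alphabet.index(c) for c in word]])

model = CharLanguageClassifier(num_chars=len(alphabet), num_languages=3)
scores = model(encode("hello"))
loss = nn.functional.cross_entropy(scores, torch.tensor([0]))  # label 0 is arbitrary
```

Using the final hidden state summarizes the whole word in one vector, which lets words of different lengths share the same output layer.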
Problem 6: Attention Mechanism (2 points)
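Implement scaled dot-product attention as described in the Transformer paper: softmax(Q K^T / sqrt(d_k)) V, where Q K^T and the final product with V are matrix multiplications and d_k is the dimension of the key vectors. Apply causal masking for autoregressive tasks so that each position attends only to itself and earlier positions.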
Trend analogy: ChatGPT uses attention to focus on relevant parts of the input when generating responses. Your implementation is a building block for such models.
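To make the causal masking concrete, here is a small NumPy sketch of the formula for a single unbatched sequence. The function name and the random inputs are illustrative assumptions; the project itself works with its own tensor types.

```python
import numpy as np

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask for one sequence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (T, T) similarity scores
    T = scores.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)    # True above the diagonal
    scores = np.where(future, -1e9, scores)               # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
output, weights = causal_attention(Q, K, V)
```

Each row of weights sums to 1 and is zero above the diagonal, so the first position can only attend to itself and its output equals the first row of V.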
Code Snippet
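import math
from torch.nn import Module
from torch.nn.functional import softmax

class AttentionBlock(Module):
    def forward(self, Q, K, V, mask=None):
        d_k = Q.size(-1)  # dimension of the query/key vectors
        scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
        if mask is not None:
            # Positions where mask == 0 get (almost) zero weight after softmax.
            scores = scores.masked_fill(mask == 0, -1e9)
        weights = softmax(scores, dim=-1)
        return weights @ V

Problem 7: Character GPT (0 points but interesting)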
Build a character-level GPT trained on Shakespeare's text. Implement Transformer_Block and GPT forward functions. After training, run python chargpt.py to generate new text.
Trend analogy: This is a miniature version of the GPT models powering chatbots like ChatGPT, but trained on a tiny dataset to generate Shakespeare-like prose.
Architecture
- Token embedding + positional encoding
- Stack of Transformer blocks with self-attention and feed-forward layers
- Output projection to vocabulary size
Experiment with network size and training text to see creative outputs.
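The architecture listed above can be sketched in PyTorch as follows. This is a minimal illustration under several assumptions: the class names TinyBlock and TinyCharGPT are hypothetical, positions use a learned embedding, the blocks are pre-norm, and the built-in nn.MultiheadAttention stands in for the attention you implement yourself in the project.

```python
import torch
from torch import nn

class TinyBlock(nn.Module):
    """One pre-norm Transformer block: causal self-attention, then feed-forward."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True above the diagonal means "may not attend to the future".
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                      # residual connection around attention
        return x + self.ff(self.ln2(x))       # residual connection around feed-forward

class TinyCharGPT(nn.Module):
    def __init__(self, vocab_size, block_size, d_model=32, n_heads=2, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embedding
        self.pos_emb = nn.Embedding(block_size, d_model)   # learned positions (assumption)
        self.blocks = nn.ModuleList([TinyBlock(d_model, n_heads) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)         # projection to vocabulary size

    def forward(self, idx):
        # idx: (batch, T) integer character indices with T <= block_size
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))        # (batch, T, vocab_size) next-character scores
```

Because of the causal mask, the scores at position t depend only on characters up to t, which is what allows the model to generate text one character at a time.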
Conclusion
CS188 Project 5 covers the spectrum from simple perceptrons to modern transformers. By completing these problems, you'll have a solid foundation in machine learning and deep learning. Remember to submit only one edited Python file per group to Gradescope. The staff solution runs in about 12 minutes; optimize your code if it's slower. Good luck!