Programming lesson
Penalized Spline Regression and Kernel Methods: A Tutorial for MAST90083 Assignment 2
Learn to code penalized spline regression, k-NN averaging, kernel regression, and EM algorithm for Gaussian mixtures using the WarsawApts dataset. Step-by-step guidance for MAST90083 Assignment 2.
Introduction to Computational Statistics with Splines and Kernels
In this tutorial, we explore key techniques from MAST90083 Assignment 2: penalized spline regression, k-nearest neighbor averaging, kernel regression, and expectation-maximization for Gaussian mixtures. These methods are fundamental in data science, used in applications from smoothing financial time series to clustering user behavior in apps. Using the WarsawApts dataset from the HRW library, we'll implement each method step by step. The dataset contains apartment prices in Warsaw, which we can relate to current trends in real estate analytics and AI-driven pricing models.
Setting Up the WarsawApts Dataset
Load the required libraries and data:
library(HRW)
data(WarsawApts)
x <- WarsawApts$construction.date
y <- WarsawApts$areaPerMzloty
n <- length(x)
The variable x represents construction dates, and y is area per million zloty. We'll use these to model non-linear relationships.
Question 1: Penalized Spline Regression
1.1 Generating Knots for Linear Spline Basis
We generate 20 knots using quantiles, excluding extremes to match the data range:
probs <- seq(0, 1, length = 22)[-c(1, 22)]
k <- quantile(x, probs = probs)
This ensures knots are spread across the data distribution, similar to placing anchors in a game level design.
1.2 Constructing the Z Matrix
Create a matrix of linear spline basis functions (x - k_i)+:
Z <- outer(x, k, function(x, k) pmax(x - k, 0))
matplot(x, Z, type = "l", ylim = c(-1, 2), main = "Linear Spline Basis (20 knots)")
Each column is a basis function that activates after its knot, like level thresholds in a game.
1.3 Building C and D Matrices
Construct the design matrix C with a column of ones, x, and Z:
C <- cbind(1, x, Z)
The penalty matrix D is diagonal with ones except first two entries zero:
D <- diag(22)
D[1:2, 1:2] <- 0
This penalizes only the spline coefficients, not the intercept and linear term.
1.4 Tuning Parameter and Fitted Curves
Define 100 lambda values and compute fitted curves:
lambda_seq <- seq(0, 50, length = 100)
beta_hat <- solve(t(C) %*% C + lambda * D) %*% t(C) %*% y
y_hat <- C %*% beta_hat
For each lambda, compute RSS, degrees of freedom, and GCV:
RSS <- sum((y - y_hat)^2)
df <- sum(diag(C %*% solve(t(C) %*% C + lambda * D) %*% t(C)))
GCV <- RSS / (n - df)^2
GCV balances fit and complexity, like tuning a recommendation algorithm to avoid overfitting.
1.5 Optimal Lambda and Fit
Find lambda minimizing GCV and plot the fit:
lambda_opt <- lambda_seq[which.min(GCV)]
y_hat_opt <- C %*% solve(t(C) %*% C + lambda_opt * D) %*% t(C) %*% y
plot(x, y, main = "Penalized Spline Fit (20 knots)")
lines(x[order(x)], y_hat_opt[order(x)], col = "red", lwd = 2)
Increasing knots to 60 improves flexibility. Compute MSE:
MSE_20 <- mean((y - y_hat_opt)^2)
MSE_60 <- mean((y - y_hat_opt_60)^2)
Typically, MSE decreases with more knots but risks overfitting, similar to increasing model parameters in AI.
Question 2: Non-parametric Regression
2.1 k-Nearest Neighbor Averaging
Implement k-NN for k = 3, 7, 11, 15, 19, 23:
k_values <- seq(3, 23, length = 6)
y_hat_knn <- matrix(NA, n, length(k_values))
for (j in seq_along(k_values)) {
for (i in 1:n) {
d <- abs(x[i] - x)
indices <- order(d)[1:k_values[j]]
y_hat_knn[i, j] <- mean(y[indices])
}
}
Plot results in a 3x2 grid and compute MSE:
par(mfrow = c(3, 2))
for (j in seq_along(k_values)) {
plot(x, y, main = paste("k-NN k =", k_values[j]))
lines(x[order(x)], y_hat_knn[order(x), j], col = "blue", lwd = 2)
MSE_knn[j] <- mean((y - y_hat_knn[, j])^2)
}
Small k captures noise, large k oversmooths. Compare MSE with spline: k-NN may perform better for local patterns.
2.2 Kernel Regression with Six Kernels
Write a function that computes six kernel values (e.g., Gaussian, Epanechnikov, etc.):
kernel_fun <- function(u, type) {
switch(type,
"gaussian" = dnorm(u),
"epanechnikov" = 0.75 * (1 - u^2) * (abs(u) <= 1),
"uniform" = 0.5 * (abs(u) <= 1),
"triangular" = (1 - abs(u)) * (abs(u) <= 1),
"biweight" = (15/16) * (1 - u^2)^2 * (abs(u) <= 1),
"triweight" = (35/32) * (1 - u^2)^3 * (abs(u) <= 1)
)
}
With bandwidth h = 2, compute Nadaraya-Watson estimator:
h <- 2
f_hat <- matrix(NA, n, 6)
for (i in 1:n) {
u <- (x - x[i]) / h
K <- sapply(u, kernel_fun, type = "gaussian") # repeat for each kernel
f_hat[i, ] <- colSums(K * y) / colSums(K)
}
Plot fits and compare MSE; Gaussian kernel often performs best.
Question 3: Expectation-Maximization for Gaussian Mixtures
3.1 EM Algorithm for Three Components
We extend the two-component EM to three components with different standard deviations. Use the image data from the assignment (e.g., cameraman). Load and vectorize:
library(png)
img <- readPNG("cameraman.png")
y <- as.vector(img)
Initialize parameters: means, standard deviations, and mixing proportions.
K <- 3
mu <- sample(y, K)
sigma <- rep(sd(y), K)
pi <- rep(1/K, K)
E-step: compute responsibilities
resp <- matrix(0, length(y), K)
for (k in 1:K) {
resp[, k] <- pi[k] * dnorm(y, mu[k], sigma[k])
}
resp <- resp / rowSums(resp)
M-step: update parameters
for (k in 1:K) {
Nk <- sum(resp[, k])
mu[k] <- sum(resp[, k] * y) / Nk
sigma[k] <- sqrt(sum(resp[, k] * (y - mu[k])^2) / Nk)
pi[k] <- Nk / length(y)
}
Iterate until convergence. The algorithm is used in clustering image pixels or user segments in apps.
Conclusion
This tutorial covered penalized spline regression, k-NN, kernel regression, and EM for Gaussian mixtures. These techniques are widely applied in data science, from smoothing trends in finance to clustering in AI. Practice with the WarsawApts dataset to master these methods for your assignment.