ECE490 Homework 4 Solutions: Euclidean Projection & Gradient Descent

Introduction to ECE490 Homework 4: Optimization Fundamentals

Welcome to this tutorial on ECE490 Homework 4, where we dive into key concepts from Wright and Recht, Chapters 6 and 7. This assignment covers empirical risk minimization (ERM), Euclidean projection, linear function minimization over convex sets, and projected gradient descent. Whether you're preparing for exams or tackling ECE490 homework solutions, this guide will help you understand the underlying theory with practical, trend-inspired analogies.

In today's AI-driven world, these optimization techniques power everything from training neural networks to fine-tuning large language models like ChatGPT. Think of projecting onto a unit ball as similar to normalizing game scores in a tournament bracket—keeping values within a bounded range ensures fair comparisons. Let's break down each problem step by step.

Problem 1: Cost of Computing f(x + γe_i) in ERM

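This problem asks you to show that evaluating the objective at the perturbed point x + γe_i costs O(|A·i|) operations, where e_i is the i-th unit vector, γ is a scalar step, and |A·i| is the number of nonzeros in column i of A. That is the same order as the cost of updating the gradient after a coordinate step. In empirical risk minimization with large, sparse data, this means a function evaluation along a coordinate adds no more than a constant factor to the work of each iteration.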

Understanding the ERM Framework

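In Section 6.1, the objective function f(x) = (1/2)||Ax - b||^2 + λR(x) involves a sparse matrix A. Suppose the residual r = Ax - b and the value f(x) are stored from the previous iteration. Because A(x + γe_i) - b = r + γA·i, the least-squares term changes by γ A·i^T r + (γ^2/2)||A·i||^2, and both quantities touch only the |A·i| nonzeros of column i. If R is separable, R(x) = Σ_j R_j(x_j), only its i-th term changes, which costs O(1). The total cost of computing f(x + γe_i) is therefore O(|A·i|).

Similarly, the gradient is ∇f(x) = A^T(Ax - b) + λ∇R(x) = A^T r + λ∇R(x). After a coordinate step, updating the stored residual, r ← r + γA·i, costs O(|A·i|), and the i-th partial derivative A·i^T r + λ[∇R(x)]_i also needs only the nonzeros of column i. Hence both operations have the same O(|A·i|) complexity.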
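
The bookkeeping above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's code: it assumes the regularizer is the separable ridge term R(x) = Σ_j x_j^2, stores a sparse column as a list of (row, value) pairs, and uses hypothetical function names.

```python
# Sketch under stated assumptions: f(x) = 0.5*||Ax - b||^2 + lam * sum_j x_j^2,
# with a sparse column of A stored as a list of (row, value) pairs.

def column_dot(col, r):
    """Inner product of a sparse column with a dense list r."""
    return sum(v * r[row] for row, v in col)

def f_after_step(f_x, r, col, x_i, gamma, lam):
    """Return f(x + gamma*e_i) from the cached value f(x) and residual r = Ax - b.

    Only the nonzeros of column i are visited, so the cost is O(nnz of column i).
    """
    col_sq = sum(v * v for _, v in col)                  # squared norm of column i
    smooth = gamma * column_dot(col, r) + 0.5 * gamma ** 2 * col_sq
    reg = lam * ((x_i + gamma) ** 2 - x_i ** 2)          # only coordinate i changes
    return f_x + smooth + reg

def update_residual(r, col, gamma):
    """In-place update r <- r + gamma * (column i), also O(nnz of column i)."""
    for row, v in col:
        r[row] += gamma * v

# Small example: A has columns [1, 0, 2] and [0, 3, 0], b = [1, 0, -1], x = [0.5, -1].
col0 = [(0, 1.0), (2, 2.0)]
r = [-0.5, -3.0, 2.0]          # Ax - b at the current x
f_x = 6.75                     # f(x) with lam = 0.1
f_new = f_after_step(f_x, r, col0, x_i=0.5, gamma=0.25, lam=0.1)
update_residual(r, col0, 0.25)
```

Recomputing the objective from scratch at x = [0.75, -1] gives the same value as f_new, but at a cost proportional to all nonzeros of A rather than those of one column.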

Analogy: Updating a Gaming Leaderboard

Imagine you're maintaining a leaderboard for a popular battle royale game like Fortnite. Each player's score depends on their performance across multiple matches. If one player's score changes (like perturbing x), you only need to recalculate the scores of opponents they directly interacted with—similar to accessing only nonzero entries in a column. This targeted update saves computational resources, just like in ERM.

Problem 2: Euclidean Projection onto the Unit Ball

You need to show that the projection P_Ω(x) onto the unit ball Ω = {x ∈ R^n : ||x|| ≤ 1} is given by x / max(1, ||x||). This is a classic result in convex optimization.

Proof Outline

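If ||x|| ≤ 1, the point is already in the ball, so it is its own projection, and max(1, ||x||) = 1 gives P_Ω(x) = x. Otherwise, the closest point lies on the boundary and is the normalized vector x/||x||. To see this, take any y with ||y|| ≤ 1: the Cauchy-Schwarz inequality gives ||y - x||^2 ≥ (||x|| - ||y||)^2 ≥ (||x|| - 1)^2, with equality only at y = x/||x||. The same result follows from the Lagrange multiplier conditions of the constrained problem.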
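
The formula translates directly into code. The following is a minimal sketch using plain Python lists; the function name is an illustrative choice, not something taken from the homework.

```python
import math

def project_unit_ball(x):
    """Euclidean projection of a list of floats onto {x : ||x||_2 <= 1}."""
    norm = math.sqrt(sum(v * v for v in x))
    scale = max(1.0, norm)      # 1 inside the ball, ||x|| outside it
    return [v / scale for v in x]

inside = project_unit_ball([0.3, 0.4])    # norm 0.5, returned unchanged
outside = project_unit_ball([3.0, 4.0])   # norm 5, rescaled to unit length
```

Using max(1, ||x||) as the divisor handles both cases without a branch and avoids dividing by zero when x = 0.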

Real-World Application: AI Model Regularization

In training machine learning models, weight vectors are sometimes constrained to a norm ball (a max-norm constraint) as a form of regularization. After each update, any weight vector that has left the ball is projected back onto it. This is similar to capping the volume on your headphones to avoid distortion: values within the limit pass through unchanged, and anything beyond it is scaled back to the boundary.

Problem 3: Minimizing a Linear Function over Different Sets

Given f(x) = c^T x, find the minimizer over: (a) unit ball, (b) unit simplex, (c) box [0,1]^n.

Part (a): Unit Ball

Assuming c ≠ 0, the minimum occurs at x = -c / ||c||, with value -||c||. The linear function decreases fastest in the direction opposite to c, and the constraint ||x|| ≤ 1 stops the point at the boundary. If c = 0, every point of the ball is a minimizer and the minimum value is 0.
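
A short way to verify the claim for c ≠ 0 is the Cauchy-Schwarz inequality, written out here as a worked check rather than as the official solution:

```latex
c^\top x \;\ge\; -\|c\|\,\|x\| \;\ge\; -\|c\|
\quad \text{for all } x \text{ with } \|x\| \le 1,
\qquad\text{and}\qquad
c^\top\!\left(-\frac{c}{\|c\|}\right) = -\frac{\|c\|^2}{\|c\|} = -\|c\| .
```

The lower bound holds on the whole ball and is attained at x = -c/||c||, so that point is a minimizer.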

Part (b): Unit Simplex

The simplex constraint requires nonnegative entries summing to 1. Since c^T x is linear, its minimum over a convex polytope is attained at an extreme point, and the extreme points of the simplex are the standard basis vectors e_1, ..., e_n. The minimizer is therefore e_j, where j is an index of the smallest component of c, and the minimum value is c_j = min_i c_i. This holds whatever the signs of the entries: if every component of c is positive, the minimum is still the smallest entry rather than 0, because x = 0 does not belong to the simplex. If several components tie for the smallest value, any convex combination of the corresponding basis vectors is also a minimizer.
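
As a small illustration, the vertex rule can be coded as follows. The function name is hypothetical, and ties are broken by taking the first index that attains the minimum.

```python
def argmin_linear_simplex(c):
    """Return (x, value): a vertex e_j of the unit simplex minimizing c^T x."""
    j = min(range(len(c)), key=lambda i: c[i])   # first index attaining min_i c_i
    x = [0.0] * len(c)
    x[j] = 1.0
    return x, c[j]

x_mixed, val_mixed = argmin_linear_simplex([2.0, -1.0, 0.5])
x_pos, val_pos = argmin_linear_simplex([3.0, 1.0, 2.0])   # all entries positive
```

In the second call the optimal value is the smallest entry of c, not 0, which matches the argument above.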

Part (c): Box [0,1]^n

Here, each x_i ∈ [0,1], and the problem separates across coordinates. The minimizer sets x_i = 1 if c_i < 0 and x_i = 0 if c_i > 0; if c_i = 0, any value in [0,1] works. The optimal value is Σ_i min(c_i, 0). This is like turning on a feature only if it reduces cost, similar to activating bonus multipliers in a game only when they benefit you.
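
The coordinate-wise rule is equally short in code. In this sketch (hypothetical function name), coordinates with c_i = 0 are set to 0, which is one of the many valid choices.

```python
def argmin_linear_box(c):
    """Return (x, value) minimizing c^T x over the box [0, 1]^n."""
    x = [1.0 if ci < 0 else 0.0 for ci in c]     # switch on only cost-reducing coordinates
    value = sum(min(ci, 0.0) for ci in c)        # sum of the negative entries of c
    return x, value

x_box, val_box = argmin_linear_box([2.0, -1.0, 0.0, -3.5])
```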

Problem 4: Projected Gradient Descent Step Analysis

Let Ω be a closed convex set and let x_k ∈ Ω. You need to prove that one step of projected gradient descent, x_{k+1} = P_Ω(x_k - α_k ∇f(x_k)), can be written as x_{k+1} = argmin_{x∈Ω} [∇f(x_k)^T (x - x_k) + (1/(2α_k))||x - x_k||^2], and that ||x_{k+1} - x_k||^2 ≤ α_k ∇f(x_k)^T (x_k - x_{k+1}).

Understanding the Update

The update is a trade-off between decreasing the linear approximation of f around x_k and staying close to the current point, with the step size α_k setting the balance. The inequality says that the squared length of the step is at most α_k times ∇f(x_k)^T (x_k - x_{k+1}), which is the decrease predicted by the first-order model of f.
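
To make the inequality concrete, here is a numerical check of one step over the unit ball. It is a sketch under assumptions that are not part of the homework: the test function f(x) = (1/2)||x - target||^2, the step size 0.5, and all function names are chosen purely for illustration.

```python
import math

def project_unit_ball(x):
    """Euclidean projection onto {x : ||x||_2 <= 1}."""
    scale = max(1.0, math.sqrt(sum(v * v for v in x)))
    return [v / scale for v in x]

def pgd_step(x, grad, alpha):
    """One projected gradient step over the unit ball: P(x - alpha * grad)."""
    return project_unit_ball([xi - alpha * gi for xi, gi in zip(x, grad)])

def step_inequality_gap(x, x_next, grad, alpha):
    """alpha * grad^T (x - x_next) - ||x_next - x||^2, nonnegative if x is feasible."""
    lhs = sum((a - b) ** 2 for a, b in zip(x_next, x))
    rhs = alpha * sum(g * (a - b) for g, a, b in zip(grad, x, x_next))
    return rhs - lhs

# Hypothetical test problem: f(x) = 0.5 * ||x - target||^2, so grad f(x) = x - target.
target = [2.0, 0.0]
x_k = [0.0, 1.0]                                   # feasible, since ||x_k|| = 1
grad = [a - t for a, t in zip(x_k, target)]
x_next = pgd_step(x_k, grad, alpha=0.5)
gap = step_inequality_gap(x_k, x_next, grad, alpha=0.5)
```

The gap is nonnegative here, as the inequality predicts. A single example is a sanity check, not a proof.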

Proof Sketch

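The objective in the argmin differs from (1/(2α_k))||x - (x_k - α_k ∇f(x_k))||^2 only by a term that does not depend on x, so its minimizer over Ω is the projection of x_k - α_k ∇f(x_k) onto Ω, which is x_{k+1}. By the optimality condition for this projection, (x_{k+1} - x_k + α_k ∇f(x_k))^T (x - x_{k+1}) ≥ 0 for all x ∈ Ω. Setting x = x_k, which is allowed because x_k ∈ Ω, and rearranging gives the inequality.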
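
For readers who want the algebra spelled out, the two steps of the sketch can be written as follows. This is a worked version under the assumptions that Ω is closed and convex and that x_k ∈ Ω; the shorthand g_k = ∇f(x_k) is introduced here only for readability.

```latex
\begin{align*}
g_k^\top (x - x_k) + \frac{1}{2\alpha_k}\|x - x_k\|^2
  &= \frac{1}{2\alpha_k}\,\bigl\|x - (x_k - \alpha_k g_k)\bigr\|^2
     - \frac{\alpha_k}{2}\,\|g_k\|^2, \\[4pt]
0 \;\le\; (x_{k+1} - x_k + \alpha_k g_k)^\top (x_k - x_{k+1})
  &= -\|x_{k+1} - x_k\|^2 + \alpha_k\, g_k^\top (x_k - x_{k+1}).
\end{align*}
```

The first line shows that the argmin is the projection of x_k - α_k g_k onto Ω, since the last term does not depend on x. Moving the squared norm in the second line to the left-hand side gives the stated bound.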

Analogy: Navigating a Viral App Feature Update

Think of updating a feature in a popular app like TikTok. You want to improve user engagement (minimize f) but you can't change too drastically (projection onto feasible set). The inequality ensures that the magnitude of change is bounded by the expected improvement—like testing a new algorithm on a small subset before full rollout.

Conclusion

By working through these problems, you've gained insight into fundamental optimization techniques used in modern AI, gaming, and app development. For more ECE490 homework help, remember to leverage sparse structures and geometric intuition. These concepts are not just academic—they're the backbone of efficient algorithms in data science and engineering.