ECE490 Homework 1 Guide: Cauchy–Schwarz & Convex Analysis Proofs (2026)

Introduction to ECE490 Homework 1: Core Concepts in Convex Analysis

ECE490 Homework 1 covers fundamental inequalities and convexity properties that are essential for optimization and machine learning. In May 2026, as AI models grow more complex, understanding these mathematical foundations helps you debug training dynamics and design efficient algorithms. This tutorial guides you through proving the Cauchy–Schwarz inequality, the convexity of epigraphs, the incompatibility of strong convexity with global Lipschitz continuity, and key inequalities for convex and strongly convex functions with Lipschitz gradients.

Part 1: Proving the Cauchy–Schwarz Inequality

Understanding the Inequality

The Cauchy–Schwarz inequality states that for any vectors $u$ and $v$ in $\mathbb{R}^n$, the absolute value of their dot product is at most the product of their Euclidean norms: $|u^T v| \le \|u\| \|v\|$. This inequality is what guarantees that the cosine similarity between feature vectors in data science, and the correlation between asset returns in finance, always lies in $[-1, 1]$.

Step (a): Proving a Key Inequality for All γ > 0

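We start by proving that for all $\gamma > 0$, $u^T v \le \frac{\gamma}{2}\|u\|^2 + \frac{1}{2\gamma}\|v\|^2$. First, consider the case $n = 1$. For scalars $a$ and $b$, expanding $\left(\sqrt{\gamma}\,a - \frac{1}{\sqrt{\gamma}}\,b\right)^2 \ge 0$ gives $\gamma a^2 - 2ab + \frac{1}{\gamma} b^2 \ge 0$, that is, $ab \le \frac{\gamma}{2} a^2 + \frac{1}{2\gamma} b^2$. For general $n$, apply the scalar bound to each component and sum: $u^T v = \sum_{i=1}^n u_i v_i \le \sum_{i=1}^n \left[\frac{\gamma}{2} u_i^2 + \frac{1}{2\gamma} v_i^2\right] = \frac{\gamma}{2}\|u\|^2 + \frac{1}{2\gamma}\|v\|^2$. This bound is a special case of Young's inequality and is a standard tool for splitting cross terms in optimization proofs.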
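As a quick numerical sanity check of the bound in Step (a), the following sketch samples random vectors and random values of $\gamma$ and confirms that the inequality is never violated. The helper names (`dot`, `young_rhs`, `check_young`) and the sampling ranges are illustrative choices, not part of the homework, and a passing check is evidence rather than a proof.

```python
import random

def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def young_rhs(u, v, gamma):
    """Right-hand side (gamma/2)||u||^2 + (1/(2 gamma))||v||^2."""
    return 0.5 * gamma * dot(u, u) + dot(v, v) / (2.0 * gamma)

def check_young(trials=1000, n=5, seed=0):
    """Return True if u^T v <= young_rhs(u, v, gamma) on every random sample."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-1, 1) for _ in range(n)]
        v = [rng.uniform(-1, 1) for _ in range(n)]
        gamma = rng.uniform(0.01, 10.0)
        # Small tolerance guards against floating-point rounding.
        if dot(u, v) > young_rhs(u, v, gamma) + 1e-12:
            return False
    return True

print(check_young())
```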

Step (b): Deducing Cauchy–Schwarz

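To deduce the Cauchy–Schwarz inequality, minimize the right-hand side over $\gamma > 0$. If $u = 0$ or $v = 0$, both sides of the Cauchy–Schwarz inequality are zero and there is nothing to prove, so assume both are nonzero. The function $\varphi(\gamma) = \frac{\gamma}{2}\|u\|^2 + \frac{1}{2\gamma}\|v\|^2$ has derivative $\varphi'(\gamma) = \frac{1}{2}\|u\|^2 - \frac{1}{2\gamma^2}\|v\|^2$, which vanishes at $\gamma^* = \|v\|/\|u\|$; since $\varphi$ is convex on $\gamma > 0$, this is its minimum. Substituting gives $\varphi(\gamma^*) = \|u\| \|v\|$. Thus $u^T v \le \|u\| \|v\|$. Replacing $u$ with $-u$ gives $-u^T v \le \|u\| \|v\|$, and combining the two yields $|u^T v| \le \|u\| \|v\|$. Proving a bound for a whole family of parameters and then optimizing over the parameter is similar in spirit to tuning a learning rate in gradient descent.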
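The minimization over $\gamma$ can be illustrated on a concrete pair of vectors. The sketch below uses hypothetical example vectors `u` and `v` (chosen so the norms are 5 and 10) and compares the value of $\varphi$ at $\gamma = \|v\|/\|u\|$ against a coarse grid of other $\gamma$ values.

```python
import math

def norm(u):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in u))

def phi(gamma, u, v):
    """phi(gamma) = (gamma/2)||u||^2 + (1/(2 gamma))||v||^2."""
    return 0.5 * gamma * norm(u) ** 2 + norm(v) ** 2 / (2.0 * gamma)

# Hypothetical example vectors: ||u|| = 5 and ||v|| = 10.
u = [3.0, 4.0]
v = [0.0, 10.0]

gamma_star = norm(v) / norm(u)      # claimed minimizer
phi_min = phi(gamma_star, u, v)     # should equal ||u|| * ||v||

# Compare against a grid of gamma values in (0, 10].
grid = [0.1 * k for k in range(1, 101)]
grid_min = min(phi(g, u, v) for g in grid)

print(gamma_star, phi_min, norm(u) * norm(v), grid_min)
```

No grid value falls below `phi_min`, and `phi_min` coincides with the product of the norms, which is the quantity that bounds the dot product.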

Part 2: Convex Epigraphs

Definition and Proof

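The epigraph of a function $f: \mathbb{R}^n \to \mathbb{R}$ is the set $\operatorname{epi} f = \{(x, t) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le t\}$. We prove that if $f$ is convex, then $\operatorname{epi} f$ is convex. Take any two points $(x_1, t_1)$ and $(x_2, t_2)$ in $\operatorname{epi} f$, so $f(x_1) \le t_1$ and $f(x_2) \le t_2$. For any $\theta \in [0, 1]$, convexity of $f$ gives $f(\theta x_1 + (1 - \theta) x_2) \le \theta f(x_1) + (1 - \theta) f(x_2) \le \theta t_1 + (1 - \theta) t_2$. Hence $(\theta x_1 + (1 - \theta) x_2,\ \theta t_1 + (1 - \theta) t_2) \in \operatorname{epi} f$, which proves convexity. This property underlies the epigraph form of a convex optimization problem, in which minimizing $f(x)$ is rewritten as minimizing $t$ subject to $(x, t) \in \operatorname{epi} f$.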
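The epigraph argument can be checked numerically in one dimension. The sketch below is only an illustration: `square` is an assumed example of a convex function, `in_epigraph` and `check_epigraph_convexity` are hypothetical helper names, and the final lines show how the property fails for the non-convex function $-x^2$.

```python
import random

def in_epigraph(func, x, t, tol=1e-9):
    """True if (x, t) lies in epi func, i.e. func(x) <= t (up to rounding)."""
    return func(x) <= t + tol

def square(x):
    """A convex example function."""
    return x * x

def check_epigraph_convexity(func, trials=1000, seed=0):
    """Sample pairs of epigraph points and test their convex combinations."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(-3, 3), rng.uniform(-3, 3)
        # Pick t1, t2 on or above the graph so both points are in epi func.
        t1 = func(x1) + rng.uniform(0, 2)
        t2 = func(x2) + rng.uniform(0, 2)
        theta = rng.random()
        x = theta * x1 + (1 - theta) * x2
        t = theta * t1 + (1 - theta) * t2
        if not in_epigraph(func, x, t):
            return False
    return True

print(check_epigraph_convexity(square))

# Non-convex counterexample: (-1, -1) and (1, -1) are in epi(-x^2),
# but their midpoint (0, -1) is not, because -(0)^2 = 0 > -1.
neg_square = lambda x: -x * x
print(in_epigraph(neg_square, 0.0, -1.0))
```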

Part 3: Strong Convexity and Lipschitz Continuity Cannot Coexist

Proof by Contradiction

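Assume $f: \mathbb{R}^n \to \mathbb{R}$ is differentiable, $m$-strongly convex with $m > 0$, and $L$-Lipschitz (the function itself, not its gradient). Strong convexity implies $f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|^2$ for all $x, y$. Lipschitz continuity implies $|f(y) - f(x)| \le L\|y - x\|$. Fix $x$ and let $y = x + d$. Rearranging the strong convexity inequality, then applying the Lipschitz bound and the Cauchy–Schwarz inequality, gives $\frac{m}{2}\|d\|^2 \le f(y) - f(x) - \nabla f(x)^T d \le L\|d\| + \|\nabla f(x)\| \|d\|$. Dividing by $\|d\|$ gives $\frac{m}{2}\|d\| \le L + \|\nabla f(x)\|$ for every $d \ne 0$, which fails once $\|d\|$ is large enough: the left side grows without bound while the right side is a constant. This is a contradiction, so no $m > 0$ works. The argument needs the whole of $\mathbb{R}^n$; on a bounded domain a function can be both strongly convex and Lipschitz. It explains why a loss function defined on an unbounded parameter space cannot be both strongly convex and globally Lipschitz, which matters when choosing assumptions for analyzing the training stability of large-scale models.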
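To see the quadratic-versus-linear growth concretely, consider the one-dimensional strongly convex function $f(x) = \frac{m}{2}x^2$. The sketch below (with an assumed value $m = 0.5$ and hypothetical helper names) computes the ratio $|f(y) - f(0)| / |y|$, which any Lipschitz constant would have to dominate, at increasingly distant points.

```python
def f(x, m=0.5):
    """m-strongly convex example: f(x) = (m/2) x^2."""
    return 0.5 * m * x * x

def lipschitz_ratio(y, m=0.5):
    """|f(y) - f(0)| / |y|; a global Lipschitz constant must bound this."""
    return abs(f(y, m) - f(0.0, m)) / abs(y)

# Evaluate at y = 1, 10, 100, 1000, 10000.
ratios = [lipschitz_ratio(10.0 ** k) for k in range(5)]
print(ratios)
```

The ratio equals $\frac{m}{2}|y|$, so it keeps growing with $|y|$ and no finite constant $L$ can bound it over all of $\mathbb{R}$.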

Part 4: Convex Function with L-Lipschitz Gradient

Part (a): Basic Inequality

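For a convex function $f$ with $L$-Lipschitz gradient and minimizer $x^*$, we have $f(x) - f(x^*) \ge \frac{1}{2L}\|\nabla f(x)\|^2$. To see this, apply the descent lemma $f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2}\|y - x\|^2$ with $y = x - \frac{1}{L}\nabla f(x)$ to get $f(y) \le f(x) - \frac{1}{2L}\|\nabla f(x)\|^2$, and then use $f(x^*) \le f(y)$. This inequality is crucial for analyzing gradient descent convergence, and bounds of this type are also used in federated learning to bound the number of communication rounds.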
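The inequality in part (a) is easy to test on a quadratic. The sketch below assumes a hypothetical diagonal quadratic $f(x) = \frac{1}{2}\sum_i a_i x_i^2$ with $a = (1, 2, 5)$, so that $x^* = 0$, $f(x^*) = 0$, and $L = \max_i a_i = 5$; the function names are illustrative.

```python
import random

A = [1.0, 2.0, 5.0]   # assumed Hessian eigenvalues of the example quadratic
L = max(A)            # Lipschitz constant of the gradient

def f(x):
    """f(x) = 0.5 * sum_i a_i x_i^2, minimized at x* = 0 with f(x*) = 0."""
    return 0.5 * sum(a * xi * xi for a, xi in zip(A, x))

def grad(x):
    """Gradient of f: component i is a_i x_i."""
    return [a * xi for a, xi in zip(A, x)]

def gap_minus_bound(x):
    """f(x) - f(x*) - ||grad f(x)||^2 / (2L); should be nonnegative."""
    g = grad(x)
    return f(x) - sum(gi * gi for gi in g) / (2.0 * L)

def check_bound(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in A]
        if gap_minus_bound(x) < -1e-9:
            return False
    return True

print(check_bound())
```

The bound is tight along the direction with the largest eigenvalue: for `x = [0, 0, 1]` both sides equal 2.5.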

Part (b): Co-coercivity

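Co-coercivity states that $(\nabla f(x) - \nabla f(y))^T (x - y) \ge \frac{1}{L}\|\nabla f(x) - \nabla f(y)\|^2$ for all $x, y$. To prove it, apply part (a) to $h_x(z) = f(z) - \nabla f(x)^T z$ and $h_y(z) = f(z) - \nabla f(y)^T z$. Both are convex with $L$-Lipschitz gradient, and since $\nabla h_x(x) = 0$ and $\nabla h_y(y) = 0$, they are minimized at $z = x$ and $z = y$ respectively. Evaluating part (a) for $h_x$ at $z = y$ gives $f(y) - f(x) - \nabla f(x)^T (y - x) \ge \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|^2$, and evaluating it for $h_y$ at $z = x$ gives $f(x) - f(y) - \nabla f(y)^T (x - y) \ge \frac{1}{2L}\|\nabla f(x) - \nabla f(y)\|^2$. Adding the two inequalities cancels the function values and yields co-coercivity. This property is used in the analysis of accelerated gradient methods such as Nesterov's.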
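Co-coercivity can likewise be checked numerically. The sketch below again assumes a hypothetical diagonal quadratic with eigenvalues $(1, 2, 5)$ and $L = 5$, and verifies that the slack $(\nabla f(x) - \nabla f(y))^T (x - y) - \frac{1}{L}\|\nabla f(x) - \nabla f(y)\|^2$ is nonnegative on random pairs of points.

```python
import random

A = [1.0, 2.0, 5.0]   # assumed Hessian eigenvalues of the example quadratic
L = max(A)

def grad(x):
    """Gradient of f(x) = 0.5 * sum_i a_i x_i^2."""
    return [a * xi for a, xi in zip(A, x)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cocoercivity_slack(x, y):
    """(grad f(x) - grad f(y))^T (x - y) - (1/L) ||grad f(x) - grad f(y)||^2."""
    d = [xi - yi for xi, yi in zip(x, y)]
    g = [gx - gy for gx, gy in zip(grad(x), grad(y))]
    return dot(g, d) - dot(g, g) / L

def check_cocoercivity(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in A]
        y = [rng.uniform(-5, 5) for _ in A]
        if cocoercivity_slack(x, y) < -1e-9:
            return False
    return True

print(check_cocoercivity())
```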

Part 5: Strongly Convex Functions with Lipschitz Gradient

Part (a): Constructing a Convex Function with Reduced Lipschitz Constant

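Let $f$ be $m$-strongly convex with $L$-Lipschitz gradient, and define $q(x) = f(x) - \frac{m}{2}\|x\|^2$. We show that $q$ is convex and has an $(L - m)$-Lipschitz gradient. Since $f$ is $m$-strongly convex, $q$ is convex by definition. The gradient of $q$ is $\nabla q(x) = \nabla f(x) - m x$. Because $\nabla f$ is $L$-Lipschitz, $\frac{L}{2}\|x\|^2 - f(x)$ is convex, and this is the same function as $\frac{L - m}{2}\|x\|^2 - q(x)$. For a convex function $q$, convexity of $\frac{L - m}{2}\|x\|^2 - q(x)$ is equivalent to $\nabla q$ being $(L - m)$-Lipschitz. Note that the triangle inequality alone would only give the weaker constant $L + m$. This trick is used in primal-dual algorithms to improve conditioning.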
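For a quadratic, the reduced Lipschitz constant is visible directly: subtracting $\frac{m}{2}\|x\|^2$ shifts every Hessian eigenvalue down by $m$. The sketch below assumes a hypothetical diagonal quadratic $f$ with eigenvalues $(1, 2, 5)$, so $m = 1$ and $L = 5$, and checks that the gradient of $q$ is $(L - m)$-Lipschitz on random pairs of points.

```python
import math
import random

A = [1.0, 2.0, 5.0]       # assumed Hessian eigenvalues of f
m, L = min(A), max(A)     # strong convexity and smoothness constants of f

def grad_q(x):
    """grad q(x) = grad f(x) - m x, where grad f(x)_i = a_i x_i."""
    return [(a - m) * xi for a, xi in zip(A, x)]

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def check_q_lipschitz(trials=1000, seed=0):
    """Test ||grad q(x) - grad q(y)|| <= (L - m) ||x - y|| on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in A]
        y = [rng.uniform(-5, 5) for _ in A]
        gdiff = norm([gx - gy for gx, gy in zip(grad_q(x), grad_q(y))])
        dist = norm([xi - yi for xi, yi in zip(x, y)])
        if gdiff > (L - m) * dist + 1e-9:
            return False
    return True

print(check_q_lipschitz())
```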

Part (b): Applying Co-coercivity to q

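Assume $m < L$ and apply co-coercivity to $q$, whose gradient is $(L - m)$-Lipschitz: $(\nabla q(x) - \nabla q(y))^T (x - y) \ge \frac{1}{L - m}\|\nabla q(x) - \nabla q(y)\|^2$. Substitute $\nabla q(x) - \nabla q(y) = \nabla f(x) - \nabla f(y) - m(x - y)$, expand the squared norm, and collect terms to get $(\nabla f(x) - \nabla f(y))^T (x - y) \ge \frac{mL}{m + L}\|x - y\|^2 + \frac{1}{m + L}\|\nabla f(x) - \nabla f(y)\|^2$. For $m = 0$ this reduces to standard co-coercivity, and for $m > 0$ it is tighter. It is used in analyzing stochastic gradient methods for overparameterized models.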
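The substitution can be carried out step by step. In the worked derivation below, $g = \nabla f(x) - \nabla f(y)$ and $d = x - y$ are shorthand introduced here for readability (so that $\nabla q(x) - \nabla q(y) = g - m d$), and $m < L$ is assumed so that dividing by $L - m$ is valid. Collecting terms gives the constants $\frac{mL}{m + L}$ and $\frac{1}{m + L}$.

```latex
\begin{align*}
(g - m d)^T d &\ge \frac{1}{L - m}\,\|g - m d\|^2
  && \text{co-coercivity of } q \\
(L - m)\left(g^T d - m\|d\|^2\right) &\ge \|g\|^2 - 2m\, g^T d + m^2\|d\|^2
  && \text{multiply by } L - m \text{ and expand} \\
(L + m)\, g^T d &\ge \|g\|^2 + mL\,\|d\|^2
  && \text{collect } g^T d \text{ and } \|d\|^2 \text{ terms} \\
g^T d &\ge \frac{mL}{m + L}\,\|d\|^2 + \frac{1}{m + L}\,\|g\|^2
  && \text{divide by } L + m
\end{align*}
```

As a spot check, for the one-dimensional quadratic $f(x) = \frac{a}{2}x^2$ with $m \le a \le L$ we have $g = a d$, and the last line reads $a(m + L) \ge mL + a^2$, which is equivalent to $(a - m)(L - a) \ge 0$.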

Conclusion

Mastering these proofs not only helps you ace ECE490 Homework 1 but also builds intuition for modern optimization in AI and finance. In 2026, as we train ever-larger models, these inequalities underpin everything from convergence guarantees to robustness. Practice by applying them to simple functions like quadratics, and you'll see how they connect to real-world algorithms like Adam and SGD.