Mastering Cauchy–Schwarz and Convex Analysis: A Step-by-Step Guide for ECE490 Homework 1
Struggling with ECE490 Homework 1? This tutorial breaks down the Cauchy–Schwarz inequality, convex epigraphs, and Lipschitz gradient properties with clear proofs and modern examples from AI and finance.
Introduction to ECE490 Homework 1: Core Concepts in Convex Analysis
ECE490 Homework 1 covers fundamental inequalities and convexity properties that are essential for optimization and machine learning. In May 2026, as AI models grow more complex, understanding these mathematical foundations helps you debug training dynamics and design efficient algorithms. This tutorial guides you through proving the Cauchy–Schwarz inequality, showing that the epigraph of a convex function is convex, and deriving properties of strongly convex functions and of functions with Lipschitz gradients.
Part 1: Proving the Cauchy–Schwarz Inequality
Understanding the Inequality
The Cauchy–Schwarz inequality states that for any vectors $u$ and $v$ in $\mathbb{R}^n$, the absolute value of their dot product is at most the product of their Euclidean norms: $|u^T v| \le \|u\| \|v\|$. This inequality appears in data science when measuring cosine similarity between feature vectors, and in finance when computing correlations between asset returns.
Step (a): Proving a Key Inequality for All γ > 0
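We start by proving that for all $\gamma > 0$, $u^T v \le \frac{\gamma}{2}\|u\|^2 + \frac{1}{2\gamma}\|v\|^2$. First, consider the case $n = 1$. For scalars $a$ and $b$, we have $ab \le \frac{\gamma}{2}a^2 + \frac{1}{2\gamma}b^2$. This follows from expanding $\left(\sqrt{\gamma}\,a - \frac{1}{\sqrt{\gamma}}\,b\right)^2 \ge 0$, which gives $\gamma a^2 - 2ab + \frac{1}{\gamma}b^2 \ge 0$. For general $n$, apply the scalar bound to each component and sum: $u^T v = \sum_i u_i v_i \le \sum_i \left[\frac{\gamma}{2}u_i^2 + \frac{1}{2\gamma}v_i^2\right] = \frac{\gamma}{2}\|u\|^2 + \frac{1}{2\gamma}\|v\|^2$. This bound is a form of Young's inequality, and it is a common tool for splitting cross terms in convergence proofs.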
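A quick numerical sanity check of the Step (a) bound can help build intuition. The sketch below assumes NumPy is available; the helper name `young_bound`, the random seed, and the particular values of gamma are illustrative choices, not part of the homework.

```python
import numpy as np

def young_bound(u, v, gamma):
    """Right-hand side of Step (a): (gamma/2)*||u||^2 + (1/(2*gamma))*||v||^2."""
    return 0.5 * gamma * np.dot(u, u) + 0.5 / gamma * np.dot(v, v)

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
u = rng.standard_normal(5)
v = rng.standard_normal(5)

for gamma in (0.1, 1.0, 10.0):
    lhs = np.dot(u, v)
    rhs = young_bound(u, v, gamma)
    print(f"gamma = {gamma:5.1f}: u^T v = {lhs:.4f}, bound = {rhs:.4f}")
    # The bound should hold for every positive gamma (small slack for rounding).
    assert lhs <= rhs + 1e-12
```

A passing check is not a proof, but a failing one would immediately reveal an algebra mistake.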
Step (b): Deducing Cauchy–Schwarz
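To deduce the Cauchy–Schwarz inequality, minimize the right-hand side over $\gamma > 0$. If $u = 0$ or $v = 0$, both sides of Cauchy–Schwarz are zero and there is nothing to prove, so assume both vectors are nonzero. The function $\varphi(\gamma) = \frac{\gamma}{2}\|u\|^2 + \frac{1}{2\gamma}\|v\|^2$ attains its minimum at $\gamma = \|v\|/\|u\|$. Substituting gives $\varphi_{\min} = \|u\| \|v\|$. Thus $u^T v \le \|u\| \|v\|$. Replacing $u$ with $-u$ gives $-u^T v \le \|u\| \|v\|$, and combining the two bounds yields $|u^T v| \le \|u\| \|v\|$. Minimizing an upper bound over a free parameter is the same idea used to choose step sizes in gradient descent analyses.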
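The minimization over gamma can be written out as a short calculus exercise. The block below is a sketch that assumes $u$ and $v$ are both nonzero; the symbol $\gamma^\star$ for the minimizer is notation introduced here.

```latex
\begin{align*}
\varphi(\gamma) &= \frac{\gamma}{2}\|u\|^2 + \frac{1}{2\gamma}\|v\|^2, \\
\varphi'(\gamma) &= \frac{1}{2}\|u\|^2 - \frac{1}{2\gamma^2}\|v\|^2 = 0
  \quad\Longrightarrow\quad \gamma^\star = \frac{\|v\|}{\|u\|}, \\
\varphi''(\gamma) &= \frac{1}{\gamma^3}\|v\|^2 > 0 \quad \text{for all } \gamma > 0, \\
\varphi(\gamma^\star) &= \frac{1}{2}\|u\|\,\|v\| + \frac{1}{2}\|u\|\,\|v\| = \|u\|\,\|v\|.
\end{align*}
```

The positive second derivative confirms that the stationary point is a minimum rather than a maximum.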
Part 2: Convex Epigraphs
Definition and Proof
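The epigraph of a function $f: \mathbb{R}^n \to \mathbb{R}$ is the set $\operatorname{epi} f = \{ (x, t) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le t \}$. We prove that if $f$ is convex, then $\operatorname{epi} f$ is convex. Take any two points $(x_1, t_1)$ and $(x_2, t_2)$ in $\operatorname{epi} f$, so $f(x_1) \le t_1$ and $f(x_2) \le t_2$. For any $\theta \in [0,1]$, convexity of $f$ gives $f(\theta x_1 + (1-\theta)x_2) \le \theta f(x_1) + (1-\theta) f(x_2) \le \theta t_1 + (1-\theta) t_2$. Hence $(\theta x_1 + (1-\theta)x_2,\ \theta t_1 + (1-\theta)t_2) \in \operatorname{epi} f$, which proves convexity. This property underlies the epigraph form of a convex problem, in which minimizing $f(x)$ is rewritten as minimizing $t$ subject to $(x, t) \in \operatorname{epi} f$.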
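The epigraph argument can be spot-checked numerically. The sketch below assumes a specific convex function, the squared Euclidean norm, purely for illustration; the names `f` and `in_epigraph` and the sampling ranges are hypothetical choices.

```python
import numpy as np

def f(x):
    # Example convex function (an assumption for illustration): ||x||^2.
    return float(np.dot(x, x))

def in_epigraph(x, t, tol=1e-12):
    """Return True when (x, t) lies in epi f, i.e. f(x) <= t (up to rounding)."""
    return f(x) <= t + tol

rng = np.random.default_rng(1)
for _ in range(1000):
    x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
    # Choose t-values at or above the graph so both points lie in epi f.
    t1 = f(x1) + rng.uniform(0.0, 2.0)
    t2 = f(x2) + rng.uniform(0.0, 2.0)
    theta = rng.uniform(0.0, 1.0)
    # Convex combination of the two epigraph points.
    x = theta * x1 + (1 - theta) * x2
    t = theta * t1 + (1 - theta) * t2
    assert in_epigraph(x, t)
```

Swapping in a non-convex `f` (for example, the negative squared norm) should make the assertion fail for some samples.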
Part 3: Strong Convexity and Lipschitz Continuity Cannot Coexist
Proof by Contradiction
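Assume $f: \mathbb{R}^n \to \mathbb{R}$ is differentiable, $m$-strongly convex with $m > 0$, and $L$-Lipschitz on all of $\mathbb{R}^n$. Strong convexity implies $f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{m}{2}\|y-x\|^2$. Lipschitz continuity implies $|f(y)-f(x)| \le L\|y-x\|$. Fix $x$ and let $y = x + d$. Combining the two bounds with the Cauchy–Schwarz inequality gives $\frac{m}{2}\|d\|^2 \le f(y)-f(x)-\nabla f(x)^T d \le L\|d\| + \|\nabla f(x)\| \|d\|$. Dividing by $\|d\|$ gives $\frac{m}{2}\|d\| \le L + \|\nabla f(x)\|$, which fails once $\|d\|$ is large enough because the right side does not depend on $d$. This is a contradiction, so no $m > 0$ works. The argument needs the whole space: on a bounded domain, a function can be both strongly convex and Lipschitz. In particular, no loss function defined on all of $\mathbb{R}^n$ can have both properties, which is why analyses that assume both typically restrict the parameters to a bounded set.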
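The contradiction is easy to see on the simplest strongly convex function. The sketch below assumes the one-dimensional quadratic f(x) = (m/2)x^2 with a hypothetical value of `m`; the helper `lipschitz_ratio` is illustrative. Any valid Lipschitz constant would have to dominate every ratio it prints.

```python
m = 0.5  # hypothetical strong-convexity constant

def f(x):
    # Simplest m-strongly convex function in one dimension.
    return 0.5 * m * x**2

def lipschitz_ratio(x, y):
    """|f(y) - f(x)| / |y - x|, a lower bound on any Lipschitz constant of f."""
    return abs(f(y) - f(x)) / abs(y - x)

# The ratio grows linearly with the distance from x = 0, so no finite
# Lipschitz constant can work on the whole real line.
for y in (1.0, 10.0, 100.0, 1000.0):
    print(f"y = {y:7.1f}   ratio = {lipschitz_ratio(0.0, y):.2f}")
```

Restricting y to a bounded interval caps the ratio, which matches the remark that the obstruction only arises on an unbounded domain.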
Part 4: Convex Function with L-Lipschitz Gradient
Part (a): Basic Inequality
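For a convex function $f$ with $L$-Lipschitz gradient and minimizer $x^*$, we have $f(x) - f(x^*) \ge \frac{1}{2L}\|\nabla f(x)\|^2$. This follows from the descent lemma, $f(y) \le f(x) + \nabla f(x)^T(y-x) + \frac{L}{2}\|y-x\|^2$: taking $y = x - \frac{1}{L}\nabla f(x)$ gives $f(x^*) \le f(y) \le f(x) - \frac{1}{2L}\|\nabla f(x)\|^2$. The inequality is crucial for analyzing gradient descent convergence, and bounds of this kind also appear in analyses that bound the number of communication rounds in federated learning.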
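The bound in part (a) can be checked on a quadratic, where everything is explicit. The sketch below assumes f(x) = (1/2) x^T A x with a randomly generated positive semidefinite `A`, so that x* = 0, f(x*) = 0, and L is the largest eigenvalue of `A`; the names `A`, `L`, and `gap` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B.T @ B                        # symmetric positive semidefinite
L = np.linalg.eigvalsh(A).max()    # Lipschitz constant of grad f for this quadratic

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x

def gap(x):
    """f(x) - f(x*) - ||grad f(x)||^2 / (2L), using x* = 0 and f(x*) = 0."""
    g = grad_f(x)
    return f(x) - g @ g / (2 * L)

# Part (a) says the gap is nonnegative everywhere (small slack for rounding).
assert all(gap(rng.standard_normal(4)) >= -1e-9 for _ in range(1000))
```

For this quadratic the gap vanishes along the eigenvector of the largest eigenvalue, so the constant 1/(2L) cannot be improved in general.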
Part (b): Co-coercivity
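Co-coercivity states that $(\nabla f(x)-\nabla f(y))^T(x-y) \ge \frac{1}{L}\|\nabla f(x)-\nabla f(y)\|^2$ for all $x$ and $y$. To prove it, apply part (a) to $h_x(z) = f(z) - \nabla f(x)^T z$ and $h_y(z) = f(z) - \nabla f(y)^T z$. Both are convex with $L$-Lipschitz gradient, and since $\nabla h_x(x) = 0$ and $\nabla h_y(y) = 0$, they are minimized at $z = x$ and $z = y$ respectively. Evaluate the bound for $h_x$ at $z = y$ and the bound for $h_y$ at $z = x$, then add the two inequalities: the function values cancel and co-coercivity remains. This property is used in the analysis of accelerated gradient methods like Nesterov's.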
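Written out, the two applications of part (a) and their sum look as follows. This is a sketch of the algebra that uses only the definitions of $h_x$ and $h_y$ given above, together with the fact that $h_x$ is minimized at $x$ and $h_y$ at $y$.

```latex
\begin{align*}
h_x(y) - h_x(x) \ge \tfrac{1}{2L}\|\nabla h_x(y)\|^2
  &\;\Longrightarrow\;
  f(y) - f(x) - \nabla f(x)^T (y - x) \ge \tfrac{1}{2L}\|\nabla f(y) - \nabla f(x)\|^2, \\
h_y(x) - h_y(y) \ge \tfrac{1}{2L}\|\nabla h_y(x)\|^2
  &\;\Longrightarrow\;
  f(x) - f(y) - \nabla f(y)^T (x - y) \ge \tfrac{1}{2L}\|\nabla f(x) - \nabla f(y)\|^2, \\
\text{sum:}\quad
  \bigl(\nabla f(x) - \nabla f(y)\bigr)^T (x - y)
  &\;\ge\; \tfrac{1}{L}\|\nabla f(x) - \nabla f(y)\|^2 .
\end{align*}
```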
Part 5: Strongly Convex with Lipschitz Gradient
Part (a): Constructing a Convex Function with Reduced Lipschitz Constant
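Define $q(x) = f(x) - \frac{m}{2}\|x\|^2$, where $f$ is $m$-strongly convex with $L$-Lipschitz gradient. We show that $q$ is convex and has $(L-m)$-Lipschitz gradient. Convexity is immediate: $m$-strong convexity of $f$ is equivalent to convexity of $f(x) - \frac{m}{2}\|x\|^2$. The gradient of $q$ is $\nabla q(x) = \nabla f(x) - mx$. For its Lipschitz constant, note that because $f$ is convex with $L$-Lipschitz gradient, $\frac{L}{2}\|x\|^2 - f(x)$ is convex, and this function equals $\frac{L-m}{2}\|x\|^2 - q(x)$. A convex function $q$ for which $\frac{L-m}{2}\|x\|^2 - q(x)$ is also convex has $(L-m)$-Lipschitz gradient. This shifting trick also appears in primal-dual algorithms, where it is used to improve conditioning.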
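For a quadratic, the claim about q reduces to a statement about eigenvalues, which is easy to inspect. The sketch below assumes f(x) = (1/2) x^T A x with a Hessian `A` built to have eigenvalues between hypothetical constants `m` and `L`; all names and numeric values are illustrative.

```python
import numpy as np

m, L = 1.0, 10.0  # hypothetical strong-convexity and smoothness constants
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
A = Q @ np.diag([m, 3.0, 7.0, L]) @ Q.T            # Hessian of f, eigenvalues in [m, L]

hess_q = A - m * np.eye(4)          # Hessian of q(x) = f(x) - (m/2)||x||^2
q_eigs = np.linalg.eigvalsh(hess_q)

# Eigenvalues of the Hessian of q lie in [0, L - m] up to rounding:
# nonnegative means q is convex, and the largest one is the
# Lipschitz constant of grad q.
print(q_eigs)
```

Subtracting the quadratic shifts the whole spectrum down by `m`, which is the matrix version of the argument in the text.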
Part (b): Applying Co-coercivity to q
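Assume $L > m$ and apply co-coercivity to $q$: $(\nabla q(x)-\nabla q(y))^T(x-y) \ge \frac{1}{L-m}\|\nabla q(x)-\nabla q(y)\|^2$. Write $g = \nabla f(x)-\nabla f(y)$ and $d = x-y$, so that $\nabla q(x)-\nabla q(y) = g - md$. Substituting gives $g^T d - m\|d\|^2 \ge \frac{1}{L-m}\left(\|g\|^2 - 2m\,g^T d + m^2\|d\|^2\right)$. Multiplying through by $L-m$ and collecting terms gives $(L+m)\,g^T d \ge mL\|d\|^2 + \|g\|^2$, that is, $(\nabla f(x)-\nabla f(y))^T(x-y) \ge \frac{mL}{m+L}\|x-y\|^2 + \frac{1}{m+L}\|\nabla f(x)-\nabla f(y)\|^2$. This inequality is at least as strong as standard co-coercivity, and it is the key step in the standard proof that gradient descent converges linearly on strongly convex functions with Lipschitz gradient.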
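The combined bound, with constants mL/(m+L) on the squared distance and 1/(m+L) on the squared gradient difference, can be tested on a diagonal quadratic. The sketch below assumes f(x) = (1/2) x^T A x with hypothetical eigenvalues between `m` and `L`; the helper `slack` (left side minus right side) is an illustrative name.

```python
import numpy as np

m, L = 1.0, 10.0            # hypothetical constants
A = np.diag([m, 4.0, L])    # Hessian of f, eigenvalues in [m, L]

def grad_f(x):
    return A @ x

def slack(x, y):
    """Left side minus right side of the strengthened co-coercivity bound."""
    d = x - y
    g = grad_f(x) - grad_f(y)
    lhs = g @ d
    rhs = (m * L / (m + L)) * (d @ d) + (1.0 / (m + L)) * (g @ g)
    return lhs - rhs

rng = np.random.default_rng(4)
# The slack should be nonnegative for every pair of points (small slack for rounding).
assert all(
    slack(rng.standard_normal(3), rng.standard_normal(3)) >= -1e-10
    for _ in range(1000)
)
```

Along the eigen-directions with eigenvalue `m` or `L` the slack is zero, so for this example the constants cannot be improved.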
Conclusion
Mastering these proofs not only helps you ace ECE490 Homework 1 but also builds intuition for modern optimization in AI and finance. In 2026, as we train ever-larger models, these inequalities underpin everything from convergence guarantees to robustness. Practice by applying them to simple functions like quadratics, and you'll see how they connect to real-world algorithms like Adam and SGD.