Green Learning Tutorial: Saab Transforms, PixelHop++, and SSL for Image Classification

Introduction to Green Learning and Successive Subspace Learning

In the rapidly evolving field of machine learning, Green Learning (GL) has emerged as a compelling alternative to traditional deep learning. Coined by Kuo et al., GL emphasizes feedforward-designed models that avoid the computationally expensive backpropagation process. This tutorial explores the foundational ideas of GL, focusing on the Saab transform, Successive Subspace Learning (SSL), the PixelHop and PixelHop++ models, and their application to image classification on datasets like MNIST and Fashion-MNIST. By understanding these methods, you can build efficient, interpretable models suitable for resource-constrained environments, an increasingly important goal as AI moves toward edge computing and sustainable practices.

1. Origin of Green Learning: Feedforward-Designed CNNs

1.1 The Saab Transform: A Flow Diagram and Explanation

The Saab (Subspace Approximation with Adjusted Bias) transform is a key component of feedforward-designed CNNs (FF-CNNs). In a traditional convolutional layer, the filter weights are learned by backpropagation and followed by a nonlinear activation function like ReLU. A Saab layer instead derives its filters from the statistics of the input patches and adds a bias chosen so that the responses are non-negative. Below is a flow diagram summarizing the Saab transform:

Input Image → Patch Extraction → Unsupervised Learning (PCA) → Augmented Kernel (with bias) → Feature Maps

In essence, the Saab transform applies principal component analysis (PCA) to local patches to learn a set of orthogonal filters (the AC kernels), alongside a constant DC kernel that captures the patch mean. A bias term is then added so that all output responses are non-negative. A ReLU placed after the transform would leave such responses unchanged, which is how Saab avoids the sign confusion that otherwise arises when linear stages are cascaded. The whole process is feedforward: all parameters are determined from data statistics, without backpropagation. The GitHub repository linked in the assignment provides code for channel-wise Saab transforms, which extend the concept to handle multiple channels efficiently.
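The flow above can be sketched in a few lines of NumPy. This is a simplified, single-channel illustration rather than the code from the assignment repository: the function names (extract_patches, fit_saab, saab_transform) are hypothetical, and the bias is assumed to be the largest patch norm, which is one sufficient choice for keeping every response non-negative.

```python
import numpy as np

def extract_patches(images, size=5, stride=1):
    """Collect every size x size patch from a stack of (N, H, W) images."""
    n, h, w = images.shape
    patches = [
        images[:, i:i + size, j:j + size].reshape(n, -1)
        for i in range(0, h - size + 1, stride)
        for j in range(0, w - size + 1, stride)
    ]
    return np.concatenate(patches, axis=0)  # (N * positions, size * size)

def fit_saab(patches, num_ac_kernels):
    """Fit one DC kernel, PCA-derived AC kernels, and a shared bias."""
    dim = patches.shape[1]
    dc_kernel = np.ones(dim) / np.sqrt(dim)           # constant unit vector
    dc = patches @ dc_kernel
    residual = patches - np.outer(dc, dc_kernel)      # remove the DC part
    residual = residual - residual.mean(axis=0)       # center before PCA
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    kernels = np.vstack([dc_kernel, vt[:num_ac_kernels]])
    # Assumption: since every kernel has unit norm, |x . k| <= ||x||,
    # so the largest patch norm is a bias that keeps responses >= 0.
    bias = np.linalg.norm(patches, axis=1).max()
    return kernels, bias

def saab_transform(patches, kernels, bias):
    """Project patches onto the kernels and add the bias."""
    return patches @ kernels.T + bias

# Usage on random data standing in for real images
rng = np.random.default_rng(0)
demo_patches = extract_patches(rng.random((8, 12, 12)))
demo_kernels, demo_bias = fit_saab(demo_patches, num_ac_kernels=6)
demo_features = saab_transform(demo_patches, demo_kernels, demo_bias)
```

Note that nothing here is iterated: the kernels come from a single SVD and the bias from a single pass over the patch norms.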

1.2 FF-CNN vs. BP-CNN: Similarities and Differences

Both FF-CNNs and backpropagation-designed CNNs (BP-CNNs) aim to learn hierarchical features for tasks like classification. However, their training paradigms differ fundamentally:

Similarities: Both use convolutional layers for feature extraction and fully connected layers for decision-making. Both can achieve high accuracy on benchmark datasets.
Differences: BP-CNNs rely on gradient descent and backpropagation to update weights iteratively, which is computationally intensive and requires careful tuning of hyperparameters. In contrast, FF-CNNs determine weights one layer at a time in a single feedforward pass over the training data, using closed-form methods like PCA or least squares regression. This makes FF-CNNs faster to train and more interpretable, but they may not capture complex nonlinear relationships as effectively as BP-CNNs in very deep architectures.

This distinction is analogous to comparing a handcrafted recipe (FF-CNN) versus an AI chef that learns by tasting and adjusting (BP-CNN). The former is quick and predictable; the latter can create novel dishes but requires more resources.
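To make the "no iterative updates" point concrete, here is a minimal sketch of a decision layer fitted by least squares regression in closed form. It regresses features onto one-hot labels, which is a simplification chosen for illustration; the function names and the synthetic two-cluster data are assumptions, not part of any published FF-CNN code.

```python
import numpy as np

def fit_lsr(features, labels, num_classes):
    """Solve for the weights that map features to one-hot targets."""
    targets = np.eye(num_classes)[labels]
    augmented = np.hstack([features, np.ones((features.shape[0], 1))])
    # One linear solve replaces the gradient-descent training loop.
    weights, *_ = np.linalg.lstsq(augmented, targets, rcond=None)
    return weights

def predict_lsr(features, weights):
    """Pick the class with the largest regression output."""
    augmented = np.hstack([features, np.ones((features.shape[0], 1))])
    return (augmented @ weights).argmax(axis=1)

# Usage on two synthetic, well-separated clusters
rng = np.random.default_rng(0)
demo_x = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
demo_y = np.array([0] * 50 + [1] * 50)
demo_w = fit_lsr(demo_x, demo_y, num_classes=2)
demo_acc = (predict_lsr(demo_x, demo_w) == demo_y).mean()
```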

2. PixelHop and PixelHop++: SSL in Action

2.1 The SSL Methodology: Deep Learning vs. SSL

Successive Subspace Learning (SSL) is a methodology that builds representations by sequentially projecting data into subspaces of increasing abstraction. Unlike deep learning, which learns all layers jointly via backpropagation, SSL trains each layer independently in a greedy, feedforward manner. This makes SSL models more interpretable and easier to train, especially with limited data. For example, in image classification, SSL first learns low-level features (edges, textures), then mid-level patterns (shapes, parts), and finally high-level concepts (object categories).

2.2 Functions of Modules 1, 2, and 3 in SSL Framework

The SSL framework typically consists of three modules:

Module 1 (Feature Extraction): Cascades several PixelHop or PixelHop++ units, each applying a Saab transform to a local neighborhood, typically followed by max-pooling. The first unit captures local patterns like edges and corners; later units cover progressively larger neighborhoods.
Module 2 (Feature Aggregation and Reduction): Condenses the responses of Module 1 into a compact, discriminative feature vector. In the published designs, PixelHop does this with spatial aggregation followed by label-assisted dimension reduction, while PixelHop++ ranks features by cross-entropy and keeps the most discriminative ones.
Module 3 (Classification): Uses fully connected layers or a classifier (e.g., XGBoost) on the final feature vector to produce class probabilities. This module makes the final decision based on high-level features.

2.3 Neighborhood Construction and Subspace Approximation: PixelHop vs. PixelHop++

Both PixelHop and PixelHop++ use neighborhood construction to define local patches and subspace approximation to learn filters. However, they differ in how they handle multiple channels:

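PixelHop Unit: Uses the basic Saab transform, which applies PCA jointly over the spatial neighborhood and all input channels. Because the kernel length grows with the number of channels, this can lead to a large number of parameters when the input has many channels.
PixelHop++ Unit: Employs a channel-wise (c/w) Saab transform, which applies a separate Saab transform to each input channel. This is reasonable because the output channels of a preceding Saab transform are only weakly correlated, so little is lost by treating them independently. The result is a smaller model, and the saving grows in later hops, where the number of feature maps is largest.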

For example, in the MNIST dataset, PixelHop++ with c/w Saab can achieve comparable accuracy to PixelHop while using fewer parameters, making it more suitable for deployment on devices with limited memory.
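A rough count shows where the saving comes from. The sketch below compares upper bounds on the number of kernel weights for one unit, assuming a 5×5 neighborhood, no energy-based truncation, and no bias terms; the function names and the choice of 40 input channels are illustrative assumptions.

```python
def saab_params(channels, window=5):
    """Upper bound for a joint Saab transform over all channels.

    Each kernel spans window * window * channels inputs, and PCA can
    produce at most that many kernels.
    """
    dim = window * window * channels
    return dim * dim

def cw_saab_params(channels, window=5):
    """Upper bound for a channel-wise Saab transform.

    Each channel gets its own transform with at most window * window
    kernels of length window * window.
    """
    dim = window * window
    return channels * dim * dim

joint = saab_params(40)        # 1,000,000
channel_wise = cw_saab_params(40)  # 25,000
```

Under these assumptions the joint count grows quadratically with the number of channels, while the channel-wise count grows linearly. In practice both transforms discard low-energy kernels, so real models are smaller than either bound.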

3. Practical Implementation: MNIST and Fashion-MNIST Classification

3.1 Building the PixelHop++ Model

To build a PixelHop++ model for MNIST classification, follow these steps based on the provided parameters: neighborhood size 5×5, stride 1, 2×2 max-pooling (each 2×2 block is reduced to 1×1), energy thresholds TH1=0.005 and TH2=0.001, and an XGBoost classifier with 100 estimators.

Train Module 1: Use 10,000 training images (1,000 per class) to learn the first c/w Saab transform. Record training time and accuracy.
Train Module 3: Extract Hop3 features from the training set and train the XGBoost classifier. Report training accuracy and model size (total number of parameters).
Test on 10,000 images: Evaluate the model and report test accuracy.
Vary TH1: Experiment with different TH1 values (e.g., 0.001, 0.01, 0.05) and plot TH1 vs. test accuracy. Discuss how energy threshold affects model size and performance.
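Before running the steps above, it helps to check how the spatial size shrinks from hop to hop. The sketch below assumes three hops, no padding inside the hops, MNIST images zero-padded from 28×28 to 32×32, and pooling after the first two hops only; these are assumptions made for illustration, so adjust them to match your setup.

```python
def hop_output_size(size, window=5, stride=1, pool=2, apply_pool=True):
    """Spatial size after one hop: valid 5x5 neighborhoods, then pooling."""
    conv = (size - window) // stride + 1
    return conv // pool if apply_pool else conv

size = 32  # assumption: 28x28 MNIST images padded to 32x32
sizes = []
for apply_pool in (True, True, False):  # assumption: no pooling after Hop3
    size = hop_output_size(size, apply_pool=apply_pool)
    sizes.append(size)

print(sizes)  # [14, 5, 1]
```

With these settings, Hop3 produces a single 1×1 response per channel, so the Hop3 feature vector has one entry per retained channel.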

In practice, you might find that a higher TH1 (more aggressive filtering) reduces model size but may lower accuracy if too many informative features are discarded. This trade-off is crucial for deploying models on edge devices, similar to how a smartphone app balances performance and battery life.
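The effect of the thresholds can be illustrated with a small partitioning sketch. It assumes the common reading of the two thresholds: channels with normalized energy of at least TH1 are passed on to the next hop, channels between TH2 and TH1 are kept as final features without further splitting, and channels below TH2 are discarded. The energy values and the function name are made up for the example.

```python
import numpy as np

def partition_nodes(energies, th1=0.005, th2=0.001):
    """Split channel indices into intermediate, leaf, and discarded sets."""
    energies = np.asarray(energies, dtype=float)
    intermediate = np.flatnonzero(energies >= th1)
    leaf = np.flatnonzero((energies >= th2) & (energies < th1))
    discarded = np.flatnonzero(energies < th2)
    return intermediate, leaf, discarded

# Hypothetical normalized energies of eight channels
energies = [0.60, 0.25, 0.10, 0.04, 0.006, 0.003, 0.0009, 0.0001]

counts = {}
for th1 in (0.001, 0.005, 0.01, 0.05):
    inter, leaf, disc = partition_nodes(energies, th1=th1)
    counts[th1] = (len(inter), len(leaf), len(disc))
# counts[0.005] is (5, 1, 2); counts[0.05] is (3, 3, 2)
```

Raising TH1 moves channels from the intermediate set to the leaf set, so fewer channels feed the next hop and the later hops need fewer kernels.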

3.2 Comparing PixelHop and PixelHop++

Using the same hyperparameters but with basic Saab transform (PixelHop) instead of c/w Saab (PixelHop++), compare the two models:

Train and test accuracy: Report both for each model. The two are expected to be close, and PixelHop++ may come out slightly ahead, but treat this as something to measure rather than assume. PixelHop may be simpler to implement.
Model size: PixelHop++ typically has fewer parameters because the c/w Saab transform avoids redundant filters across channels. This makes PixelHop++ more memory-efficient.

This comparison mirrors the trend in AI where efficient architectures like MobileNet are preferred over larger models in resource-constrained scenarios, because they give up little accuracy in exchange for a much smaller footprint.

3.3 Error Analysis

For a model trained on 50,000 images, compute the confusion matrix and identify easy and hard classes. For MNIST, digits like 0 and 1 are often easy, while 4 and 9 may be confused. For Fashion-MNIST, categories like shirt and coat may have high confusion. Use a heatmap to visualize errors and propose improvements:

Confusing class groups: Analyze why certain pairs are confused (e.g., similar shapes).
Improvement ideas: Use data augmentation (e.g., rotations, distortions) for hard classes, or introduce a separate classifier for ambiguous pairs. This is akin to how a sports team analyzes game footage to find and fix its own weaknesses.
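The analysis above can be sketched with plain NumPy. The helper names are hypothetical, and the tiny label arrays stand in for real predictions; rows of the matrix are true classes and columns are predicted classes.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often each true class is predicted as each class."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class predicted correctly (0 if unseen)."""
    totals = matrix.sum(axis=1)
    return np.divide(np.diag(matrix), totals,
                     out=np.zeros(len(totals)), where=totals > 0)

def most_confused_pairs(matrix, top_k=3):
    """Return (class_a, class_b, errors) for the most confused pairs."""
    errors = matrix + matrix.T          # count confusion in both directions
    np.fill_diagonal(errors, 0)         # ignore correct predictions
    upper = np.triu(errors)             # keep each pair once
    order = np.argsort(upper, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(order, upper.shape)
    return [(int(a), int(b), int(upper[a, b]))
            for a, b in zip(rows, cols) if upper[a, b] > 0]

# Usage with made-up labels
y_true = [4, 4, 4, 9, 9, 9, 0, 1]
y_pred = [4, 9, 9, 9, 4, 9, 0, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=10)
pairs = most_confused_pairs(cm)   # [(4, 9, 3)]
acc = per_class_accuracy(cm)
```

The matrix can be passed directly to a heatmap routine in a plotting library of your choice, and the list of pairs points to the classes that would benefit most from augmentation or a dedicated classifier.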

Conclusion

Green Learning offers a refreshing alternative to traditional deep learning, emphasizing efficiency and interpretability. By mastering concepts like the Saab transform, SSL, and PixelHop++, you can build powerful image classifiers without the need for massive computational resources. As AI continues to permeate everyday applications, from smartphone cameras to autonomous drones, these techniques will become increasingly valuable. Experiment with the provided code and datasets to deepen your understanding, and consider how GL can be applied to your own projects.