WiFi Retransmission Analysis & Machine Learning: A Hands-On Guide for COMP4336
Learn how to collect WiFi traffic data, analyze retransmissions, and apply machine learning to predict packet retransmissions in this COMP4336 project tutorial.
Introduction: Why WiFi Retransmissions Matter in 2026
With the explosion of IoT devices, 5G offloading, and hybrid work, WiFi networks are more congested than ever. In May 2026, a typical university library might host hundreds of devices streaming lectures, gaming, or running AI assistants. Retransmissions, where a frame is not acknowledged and the sender must transmit it again, are a silent killer of network performance: every retry consumes airtime that could have carried new data. This tutorial guides you through the COMP4336 project: collecting real WiFi data, analyzing retransmission impact, and building machine learning models to predict retransmissions. You'll gain skills directly applicable to network engineering, cybersecurity, or R&D roles.
1. Data Collection: Capturing Real-World WiFi Traffic
Choosing the Right Locations
As the assignment specifies, select crowded areas with high AP density—think libraries, CSE buildings, or busy cafes. In 2026, many campuses have 'smart' zones where hundreds of devices compete. Capture data at different times (e.g., 10 AM vs 3 PM) and across multiple days to ensure diversity.
Using Wireshark Effectively
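Put your wireless interface into monitor mode (if the adapter and driver support it) and start the capture in Wireshark so that you see raw 802.11 frames rather than only your own traffic. Isolate retransmissions with the display filter wlan.fc.retry == 1, which matches frames whose Retry bit is set. Save captures in pcapng format. Aim for at least 100,000 packets so that the ML models have enough examples of both retransmitted and non-retransmitted frames. Pro tip: avoid capturing in low-traffic periods, because you need contention and collisions to study retransmissions.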
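To make the retry filter less of a black box, here is a minimal sketch of what it tests: the Retry bit in the two-byte Frame Control field at the start of the 802.11 MAC header. The function name and the sample bytes are illustrative assumptions, and in a real monitor-mode capture you would first skip the radiotap header that precedes the MAC header.

```python
import struct

RETRY_FLAG = 0x08  # bit 3 of the Frame Control flags byte (second byte)

FRAME_TYPES = {0: "management", 1: "control", 2: "data", 3: "extension"}


def parse_frame_control(mac_header: bytes) -> dict:
    """Decode the Frame Control field (first two bytes) of an 802.11 MAC header."""
    if len(mac_header) < 2:
        raise ValueError("need at least 2 bytes of MAC header")
    first, flags = struct.unpack_from("BB", mac_header)
    return {
        "type": FRAME_TYPES[(first >> 2) & 0x3],  # bits 2-3 of the first byte
        "subtype": (first >> 4) & 0xF,            # bits 4-7 of the first byte
        "retry": bool(flags & RETRY_FLAG),
    }


# Hypothetical bytes: a QoS Data frame (type 2, subtype 8) with To DS and Retry set.
sample = bytes([0x88, 0x09])
info = parse_frame_control(sample)
print(info)
```

In practice you would let Wireshark or a parsing library do this decoding; the sketch only shows which bit your filter depends on.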
2. Performance Analysis: Quantifying the Damage
Retransmission Distribution
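After capture, use Python to parse the pcap. Count how many times each packet is retransmitted: a retry keeps the sequence number of the original frame and has the Retry bit set, so group frames by transmitter address and sequence number. Plot a bar chart: x-axis = number of retransmissions (1, 2, 3...), y-axis = number of packets. Typically you'll see a roughly exponential decay, where most retransmitted packets are resent once, fewer twice, and so on. A heavier tail indicates more severe congestion.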
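The counting step can be sketched as follows, assuming you have already parsed each frame into a (transmitter MAC, sequence number, retry flag) tuple. That tuple layout and the sample frames are assumptions made for illustration, not output from a real capture. Note that 802.11 sequence numbers are 12 bits wide and wrap around, so on long captures you would also want to split the data into time windows before grouping.

```python
from collections import Counter


def retransmission_distribution(frames):
    """Return a Counter mapping 'number of retransmissions' to 'number of packets'.

    frames: iterable of (transmitter_mac, sequence_number, retry_flag) tuples.
    """
    retries_per_packet = Counter()
    for ta, seq, retry in frames:
        if retry:
            # Retries reuse the sequence number of the original frame.
            retries_per_packet[(ta, seq)] += 1
    # Histogram over the per-packet retry counts.
    return Counter(retries_per_packet.values())


# Made-up frames for illustration only.
frames = [
    ("aa:aa", 100, False),
    ("aa:aa", 100, True),
    ("aa:aa", 101, False),
    ("aa:aa", 102, False),
    ("aa:aa", 102, True),
    ("aa:aa", 102, True),
    ("bb:bb", 100, False),
    ("bb:bb", 100, True),
]
dist = retransmission_distribution(frames)
for n in sorted(dist):
    print(n, dist[n])
```

The sorted keys and their counts are exactly the x and y values for the bar chart.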
Latency, Throughput, and Efficiency
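Pick the most congested network, i.e. the BSSID with the highest retransmission rate. For latency, pair each data frame with the ACK that follows it. 802.11 ACK frames carry no sequence number, only a receiver address, so match on timing instead: the ACK arrives roughly one SIFS after the data frame, and its receiver address equals the data frame's transmitter address. The timestamp difference between the two gives the latency. For throughput, sum the data bytes and divide by the capture duration. For efficiency, divide useful data bytes (first transmissions only) by total data bytes, including retransmissions. Plot these metrics against retransmission frequency to see correlations. You would expect latency to rise and throughput to fall as retransmissions increase, much like a popular AI app slowing down during peak usage.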
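The three metrics can be sketched like this, assuming frames have been parsed into time-ordered dictionaries. The field names (time, kind, ta, ra, length, retry), the 1 ms ACK window, and the sample values are all assumptions for illustration. Counting only non-retry frames as useful data is an approximation that assumes the capture saw the first transmission of every packet.

```python
def link_metrics(frames, ack_window=0.001):
    """Compute mean ACK latency, throughput, and efficiency for one network.

    frames: time-ordered list of dicts with keys
        time (seconds), kind ('data' or 'ack'), ta, ra, length (bytes), retry (bool).
    """
    latencies = []
    total_bytes = 0
    useful_bytes = 0
    for i, frame in enumerate(frames):
        if frame["kind"] != "data":
            continue
        total_bytes += frame["length"]
        if not frame["retry"]:
            useful_bytes += frame["length"]
        # ACKs have no sequence number: match the immediately following frame
        # by receiver address and a short time window.
        nxt = frames[i + 1] if i + 1 < len(frames) else None
        if (nxt is not None and nxt["kind"] == "ack"
                and nxt["ra"] == frame["ta"]
                and nxt["time"] - frame["time"] <= ack_window):
            latencies.append(nxt["time"] - frame["time"])
    duration = frames[-1]["time"] - frames[0]["time"] if frames else 0.0
    return {
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else None,
        "acked_frames": len(latencies),
        "throughput_bps": total_bytes * 8 / duration if duration > 0 else None,
        "efficiency": useful_bytes / total_bytes if total_bytes else None,
    }


# Made-up trace: the first data frame is lost, its retry and a later frame are ACKed.
trace = [
    {"time": 0.0000, "kind": "data", "ta": "A", "ra": "B", "length": 1000, "retry": False},
    {"time": 0.0020, "kind": "data", "ta": "A", "ra": "B", "length": 1000, "retry": True},
    {"time": 0.0021, "kind": "ack", "ta": None, "ra": "A", "length": 14, "retry": False},
    {"time": 0.0100, "kind": "data", "ta": "A", "ra": "B", "length": 500, "retry": False},
    {"time": 0.0101, "kind": "ack", "ta": None, "ra": "A", "length": 14, "retry": False},
]
metrics = link_metrics(trace)
print(metrics)
```

Running this per time window, rather than once over the whole capture, gives you the series of points you need to plot each metric against retransmission frequency.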
3. Machine Learning Evaluation: Predicting Retransmissions
Feature Extraction
Extract features for each packet: signal strength (dBm, from the radiotap header), packet size (bytes), channel utilization (percentage of time the channel is busy), data rate, and time since the last retransmission. Channel utilization can be read from beacon frames when the AP advertises the BSS Load element; otherwise you can estimate it from the airtime of the frames you captured. Store the features in a DataFrame with a binary label: 1 if retransmitted, 0 otherwise.
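A minimal sketch of the feature table, assuming each parsed frame is a dictionary with the hypothetical keys shown below. The BSS Load helper relies on the channel utilization field being a value from 0 to 255 that scales linearly to 0-100%. The "time since last retransmission" feature is computed before the current frame updates the running state, so the label does not leak into its own features.

```python
def bss_load_to_percent(raw_value: int) -> float:
    """Convert the BSS Load channel utilization field (0-255) to a percentage."""
    return raw_value / 255 * 100


def build_feature_rows(frames):
    """Turn time-ordered frame dicts into feature rows with a 0/1 label.

    Expected keys (hypothetical names): time, signal_dbm, length,
    rate_mbps, bss_load_raw, retry.
    """
    rows = []
    last_retry_time = None
    for frame in frames:
        since_last = (frame["time"] - last_retry_time
                      if last_retry_time is not None else None)
        rows.append({
            "signal_dbm": frame["signal_dbm"],
            "size_bytes": frame["length"],
            "data_rate_mbps": frame["rate_mbps"],
            "channel_util_pct": bss_load_to_percent(frame["bss_load_raw"]),
            "since_last_retry_s": since_last,
            "label": int(frame["retry"]),
        })
        if frame["retry"]:
            last_retry_time = frame["time"]
    return rows


# Made-up frames for illustration only.
sample_frames = [
    {"time": 0.0, "signal_dbm": -60, "length": 1200, "rate_mbps": 54.0, "bss_load_raw": 51, "retry": False},
    {"time": 0.5, "signal_dbm": -82, "length": 1200, "rate_mbps": 6.0, "bss_load_raw": 204, "retry": True},
    {"time": 1.5, "signal_dbm": -70, "length": 300, "rate_mbps": 24.0, "bss_load_raw": 255, "retry": False},
]
rows = build_feature_rows(sample_frames)
for row in rows:
    print(row)
```

A list of dictionaries like this can be passed straight to pandas.DataFrame. Rows where since_last_retry_s is None need to be dropped or filled before training.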
Model Training
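Split the data 80/20 into train and test sets. Train three models: k-Nearest Neighbors (k=5), Random Forest (100 trees), and SVM (RBF kernel). Use scikit-learn's train_test_split and accuracy_score. Random Forest often performs best on this kind of data because it handles mixed features well and needs no feature scaling, whereas k-NN and SVM are distance-based and usually benefit from standardized features. Accuracy in the 85-90% range is plausible but depends heavily on your capture. If retransmissions are only a small fraction of frames, also report precision and recall, since accuracy alone can look good for a model that always predicts 0. Example code snippet: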
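from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# X holds the feature columns and y the 0/1 retransmission labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(accuracy_score(y_test, preds))
Model Interpretation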
Use the Random Forest's feature importances to see which factors matter most. Signal strength and channel utilization often come out as the top predictors, which matches intuition: a weak signal and a busy channel both make a lost frame more likely. Check this against your own data rather than assuming it. If it holds, it points to protocol enhancements such as adaptive rate control or smarter channel selection. For instance, multi-link operation in WiFi 7 (802.11be) lets a device use several links across bands, which can steer traffic away from a congested channel, and is a real-world application of the same idea.
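Reading the importances off a fitted forest can be sketched as below. The data here is synthetic: the rule that generates the labels (weak signal and busy channel raise the retry probability, packet size has no effect) is an assumption made purely so the example is self-contained, and the resulting ranking says nothing about your capture.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
signal_dbm = rng.uniform(-90, -40, n)
channel_util = rng.uniform(0, 100, n)
size_bytes = rng.integers(64, 1500, n)

# Synthetic labelling rule (an assumption, not measured behaviour).
logit = 0.3 * (-65 - signal_dbm) + 0.03 * (channel_util - 50)
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([signal_dbm, channel_util, size_bytes])
names = ["signal_dbm", "channel_util_pct", "size_bytes"]

# A shallow forest keeps the irrelevant feature from picking up noise splits.
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
model.fit(X, y)

# feature_importances_ sums to 1; a higher value means more impurity reduction.
ranked = sorted(zip(names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favour features with many distinct values, so it is worth cross-checking the ranking with sklearn.inspection.permutation_importance on the test set.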
4. Protocol Enhancement: Thinking Beyond the Data
Based on your analysis, propose a small protocol tweak. Example: a 'retransmission-aware' backoff algorithm. Standard 802.11 backoff roughly doubles the contention window after each failed attempt; your variant could grow it more aggressively when the observed retransmission rate is high, which spreads stations over more backoff slots and could reduce collisions in dense environments. Connect this to trends like smart stadiums or AI-driven network optimization.
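One way such a tweak could look is sketched below. The standard rule doubles the window after each failure, capped at CWmax. The retry-aware variant, its 4x growth factor, and its 0.2 threshold are hypothetical design choices for illustration, not part of any standard. The CWmin and CWmax values are the ones commonly used for OFDM PHYs.

```python
import random

CW_MIN = 15    # initial contention window (OFDM PHY value)
CW_MAX = 1023  # upper bound on the contention window


def standard_next_cw(cw: int) -> int:
    """Binary exponential backoff: double the window after a failed attempt."""
    return min(2 * (cw + 1) - 1, CW_MAX)


def retry_aware_next_cw(cw: int, retry_rate: float, threshold: float = 0.2) -> int:
    """Hypothetical variant: grow 4x instead of 2x when the retry rate is high."""
    factor = 4 if retry_rate > threshold else 2
    return min(factor * (cw + 1) - 1, CW_MAX)


def draw_backoff_slots(cw: int) -> int:
    """Pick a uniform random backoff, in slots, from [0, cw]."""
    return random.randint(0, cw)


# Compare window growth over three consecutive failures at a 35% retry rate.
standard, aware = [CW_MIN], [CW_MIN]
for _ in range(3):
    standard.append(standard_next_cw(standard[-1]))
    aware.append(retry_aware_next_cw(aware[-1], retry_rate=0.35))
print("standard:", standard)
print("retry-aware:", aware)
```

The trade-off to discuss in your report is that a larger window lowers the collision probability but adds idle waiting time, so the threshold should come from the retransmission rates you actually measured.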
Conclusion
This COMP4336 project bridges data collection, performance analysis, and ML. By understanding retransmissions, you're not just completing an assignment—you're building skills for real-world network challenges. Whether you're debugging a smart home network or designing 6G systems, these techniques are foundational. Good luck, and remember: don't share your solution publicly; instead, use this guide to structure your own original work.