Characterizing Web Server Workloads Using Access Logs: A COMPSCI 315 Tutorial
Learn how to analyze web server access logs to characterize workload, including request rates, bytes transferred, response codes, and file type breakdowns. This tutorial uses real university datasets from 1995 to illustrate key concepts in Internet traffic measurement.
Introduction: Why Web Server Workload Characterization Matters
In the mid-1990s, the Web exploded. Researchers at universities like Saskatchewan and Calgary began collecting server logs to understand traffic patterns. Today, with AI-driven apps, streaming services, and e-commerce generating massive workloads, the same principles apply. Whether you're optimizing a game server for a popular battle royale or tuning an AI inference endpoint, workload characterization helps you design better caching, load balancing, and user experiences. In this tutorial, you'll learn to analyze web server access logs, just like in Assignments 3 and 4 in COMPSCI 315.
Understanding Web Server Access Logs
Web server logs record every request made to a server. Each line in the Common Log Format (CLF) has the form: host ident authuser [date:time zone] "request" status size. The ident and authuser fields are usually logged as "-". For example:
imhotep.usask.ca - - [15/Sep/1995:16:02:09 -0600] "GET /changes.html HTTP/1.0" 200 1254
This line tells us the client imhotep.usask.ca requested /changes.html successfully, transferring 1254 bytes.
Measurement Mechanisms and Network Types
Server logs are collected via passive measurement at the edge network (the server itself). Analysis is typically offline, but real-time dashboards use online techniques. Is analyzing logs the only way? No—packet captures, SNMP, and active probes also characterize workload, but logs are lightweight and widely available.
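Offline analysis starts by splitting each log line into fields. The following is a minimal sketch of a parser for the Common Log Format line shown earlier. The names CLF_PATTERN and parse_clf_line are illustrative choices, not part of any course-provided code, and treating a size of "-" as 0 bytes is an assumption.

```python
import re
from typing import Optional

# host ident authuser [timestamp] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>.*)" (?P<status>\d{3}) (?P<size>\d+|-)\s*$'
)

def parse_clf_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if the line is malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A request normally looks like: METHOD PATH PROTOCOL
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else ""
    fields["status"] = int(fields["status"])
    # Assumption: a size of "-" means no body was sent, so count it as 0 bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('imhotep.usask.ca - - [15/Sep/1995:16:02:09 -0600] '
          '"GET /changes.html HTTP/1.0" 200 1254')
record = parse_clf_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
# prints: imhotep.usask.ca /changes.html 200 1254
```

Returning None for malformed lines lets the caller count and skip them rather than crash partway through a multi-month log.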
Analyzing Request Rates and Data Transfer
To compute average requests per day, divide the total number of requests by the number of days the log covers. For the UofS log (~7 months), assume 210 days: if the log holds 1,200,000 requests, the average is 1,200,000 / 210 ≈ 5,714 requests/day. Similarly, total data transferred in MB = (sum of all transfer sizes in bytes) / (1024*1024), and average MB/day = total MB / days.
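The two averages can be sketched in a few lines. This is an illustrative helper, not the assignment's reference solution: daily_averages is a hypothetical name, and the toy input below is made up purely so the arithmetic is easy to check by hand.

```python
from typing import Iterable, Tuple

BYTES_PER_MB = 1024 * 1024  # same MB definition as in the text

def daily_averages(sizes: Iterable[int], days: int) -> Tuple[float, float]:
    """Return (requests per day, MB per day), given one transfer size per request."""
    if days <= 0:
        raise ValueError("days must be positive")
    total_requests = 0
    total_bytes = 0
    for size in sizes:
        total_requests += 1
        total_bytes += size
    total_mb = total_bytes / BYTES_PER_MB
    return total_requests / days, total_mb / days

# Hypothetical toy input: four 512 KiB transfers logged over two days.
toy_sizes = [512 * 1024] * 4
requests_per_day, mb_per_day = daily_averages(toy_sizes, days=2)
print(requests_per_day, mb_per_day)  # prints: 2.0 1.0
```

Because the function consumes any iterable, the sizes can be streamed straight from a log file without loading the whole file into memory.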
Response Code Breakdown
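Group status codes into four categories: Successful (2xx), Not Modified (304), Found (302), and Unsuccessful (4xx/5xx). Then report each category's share of all requests, for example 95% Successful (200), 3% Not Modified, 1% Found, and 1% Unsuccessful. A high share of 304s indicates effective caching, since clients are revalidating copies they already hold instead of downloading them again. This is similar to how your phone caches TikTok videos to reduce server load.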
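A short sketch of this grouping, assuming the status codes have already been extracted as integers. The function names and the extra "Other" bucket (for any code outside the four groups named above) are assumptions added for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

def classify_status(status: int) -> str:
    """Map one HTTP status code to its reporting category."""
    if status == 304:
        return "Not Modified"
    if status == 302:
        return "Found"
    if 200 <= status <= 299:
        return "Successful"
    if 400 <= status <= 599:
        return "Unsuccessful"
    return "Other"  # assumption: catch-all for codes outside the four groups

def status_breakdown(statuses: Iterable[int]) -> Dict[str, float]:
    """Return the percentage of requests that fall in each category."""
    counts = Counter(classify_status(s) for s in statuses)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}

# Made-up sample that mirrors the percentages used in the example above.
toy_statuses = [200] * 95 + [304] * 3 + [302] + [404]
print(status_breakdown(toy_statuses))
# prints: {'Successful': 95.0, 'Not Modified': 3.0, 'Found': 1.0, 'Unsuccessful': 1.0}
```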
Local vs. Remote Clients
Identify local clients by domain (e.g., usask.ca) or IP range. For UofS, local = *.usask.ca or 128.233.*.*. Compute percentage of requests and bytes from local vs. remote. Typically, remote clients generate more traffic—similar to how a university's public website gets most hits from outside.
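One way to sketch the local/remote split is shown below, using the UofS rule from the paragraph above (*.usask.ca or 128.233.*.*). The helper names and the toy records are hypothetical. The sketch assumes each record has already been reduced to a (host, size) pair.

```python
import ipaddress
from typing import Dict, Iterable, Tuple

LOCAL_DOMAIN = "usask.ca"
LOCAL_NETWORK = ipaddress.ip_network("128.233.0.0/16")  # i.e. 128.233.*.*

def is_local(host: str) -> bool:
    """True if host is a usask.ca hostname or an address in 128.233.*.*."""
    try:
        return ipaddress.ip_address(host) in LOCAL_NETWORK
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    name = host.lower()
    return name == LOCAL_DOMAIN or name.endswith("." + LOCAL_DOMAIN)

def local_share(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Percentage of requests and of bytes that come from local clients."""
    requests = local_requests = 0
    total_bytes = local_bytes = 0
    for host, size in records:
        requests += 1
        total_bytes += size
        if is_local(host):
            local_requests += 1
            local_bytes += size
    return {
        "local_request_pct": 100.0 * local_requests / requests if requests else 0.0,
        "local_byte_pct": 100.0 * local_bytes / total_bytes if total_bytes else 0.0,
    }

# Made-up records: two local clients and two remote ones.
toy = [("imhotep.usask.ca", 1254), ("128.233.10.5", 746),
       ("example.com", 2000), ("192.0.2.7", 4000)]
print(local_share(toy))
# prints: {'local_request_pct': 50.0, 'local_byte_pct': 25.0}
```

The remote percentages are simply 100 minus the local ones. Matching on "." + LOCAL_DOMAIN avoids misclassifying an unrelated host such as notusask.ca as local.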
File Type Breakdown
Categorize files by extension: HTML (.html, .htm, .shtml, .map), Images (.gif, .jpeg, .jpg, .xbm, .bmp, .rgb, .xpm), Sound (.au, .snd, .wav, .mid, .midi, .lha, .aif, .aiff), Video (.mov, .movie, .avi, .qt, .mpeg, .mpg), Formatted (.ps, .eps, .doc, .dvi, .txt), Dynamic (.cgi), Others (everything else). Compute the percentage of requests and the percentage of bytes for each category. Expect images and HTML to dominate the request count, while video may account for a much larger share of bytes than of requests, just like today's streaming services.
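The extension table above translates directly into a lookup. This is a sketch under stated assumptions: the names are illustrative, matching is case-insensitive, and any query string is dropped before the extension is read. Neither of those last two choices is specified in the text.

```python
import posixpath

CATEGORY_EXTENSIONS = {
    "HTML": [".html", ".htm", ".shtml", ".map"],
    "Images": [".gif", ".jpeg", ".jpg", ".xbm", ".bmp", ".rgb", ".xpm"],
    "Sound": [".au", ".snd", ".wav", ".mid", ".midi", ".lha", ".aif", ".aiff"],
    "Video": [".mov", ".movie", ".avi", ".qt", ".mpeg", ".mpg"],
    "Formatted": [".ps", ".eps", ".doc", ".dvi", ".txt"],
    "Dynamic": [".cgi"],
}

# Invert the table once so each lookup is a single dictionary access.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in CATEGORY_EXTENSIONS.items()
    for ext in extensions
}

def categorize(path: str) -> str:
    """Return the file type category for a requested URL path."""
    path = path.split("?", 1)[0]  # assumption: ignore any query string
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_TO_CATEGORY.get(extension, "Others")

for p in ["/changes.html", "/images/logo.GIF", "/cgi-bin/search.cgi?q=x", "/"]:
    print(p, "->", categorize(p))
# prints:
# /changes.html -> HTML
# /images/logo.GIF -> Images
# /cgi-bin/search.cgi?q=x -> Dynamic
# / -> Others
```

Note that a request for a bare directory such as "/" has no extension and therefore lands in Others under this rule.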
Average Transfer Size per File Type
Divide total bytes for each category by the number of requests in that category. For example, if images account for 500 MB over 100,000 requests, the average is 500 * 1024 * 1024 / 100,000 ≈ 5,243 bytes per request (using 1 MB = 1024*1024 bytes). This helps in capacity planning: large average sizes hint at high-resolution content or inefficient compression.
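The per-category average can be sketched with a single pass that tracks a request count and a byte total per category. The function name and the sample records are hypothetical. The sketch assumes each request has already been labeled with its category.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def average_size_by_category(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the mean transfer size in bytes for each category."""
    totals = defaultdict(lambda: [0, 0])  # category -> [request count, byte total]
    for category, size in records:
        totals[category][0] += 1
        totals[category][1] += size
    return {category: byte_total / count
            for category, (count, byte_total) in totals.items()}

# Made-up records: (category, transfer size in bytes)
toy_records = [("Images", 4000), ("Images", 6000), ("HTML", 1254)]
print(average_size_by_category(toy_records))
# prints: {'Images': 5000.0, 'HTML': 1254.0}
```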
Connecting to Modern Trends
Think about AI-powered apps like ChatGPT: server logs reveal prompt lengths, response sizes, and caching effectiveness. Similarly, gaming servers for titles like Valorant log match data to balance load. The techniques you learn here—parsing logs, computing statistics, grouping by categories—are directly applicable to cloud operations and DevOps roles.
Conclusion
By completing this analysis, you've characterized a real web server workload: request rates, bytes transferred, response codes, client locality (local vs. remote), and file type mix. These skills are foundational for network engineering, web performance optimization, and data-driven infrastructure management. Now go analyze those logs like a pro!