Parallel Data Processing Design Sprint: Analyzing Hamlet with C++ Threads

Introduction to Parallel Data Processing with C++

In today's data-driven world, processing large files efficiently is crucial. Whether you're analyzing logs from a gaming server, parsing social media trends, or handling financial datasets, parallel programming can dramatically reduce execution time. This tutorial guides you through a design sprint similar to CSCN73000 Assignment 3, where you'll build a parallel application to analyze Shakespeare's Hamlet. You'll count specific words and total words using threads, measure performance, and justify your design choices.

Why Parallel Processing Matters

Sequential programs read data line by line, which is simple but slow for large files. By splitting work across multiple threads, you can achieve speedup and better efficiency. Think of it like a team of data scientists each analyzing a different chapter of a book simultaneously. This approach is used in real-world big data pipelines, AI training, and even in finance for high-frequency trading.

Understanding the Assignment Requirements

Your task is to load a large ASCII text file (Hamlet.txt) and perform the following:

Count occurrences of words: Horatio, and, Hamlet, God (case-insensitive, including punctuation).
Count total words in the file.
Complete word counts (requirement #1) before total word count (requirement #2).
Use a thread pool for requirement #2.
Measure execution time, latency, speedup, and efficiency.

Designing Your Parallel Solution

Start by reading the file into memory or processing it in chunks. A common technique is to divide the file into segments, each processed by a separate thread. For requirement #1, you can use a fixed number of threads (e.g., 4) to count the four words. Use mutexes or atomic operations to safely update shared counters. For requirement #2, implement a thread pool that processes chunks of text and sums word counts. This mirrors modern cloud computing patterns where tasks are distributed among workers.

Choosing the Right Synchronization

When multiple threads update the same counter, you need synchronization. Options include std::mutex, std::atomic, or thread-local accumulators merged later. For high contention, thread-local storage reduces locking overhead. This is similar to how AI model training uses gradient accumulation across GPUs.

Step-by-Step Implementation Guide

1. Load the File

Read the entire file into a std::string or process it line by line. For large files, memory-mapping can be efficient.

std::ifstream file("Hamlet.txt");
std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

2. Count Specific Words (Requirement #1)

Create a thread for each word or use a thread pool. Use std::regex or manual parsing to find words including punctuation. Example using thread-local counters:

std::atomic<int> counts[4] = {0};
auto countWord = [&](const std::string& text, const std::string& word, int index) {
    int localCount = 0;
    size_t pos = 0;
    while ((pos = text.find(word, pos)) != std::string::npos) {
        // check boundaries
        if ((pos == 0 || !isalpha(text[pos-1])) && (pos+word.size() >= text.size() || !isalpha(text[pos+word.size()])))
            ++localCount;
        pos += word.size();
    }
    counts[index] += localCount;
};

Then launch threads for each word. Ensure all threads finish before proceeding (use join).

3. Count Total Words (Requirement #2) with Thread Pool

Divide the file into chunks (e.g., 1 MB each). Create a thread pool with a fixed number of threads (e.g., 4 or 8). Each thread processes a chunk, counting words separated by whitespace. Sum results using atomics.

std::atomic<long long> totalWords{0};
auto countWordsChunk = [&](const std::string& chunk) {
    int count = 0;
    bool inWord = false;
    for (char c : chunk) {
        if (std::isspace(c)) inWord = false;
        else if (!inWord) { ++count; inWord = true; }
    }
    totalWords += count;
};

Use std::async or a custom thread pool. To satisfy requirement #3, start the total word count only after all specific word count threads have finished.

4. Measuring Performance

Use std::chrono::high_resolution_clock to measure execution time and startup latency. Speedup = sequential time / parallel time. Efficiency = speedup / number of threads.

Performance Metrics Explained

Execution Time: Wall-clock time from start to finish.
Startup Latency: Time to create threads and begin processing.
Speedup: How many times faster than sequential.
Efficiency: Speedup per thread, indicating how well parallelism is utilized.

These metrics are critical in high-performance computing and cloud cost optimization.

Common Pitfalls and How to Avoid Them

Race conditions: Always protect shared data with mutexes or atomics.
False sharing: Align thread-local data to cache lines.
Overhead: Too many threads can degrade performance; match thread count to CPU cores.
Load imbalance: Ensure chunks are roughly equal size.

Real-World Applications

This design sprint mirrors techniques used in log analysis for cloud services, sentiment analysis on social media, and genomic sequence processing. For example, during the 2026 FIFA World Cup, parallel processing was used to analyze real-time match data and fan reactions. Similarly, AI chatbots like ChatGPT rely on parallel computation to generate responses quickly.

Conclusion

By completing this design sprint, you'll demonstrate a solid understanding of parallel processing concepts. Focus on design quality, not just correctness. Document your thought process, as it's a key part of the rubric. With careful planning, you'll achieve efficient parallel word counting and impress your professor.