CSE 482 Homework 2 Guide: Race-Free Algorithms, Parallel SSSP & Atomic Increment

Understanding Race Conditions in Parallel Algorithms

In CSE 482 Big Data Analysis, one of the trickiest concepts is identifying race-free algorithms. A race condition occurs when the behavior of a program depends on the timing of events it does not control, such as thread scheduling. In a parallel algorithm this usually means two threads access the same memory location at the same time and at least one of them writes. For example, in modern AI training pipelines, multiple GPUs update shared model parameters concurrently. A race-free algorithm has no such conflicting accesses, so its result does not depend on execution order. Among the algorithms studied, Parallel Knuth Shuffle is race-free because the swaps that run concurrently touch disjoint positions. In contrast, Parallel BFS, Parallel Tree Union, and Parallel Bellman-Ford all let multiple threads update the same shared location (a visited flag, a parent pointer, or a tentative distance), which is a race unless the update is synchronized.

Why Parallel Knuth Shuffle Stands Out

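The Knuth shuffle (Fisher-Yates) walks the array and swaps each position i with a uniformly random position at or before it. A parallel version is race-free when the swaps that execute at the same time touch disjoint positions, so no two threads write to the same location. This is analogous to how a fantasy football draft assigns players to teams without overlap. Note that simply partitioning the array and shuffling each segment independently satisfies this condition but does not produce a uniformly random permutation of the whole array, because no element ever leaves its segment. In contrast, algorithms like Parallel Bellman-Ford rely on a shared distance array that multiple threads read and write, leading to potential races unless carefully synchronized.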
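As a minimal sketch of the "no two threads write to the same location" condition, the C++ below draws every swap target before any swap runs and then checks whether two swaps would touch a common position. The function names (`draw_targets`, `knuth_shuffle`, `swaps_conflict`) are illustrative assumptions, not code from the course, and the shuffle itself runs sequentially here.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Draw the swap targets up front: target[i] is uniform in [0, i].
// (Hypothetical helper; the seed parameter only makes runs repeatable.)
std::vector<std::size_t> draw_targets(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::size_t> target(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::uniform_int_distribution<std::size_t> pick(0, i);
        target[i] = pick(rng);
    }
    return target;
}

// Sequential Knuth shuffle driven by the pre-drawn targets.
void knuth_shuffle(std::vector<int>& a, const std::vector<std::size_t>& target) {
    assert(a.size() == target.size());
    for (std::size_t i = a.size(); i-- > 1;) {
        std::swap(a[i], a[target[i]]);
    }
}

// Swap i touches positions {i, target[i]}. Two swaps may run at the same
// time without a race only if those position sets are disjoint.
bool swaps_conflict(std::size_t i, std::size_t j,
                    const std::vector<std::size_t>& target) {
    return i == j || i == target[j] || target[i] == j ||
           target[i] == target[j];
}
```

A scheduler built on `swaps_conflict` could run non-conflicting swaps together and order the conflicting ones; how the course's parallel version resolves conflicts is not shown in this guide.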

Parallel Single-Source Shortest Paths (SSSP)

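Parallel SSSP is critical for route optimization in apps like Google Maps or Uber. The homework asks which statement about parallel SSSP is true. The correct answer is D: Parallel Bellman-Ford can have reasonably good performance on low-diameter graphs but can be expensive on large-diameter graphs. Each round of Bellman-Ford relaxes every edge, which parallelizes well, but the rounds themselves run one after another. In the worst case there are |V|-1 rounds; with early termination the count is the largest number of edges on any shortest path from the source. On a low-diameter graph that is a handful of rounds. On a large-diameter graph (like a long chain) it grows linearly with the diameter, so both the total work and the span grow with it.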
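To make the dependence on diameter concrete, here is a small sketch of round-based Bellman-Ford that counts how many rounds actually improve a distance. It simulates the rounds sequentially (each round reads the previous round's distances), assumes there are no negative cycles, and uses illustrative names (`Edge`, `SsspResult`, `bellman_ford_rounds`) that are not from the course materials.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Edge { std::size_t from, to; long long weight; };

struct SsspResult {
    std::vector<long long> dist;
    std::size_t rounds;  // rounds in which at least one distance improved
};

const long long kInf = std::numeric_limits<long long>::max();

// Round-based Bellman-Ford with early termination.
// Within a round every edge relaxation is independent of the others,
// which is the part a parallel implementation would spread across threads.
SsspResult bellman_ford_rounds(std::size_t n, const std::vector<Edge>& edges,
                               std::size_t source) {
    assert(source < n);
    std::vector<long long> dist(n, kInf);
    dist[source] = 0;
    std::size_t rounds = 0;
    for (std::size_t r = 0; r + 1 < n; ++r) {
        std::vector<long long> next = dist;  // read old values, write new ones
        bool changed = false;
        for (const Edge& e : edges) {
            if (dist[e.from] != kInf && dist[e.from] + e.weight < next[e.to]) {
                next[e.to] = dist[e.from] + e.weight;
                changed = true;
            }
        }
        if (!changed) break;  // nothing improved: distances are final
        dist.swap(next);
        ++rounds;
    }
    return {dist, rounds};
}
```

On a 5-vertex chain 0→1→2→3→4 this takes 4 improving rounds, while a star with the source at the center takes 1, even though both graphs have four edges.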

Delta-Stepping and the Remaining Options

Parallel Delta-stepping uses a bucket-based approach: vertices are grouped by tentative distance into buckets of width ∆, the buckets are processed in increasing order, and the edges out of the current bucket are relaxed in parallel. A very small ∆ behaves like Dijkstra and a very large ∆ behaves like Bellman-Ford, so ∆ trades wasted relaxations against the number of sequential steps.

That trade-off settles Option C, 'Parallel ∆-stepping has a better span (asymptotically lower) than parallel Bellman-Ford.' Delta-stepping usually does less work than Bellman-Ford, but it does so by processing buckets one after another, which adds sequential steps rather than removing them. Its span is therefore not asymptotically lower, and C is false. Statement B, about combining Bellman-Ford and Dijkstra, is also incorrect: a combination of that kind can reduce work, but no known exact SSSP algorithm of this type is both work-efficient and polylogarithmic in span on general weighted graphs. That leaves D as the one true statement, and it directly matches the performance characteristics discussed in class.
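The bucket-based approach can be sketched sequentially as follows. This is a simplified version under stated assumptions: edge weights are non-negative, light and heavy edges are not treated separately, and the names (`Arc`, `DeltaResult`, `delta_stepping`) are illustrative. The `steps` counter counts how many times a bucket's contents are processed as one batch in this simulation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Arc { std::size_t to; long long weight; };
using Adjacency = std::vector<std::vector<Arc>>;

struct DeltaResult {
    std::vector<long long> dist;
    std::size_t steps;  // batches processed, one after another
};

// Simplified Delta-stepping: bucket b holds vertices whose tentative
// distance lies in [b * delta, (b + 1) * delta).
DeltaResult delta_stepping(const Adjacency& g, std::size_t source,
                           long long delta) {
    assert(source < g.size() && delta > 0);
    const long long inf = std::numeric_limits<long long>::max();
    std::vector<long long> dist(g.size(), inf);
    std::vector<std::vector<std::size_t>> buckets(1);
    dist[source] = 0;
    buckets[0].push_back(source);
    std::size_t steps = 0;
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        while (!buckets[b].empty()) {
            std::vector<std::size_t> frontier;
            frontier.swap(buckets[b]);  // take the whole batch
            ++steps;
            for (std::size_t v : frontier) {
                // Skip stale entries: v was already settled in a lower bucket.
                if (static_cast<std::size_t>(dist[v] / delta) != b) continue;
                for (const Arc& a : g[v]) {
                    long long candidate = dist[v] + a.weight;
                    if (candidate < dist[a.to]) {
                        dist[a.to] = candidate;
                        std::size_t target =
                            static_cast<std::size_t>(candidate / delta);
                        if (target >= buckets.size()) buckets.resize(target + 1);
                        buckets[target].push_back(a.to);
                    }
                }
            }
        }
    }
    return {dist, steps};
}
```

With a very large `delta` everything lands in bucket 0 and the loop behaves like frontier-based Bellman-Ford; with `delta` equal to the smallest edge weight it approaches Dijkstra's ordering.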

Quicksort and Randomized Pivots

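In quicksort, if pivots are chosen uniformly at random, each element is involved in O(log n) comparisons with high probability. This statement is True, counting the comparisons an element makes against pivots. The intuition: an element is compared against a pivot once for each subarray it belongs to, so its comparison count is bounded by the recursion depth it reaches. A random pivot does not split the array exactly in half, but with constant probability it lands in the middle half and shrinks the subarray by a constant factor, which is enough to make the recursion depth O(log n) with high probability. This is similar to how tournament brackets in eSports eliminate players in logarithmic rounds.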
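One way to see the per-element bound is to instrument quicksort so that every element carries a counter of how many times it was compared against a pivot. The sketch below does this with a Lomuto-style partition; `Item` and `quicksort_count` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

struct Item {
    int key;
    std::size_t pivot_comparisons;  // times this item was compared to a pivot
};

// Sorts items[lo, hi) in place with uniformly random pivots.
// The counter travels with the item because whole Items are swapped.
void quicksort_count(std::vector<Item>& items, std::size_t lo, std::size_t hi,
                     std::mt19937& rng) {
    assert(lo <= hi && hi <= items.size());
    if (hi - lo < 2) return;
    std::uniform_int_distribution<std::size_t> pick(lo, hi - 1);
    std::swap(items[lo], items[pick(rng)]);  // move the random pivot to the front
    const int pivot = items[lo].key;
    std::size_t boundary = lo + 1;  // items[lo + 1, boundary) have key < pivot
    for (std::size_t i = lo + 1; i < hi; ++i) {
        ++items[i].pivot_comparisons;  // one comparison per level it appears in
        if (items[i].key < pivot) {
            std::swap(items[i], items[boundary]);
            ++boundary;
        }
    }
    std::swap(items[lo], items[boundary - 1]);  // pivot to its final position
    quicksort_count(items, lo, boundary - 1, rng);
    quicksort_count(items, boundary, hi, rng);
}
```

After sorting, the largest `pivot_comparisons` value equals the deepest recursion level any element reached, which is the quantity the O(log n) bound describes.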

Reachability-Based SCC Algorithm

The reachability-based SCC algorithm uses forward and backward BFS from chosen vertices to identify strongly connected components. In the given graph (not shown here), assume we first run reachability searches on both the blue and orange vertices in parallel. Each vertex then knows which of the two sources can reach it and which of the two it can reach. Two vertices in the same SCC must agree on all of these, so any edge whose endpoints disagree crosses between different SCCs and is removed. The number of edges removed therefore depends on the graph structure: label every vertex with its reachability results, then count the edges whose endpoints carry different labels. Since the graph is not reproduced here, the exact number has to be read off the homework figure.
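The counting procedure can be sketched as below, under the assumption that an edge is removed exactly when its two endpoints received different reachability labels. Each source contributes two bits to a vertex's label: one for "reachable from the source" and one for "can reach the source". The searches run sequentially here, and `mark_reachable` and `count_cross_edges` are illustrative names, not the course's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;  // adjacency lists
using EdgeList = std::vector<std::pair<std::size_t, std::size_t>>;

// BFS from `source` in g, setting bit `bit` in the label of every vertex reached.
void mark_reachable(const Graph& g, std::size_t source, unsigned bit,
                    std::vector<std::uint32_t>& label) {
    std::vector<bool> seen(g.size(), false);
    std::queue<std::size_t> frontier;
    seen[source] = true;
    frontier.push(source);
    while (!frontier.empty()) {
        std::size_t v = frontier.front();
        frontier.pop();
        label[v] |= (std::uint32_t{1} << bit);
        for (std::size_t u : g[v]) {
            if (!seen[u]) { seen[u] = true; frontier.push(u); }
        }
    }
}

// Counts edges whose endpoints have different labels after forward and
// backward searches from every source (assumed removal rule, see lead-in).
std::size_t count_cross_edges(std::size_t n, const EdgeList& edges,
                              const std::vector<std::size_t>& sources) {
    assert(sources.size() <= 16);  // two bits per source in a 32-bit label
    Graph fwd(n), bwd(n);
    for (const auto& e : edges) {
        fwd[e.first].push_back(e.second);
        bwd[e.second].push_back(e.first);  // reversed edge for backward search
    }
    std::vector<std::uint32_t> label(n, 0);
    for (std::size_t i = 0; i < sources.size(); ++i) {
        mark_reachable(fwd, sources[i], static_cast<unsigned>(2 * i), label);
        mark_reachable(bwd, sources[i], static_cast<unsigned>(2 * i + 1), label);
    }
    std::size_t removed = 0;
    for (const auto& e : edges) {
        if (label[e.first] != label[e.second]) ++removed;
    }
    return removed;
}
```

For a graph made of a 3-cycle, a 2-cycle, and one edge from the first into the second, a search from any vertex of the 3-cycle marks only that connecting edge for removal.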

Atomic Increment with CAS

Yihan's algorithm attempts to atomically increment a shared variable and return the new value. The code uses CAS but has a bug: it returns the value of s after the loop, and s might have been incremented by another thread between the successful CAS and the return statement. The return value is then the current value of s rather than the value this thread wrote, and it can be larger than expected. This is not the ABA problem; the CAS itself succeeds correctly, and only the final read is wrong. The fix is to return the value this thread installed (the old value it read plus one) instead of re-reading s. So the correct answer is C: It does not work as expected, because the return value does not match the specification. This is a common pitfall in lock-free programming, similar to race conditions in concurrent AI model updates.
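The homework's code is not reproduced in this guide, so the following is a hypothetical reconstruction of the pattern described, written with `std::atomic<int>` and `compare_exchange_weak`. `increment_buggy` re-reads the shared variable after the CAS succeeds; `increment_fixed` returns the value this thread installed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reconstruction of the buggy pattern described above.
int increment_buggy(std::atomic<int>& s) {
    int old = s.load();
    // On failure, compare_exchange_weak stores the current value into `old`,
    // so the loop retries with fresh data.
    while (!s.compare_exchange_weak(old, old + 1)) {
    }
    // BUG: another thread may increment s between the successful CAS and
    // this load, so the result may not be the value this thread wrote.
    return s.load();
}

// Returns exactly the value this thread's successful CAS installed.
int increment_fixed(std::atomic<int>& s) {
    int old = s.load();
    while (!s.compare_exchange_weak(old, old + 1)) {
    }
    return old + 1;
}

// Single-threaded sanity check: with no other writers the two versions
// agree, which is why the bug only shows up under contention.
void sanity_check() {
    std::atomic<int> s{0};
    assert(increment_fixed(s) == 1);
    assert(increment_buggy(s) == 2);
}
```

Under contention, several callers of `increment_buggy` can observe the same return value, while every call to `increment_fixed` returns a distinct value.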

Key Takeaways for CSE 482

Race-free algorithms like Parallel Knuth Shuffle avoid shared state conflicts.
Parallel SSSP performance depends on graph diameter; Bellman-Ford suits low-diameter graphs, and Delta-stepping trades extra sequential steps for less work.
Randomized quicksort compares each element against O(log n) pivots with high probability.
Reachability-based SCC removes edges whose endpoints fall in different reachability classes.
Atomic operations require careful return value handling to avoid incorrect results.

Understanding these concepts is essential for big data analysis, where parallel efficiency and correctness are paramount. Whether you're optimizing a gaming leaderboard or training a large language model, race-free design and proper synchronization are key.