Two-Phase Commit for Group Photo Collages: A Distributed Systems Tutorial
Learn how to implement a two-phase commit protocol for a distributed group photo collage system, handling failures and concurrency. This tutorial covers Server and UserNode coordination, message loss, node crashes, and logging for recovery.
Introduction: From the Solvay Conference to Snapchat Collages
In 1927, the Solvay Conference group photo captured Einstein, Curie, and other physics giants on a single plate. Today, group photos are digital collages assembled from multiple smartphones. This shift mirrors the evolution from centralized to distributed systems. In this tutorial, you will build a two-phase commit (2PC) protocol for a distributed collage system, the same kind of agreement problem that apps such as Instagram or Snapchat face when several users share media. You'll learn to handle lost messages, node crashes, and concurrent commits, skills that are essential for backend engineers working on distributed databases or microservices.
Understanding the Two-Phase Commit Protocol
Two-phase commit ensures that all participants in a distributed transaction agree on the outcome: commit or abort. It consists of two phases:
- Phase 1 (Voting): The coordinator sends a "prepare" request to all participants. Each participant votes yes or no.
- Phase 2 (Decision): If all vote yes, the coordinator sends "commit"; otherwise, "abort". Participants then act accordingly.
In our collage system, the Server acts as coordinator, and UserNodes are participants. Each UserNode that contributed an image must approve the collage. If any rejects, the collage is discarded. This ensures consistency: a published collage contains only images from users who are happy with the final composition.
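The decision rule of Phase 2 can be written down in a few lines. The sketch below is only an illustration: the names TwoPhaseDecision, Vote, and Decision are hypothetical and not part of the project's API, and it assumes that a participant that never answers is treated as a NO vote.

```java
import java.util.Collection;

final class TwoPhaseDecision {
    enum Vote { YES, NO }
    enum Decision { COMMIT, ABORT }

    // Phase 2 rule: COMMIT only when every participant has voted YES.
    // Assumption: a participant that never answered counts as a NO vote.
    static Decision decide(int participantCount, Collection<Vote> votesReceived) {
        if (votesReceived.size() < participantCount) {
            return Decision.ABORT;
        }
        for (Vote v : votesReceived) {
            if (v != Vote.YES) {
                return Decision.ABORT;
            }
        }
        return Decision.COMMIT;
    }
}
```

For a collage built from two UserNodes, decide(2, votes) yields COMMIT only when both YES votes are present; a single NO or a missing vote yields ABORT.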
System Architecture: Server and UserNodes
Your implementation consists of two main classes: Server and UserNode. The Project4 class launches them as separate processes. The Server listens for candidate collages via the CommitServing interface. When a new collage is posted, the Server initiates a 2PC transaction.
Each UserNode communicates with the Server using ProjectLib's datagram messaging. You must not use sockets or RMI directly. Messages can be lost or delayed; assume delivery within 3 seconds if not lost. Handle timeouts and retransmissions.
Implementing the Server
The Server must maintain state for each ongoing commit. Since multiple collages can be processed concurrently, use a thread-safe data structure to track transactions. For each transaction, store:
- Collage filename and contents
- List of source images (address:filename)
- Current phase (init, voting, commit/abort)
- Votes received
- Timeout handler
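One possible shape for this per-transaction state is sketched below. All class and field names (TransactionTable, Transaction, Phase, deadlineMillis) are assumptions made for illustration, and the timeout handler is reduced to a deadline timestamp that a timer thread could check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class TransactionTable {
    enum Phase { INIT, VOTING, COMMITTED, ABORTED }

    static final class Transaction {
        final int id;
        final String collageName;
        final byte[] collageBytes;
        final List<String> sources;        // entries look like "address:filename"
        final Map<String, Boolean> votes = new ConcurrentHashMap<>(); // address -> vote
        volatile Phase phase = Phase.INIT;
        volatile long deadlineMillis;      // when the current wait times out

        Transaction(int id, String collageName, byte[] collageBytes, String[] sources) {
            this.id = id;
            this.collageName = collageName;
            this.collageBytes = collageBytes.clone();
            this.sources = Collections.unmodifiableList(Arrays.asList(sources.clone()));
        }

        // Distinct node addresses, taken from the part before the first ':'.
        Set<String> participants() {
            Set<String> nodes = new LinkedHashSet<>();
            for (String s : sources) {
                nodes.add(s.split(":", 2)[0]);
            }
            return nodes;
        }
    }

    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, Transaction> active = new ConcurrentHashMap<>();

    // Registers a new transaction under a unique ID; safe to call from many threads.
    Transaction begin(String collageName, byte[] collageBytes, String[] sources) {
        Transaction t = new Transaction(nextId.incrementAndGet(), collageName, collageBytes, sources);
        active.put(t.id, t);
        return t;
    }

    Transaction lookup(int id) { return active.get(id); }

    void finish(int id) { active.remove(id); }
}
```

Keeping the table in a ConcurrentHashMap lets the message callback look up a transaction by ID while other threads start new commits, without a global lock.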
When startCommit is called, the Server:
- Creates a new transaction with a unique ID.
- Sends a PREPARE message to all involved UserNodes.
- Starts a timeout (e.g., 5 seconds) to handle lost messages.
- Collects votes. If all yes, sends COMMIT; otherwise, sends ABORT.
- On commit, writes the collage file to the working directory.
If a UserNode does not respond within the timeout, resend the PREPARE. After a maximum number of retries, assume the node is dead and abort.
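The voting round with timeout and retransmission might look like the sketch below. It does not use ProjectLib: Transport is a hypothetical stand-in for the messaging layer, VoteMsg and the "PREPARE <id>" message text are invented for illustration, and incoming votes are assumed to be placed on the inbox queue by the message callback.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class PrepareRound {
    // Hypothetical stand-in for the real messaging layer.
    interface Transport { void send(String address, String body); }

    static final class VoteMsg {
        final String from;
        final boolean yes;
        VoteMsg(String from, boolean yes) { this.from = from; this.yes = yes; }
    }

    // The message callback is assumed to add each incoming vote here.
    final BlockingQueue<VoteMsg> inbox = new LinkedBlockingQueue<>();

    // Returns true if every participant voted YES within maxAttempts rounds.
    boolean run(Transport net, int txId, Set<String> participants,
                long timeoutMillis, int maxAttempts) {
        Set<String> pending = new HashSet<>(participants);
        try {
            for (int attempt = 0; attempt < maxAttempts && !pending.isEmpty(); attempt++) {
                // (Re)send PREPARE only to nodes that have not answered yet.
                for (String addr : pending) {
                    net.send(addr, "PREPARE " + txId);
                }
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
                while (!pending.isEmpty()) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) break;
                    VoteMsg v = inbox.poll(remaining, TimeUnit.NANOSECONDS);
                    if (v == null) break;                  // timed out, go to next attempt
                    if (!pending.remove(v.from)) continue; // duplicate or unexpected vote
                    if (!v.yes) return false;              // a single NO aborts
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return pending.isEmpty(); // silent nodes are presumed dead, so abort
    }
}
```

With the 5-second timeout mentioned above, a call such as run(net, id, nodes, 5000, 3) would give a silent node roughly 15 seconds before the collage is aborted; the retry count of 3 is an arbitrary choice.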
Implementing the UserNode
Each UserNode must track which images it owns and whether they have been used in a committed collage. When a PREPARE arrives:
- Call askUser to get user approval.
- If approved, vote YES and mark the images as "pending commit".
- If not approved, vote NO.
- If a COMMIT arrives later, remove the source images from the working directory.
- If an ABORT arrives, release the pending images.
UserNodes must also handle duplicate messages (e.g., if a PREPARE is resent). Idempotency is key: if already voted, resend the same vote.
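A minimal sketch of this participant logic follows, under several assumptions: Approver is a hypothetical wrapper around the askUser call (whose real signature may differ), file deletion is simulated with an in-memory set, and the node also votes NO when an image is already pending or removed. For simplicity the user is asked while the lock is held, which serializes votes; a fuller implementation would release the lock during the prompt.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Participant {
    // Hypothetical stand-in for askUser; the real signature may differ.
    interface Approver { boolean approve(int txId, List<String> files); }

    private final Map<Integer, Boolean> votes = new HashMap<>();     // txId -> vote already cast
    private final Map<Integer, List<String>> held = new HashMap<>(); // txId -> files pending commit
    private final Set<String> pending = new HashSet<>();
    private final Set<String> removed = new HashSet<>();

    // Returns the vote to send. A repeated PREPARE gets the stored vote
    // back and does not ask the user a second time.
    synchronized boolean onPrepare(int txId, List<String> files, Approver user) {
        Boolean earlier = votes.get(txId);
        if (earlier != null) {
            return earlier;
        }
        boolean available = true;
        for (String f : files) {
            if (pending.contains(f) || removed.contains(f)) {
                available = false;
            }
        }
        boolean yes = available && user.approve(txId, files);
        if (yes) {
            pending.addAll(files);   // mark as "pending commit"
            held.put(txId, files);
        }
        votes.put(txId, yes);
        return yes;
    }

    synchronized void onCommit(int txId) {
        List<String> files = held.remove(txId);
        if (files == null) return;   // duplicate COMMIT or unknown transaction
        pending.removeAll(files);
        removed.addAll(files);       // a real node would delete the files here
    }

    synchronized void onAbort(int txId) {
        votes.putIfAbsent(txId, false); // a late PREPARE for this transaction gets NO
        List<String> files = held.remove(txId);
        if (files == null) return;   // duplicate ABORT or nothing was reserved
        pending.removeAll(files);    // release the images for other collages
    }

    synchronized boolean isRemoved(String file) { return removed.contains(file); }
}
```

Because onCommit and onAbort do nothing the second time they see the same transaction, a retransmitted decision from the Server is harmless.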
Handling Failures: Lost Messages and Crashes
Lost messages are handled by timeouts and retransmission. For node crashes, use logging to persistent storage. Before sending a vote or a commit decision, write the state to a log file and make sure it has reached the disk, so that the node never announces something it could forget in a crash. On recovery, read the log to resume the transaction.
For example, if a UserNode crashes after voting YES but before receiving the decision, on restart it should check the log. If it had voted YES, it must wait for the coordinator's decision. The coordinator may already have committed or aborted, so the UserNode cannot decide on its own: it keeps the images marked as pending and either asks the Server for the outcome or resends its logged YES vote so that the Server replies with the decision.
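The Server should also log each phase. If the Server crashes after sending COMMIT but before all UserNodes receive it, on recovery it can read the logged decision and resend COMMIT to the UserNodes that have not acknowledged it. This requires UserNodes to acknowledge each decision message.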
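The log-before-send rule can be sketched as an append-only file that is forced to disk after every record. This is an illustration using plain java.nio, not the project's required mechanism: the class name CommitLog, the "txId state" line format, and the state strings are all assumptions, and if your framework provides its own sync call you would use that instead. Handling of a half-written last line is kept minimal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

final class CommitLog {
    private final Path file;

    CommitLog(Path file) { this.file = file; }

    // Appends one "txId state" record and forces it to disk before returning,
    // so the caller may send its message only after this method completes.
    synchronized void append(int txId, String state) {
        byte[] line = (txId + " " + state + "\n").getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(line);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replays the log and returns the last recorded state of each transaction.
    synchronized Map<Integer, String> recover() {
        Map<Integer, String> last = new HashMap<>();
        if (!Files.exists(file)) {
            return last;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String[] parts = line.split(" ", 2);
                if (parts.length != 2) continue;      // skip malformed lines
                try {
                    last.put(Integer.parseInt(parts[0]), parts[1]);
                } catch (NumberFormatException e) {
                    // skip a line whose ID is not a number
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return last;
    }
}
```

On restart, a UserNode that finds VOTED_YES as the last state of a transaction knows it is still waiting for the decision, while a Server that finds COMMITTED knows it must resend COMMIT.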
Concurrency: Multiple Collages at Once
Concurrent commits are common in real systems. Your Server must handle multiple startCommit calls simultaneously. Use separate threads or an event loop. Each transaction is independent, but UserNodes may be involved in multiple transactions. A UserNode can vote on several collages concurrently, but each image can only appear in one committed collage. Use locks or atomic operations to prevent double-use.
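One way to prevent double-use without a global lock is an atomic reservation per image. The sketch below relies on ConcurrentHashMap.putIfAbsent; the class name ImageReservations and the policy of rolling back a partial reservation are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

final class ImageReservations {
    // filename -> ID of the transaction that currently holds the image
    private final ConcurrentHashMap<String, Integer> owner = new ConcurrentHashMap<>();

    // Reserves every file for txId, or none of them. Safe to call from
    // several threads at once because putIfAbsent is atomic per key.
    boolean tryReserve(int txId, List<String> files) {
        List<String> taken = new ArrayList<>();
        for (String f : files) {
            Integer prev = owner.putIfAbsent(f, txId);
            if (prev == null) {
                taken.add(f);
            } else if (prev != txId) {
                // Another collage holds this image: undo our partial reservation.
                for (String t : taken) {
                    owner.remove(t, txId);
                }
                return false;
            }
        }
        return true;
    }

    // Called on ABORT. On COMMIT the entries are simply left in place,
    // so a consumed image can never be reserved again.
    void release(int txId, List<String> files) {
        for (String f : files) {
            owner.remove(f, txId); // removes only if txId still owns the entry
        }
    }
}
```

A UserNode would call tryReserve before voting YES and vote NO when it returns false, which guarantees that two concurrent collages never both commit with the same image.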
Trend Connection: AI-Powered Photo Collages
Today, tools such as Google Photos can automatically create collages from your camera roll, and services at that scale depend on coordination between many machines. Imagine an AI app that lets friends contribute selfies to a group collage; the app must ensure everyone agrees before publishing. This is the same agreement problem that your 2PC implementation solves. Understanding the protocol prepares you for building reliable, distributed AI services.
Testing Your Implementation
Test cases will verify that the correct collages appear in the Server's working directory and that source images are removed from UserNodes. Write unit tests for:
- Happy path: all approve, collage published.
- One reject: collage aborted.
- Lost messages: ensure retransmission works.
- Node crash: restart and recover.
- Concurrent commits: multiple collages processed without interference.
Conclusion
Two-phase commit is a fundamental distributed systems concept. By implementing it for a group photo collage, you learn to handle consistency, failures, and concurrency. These skills carry over directly to distributed databases (e.g., Spanner) and cloud-native applications, and they are useful background for the consensus protocols behind blockchains. Now go build your collage system and make Einstein proud!