Distributed Systems Problem Set 1: Serialization, Network Delay, and Protocols Tutorial

Understanding Serialization in Distributed Systems

In distributed systems, data must be transmitted between machines that may have different architectures. Serialization—converting data structures into a byte stream—is a fundamental operation. The problem set excerpt shows a serialize function for a game state. Let's break down the key components.

Memory Layout and Size Calculation

On a 64-bit system, pointers are 8 bytes, int32_t is 4 bytes, and int8_t is 1 byte. The game_t struct contains a pointer (squareval) and an integer (round). However, the board data is dynamically allocated. To serialize, we need to include the board's raw data. The total size includes the board (100*100 bytes) plus the round number (4 bytes). The missing code fragments are:

_(1)_: sizeof(int8_t) * 100 * 100 + sizeof(int32_t) or 10000 + 4. Explanation: we need space for the board and the round.
_(2)_: &total_size or sizeof(size_t)? Actually, memcpy(data_packet, total_size, ...) is incorrect; likely they meant to copy the total_size value into the packet. Probably memcpy(data_packet, &total_size, sizeof(size_t)) to store the size first. But the pseudocode says memcpy(data_packet, total_size, _(2)_); the second argument should be a pointer. So _(2)_ should be sizeof(size_t) and the first argument should be &total_size. However, the snippet shows memcpy(data_packet, total_size, _(2)_); it's ambiguous. In typical serialization, you'd store the total size first. Let's assume _(2)_ is sizeof(size_t) and the line is corrected to memcpy(data_packet, &total_size, sizeof(size_t)).
_(3)_: sizeof(size_t). The offset after storing the total size.
_(4)_: sizeof(int8_t) * 100 = 100 bytes per row.
_(5)_: currentstate->squareval + i * 100 or &(currentstate->squareval[i*100]). The pointer to the start of row i.

Calculating Network Delays

Part B asks for total delay from move completion to opponent seeing the move. This includes marshalling (5 ms), one-way network latency (10 ms), transmission time (data size / bandwidth), and unmarshalling (5 ms). The board size is 10000 bytes = 80,000 bits. Bandwidth is 240 kbps = 240,000 bps. Transmission time = 80,000 / 240,000 = 0.3333 s = 333.33 ms. Total delay = 5 + 10 + 333.33 + 10 + 5 = 363.33 ms. Note: marshalling/unmarshalling times are on each side, so add both.

Efficient Move Transmission

Instead of sending the entire board, send only the changed squares. For up to 10 changed squares, we can encode each change as (row, col, new_value). A compact structure: typedef struct { int16_t row; int16_t col; int8_t new_val; } MoveChange;. Then a move packet contains a count (1 byte) followed by the changes. Average size per move: 1 + 10 * (2+2+1) = 1 + 50 = 51 bytes. For original encoding, average size is still 10000 bytes. Delay with original: 5 + 10 + (80000/240000)*1000 + 10 + 5 = 363.33 ms. With efficient: 5 + 10 + (51*8/240) + 10 + 5 = 5+10+1.7+10+5 = 31.7 ms. Much faster!

When Many Squares Change

If average changes per move is 5000, the efficient encoding size = 1 + 5000 * 5 = 25001 bytes. Original still 10000 bytes. Now original is smaller! So the best approach depends on the update pattern. Insight: use delta encoding when changes are few; otherwise, full state may be better.

Endianness and Hardware Upgrades

Part G highlights a classic issue: different CPU architectures use different byte orders (endianness). Intel uses little-endian; some non-Intel chips (e.g., PowerPC) use big-endian. If the serialized data is sent without converting to a common format (e.g., network byte order), the integers will be interpreted incorrectly. Fix: use functions like htonl/ntohl to convert to network byte order (big-endian) before transmission.

Protocol Choice: Stop-and-Go vs. Blast

Stop-and-go sends one recipe at a time, waiting for ACK. Blast sends all at once. On a high-latency link (Mars, L seconds), blast is better because it uses the bandwidth more efficiently—only one round-trip delay for the entire batch. On a low-latency link with high packet loss, stop-and-go may be better because retransmitting all recipes on a single loss is wasteful.

End-to-End Delay Pipeline

For the ChatGPT query pipeline, total delay = sum of latencies + transmission times + processing. For a 10 KB query and 5 KB response: transmission times: Wi-Fi 10KB/10Mbps = 8ms, Ethernet 10KB/100Mbps = 0.8ms, WAN 10KB/50Mbps = 1.6ms; similarly for response 5KB. Total = (10+1+500)*2 for latencies? Actually one-way: query goes through three links, response comes back. So total delay = (10+1+500) ms latency each way + transmission times + server processing 200 ms. Compute carefully.

Real-World Connections

These concepts are crucial for modern multiplayer games, cloud gaming (like GeForce Now), and real-time AI inference pipelines. Understanding serialization and protocol design helps build efficient distributed systems.