Optimizing Data Serialization and Network Protocols in Distributed Systems

Introduction: Why Distributed Systems Need Efficient Data Handling

In distributed systems, the way you package and transmit data can make or break performance. Whether you're building a multiplayer game, a recipe-sharing app, or a cloud-based AI service, understanding serialization, network protocols, and latency trade-offs is crucial. This tutorial walks through key concepts using a problem set from a distributed systems course (Spring 2025), connecting them to modern trends like cloud gaming, AI assistants, and Mars communication delays. By the end, you'll know how to choose the right serialization format and protocol for your system.

Part 1: Data Marshalling – Packing Your Game Board Efficiently

Understanding the Serialization Code

The provided pseudocode marshals a game state containing a 100x100 board (int8_t per square) and a round number (int32_t). On a 64-bit system, we need to compute sizes and offsets carefully.

Missing Code Fragments:

_(1)_: sizeof(size_t) + 100*100*sizeof(int8_t) + sizeof(int32_t) – Total size includes the size field itself (to indicate packet length), the board data (10,000 bytes), and the round number (4 bytes).
_(2)_: &total_size, sizeof(size_t) – Copy the total size as the first field for network transmission.
_(3)_: sizeof(size_t) – Offset after the size field, where board data begins.
_(4)_: 100 * sizeof(int8_t) – Each row has 100 bytes.
_(5)_: currentstate->squareval + i*100 – Pointer to the start of row i.

This serialization assumes the board is stored as a flat array of 10,000 bytes; copying row by row is unnecessary but works.

End-to-End Delay Calculation

Given: marshalling/unmarshalling = 5 ms each (client and server), one-way latency = 10 ms, bandwidth = 240 kbps = 30 KB/s. Data size = 4 (size) + 10,000 (board) + 4 (round) = 10,008 bytes ≈ 10 KB. Transmission time = 10 KB / 30 KB/s ≈ 0.333 s = 333 ms. Total delay = client marshalling (5) + transmission (333) + server unmarshalling (5) + one-way latency (10) = 353 ms. But note: latency is one-way; the round trip includes both directions? The question asks from Michael's move to Jimmy's start of thinking – that's one-way from client to server, so we add only one latency. So total = 5+333+5+10 = 353 ms.

Optimizing for Moves That Change Few Squares

If each move modifies at most 10 squares, sending the entire board (10 KB) is wasteful. Instead, send only the changed squares. Data structure: a header with number of changed squares (e.g., uint8_t), then for each square: row (uint8_t), column (uint8_t), new value (int8_t). That's 1 + 10*(1+1+1) = 31 bytes per move. Much smaller!

Average Data Size and Delay

Original encoding: always 10,008 bytes. Transmission time = 333 ms. Total delay = 5+333+5+10 = 353 ms. Efficient encoding: average 5 changed squares => 1 + 5*3 = 16 bytes. Transmission time = 16 B / 30 KB/s ≈ 0.00053 s = 0.53 ms. Total delay = 5+0.53+5+10 = 20.53 ms. Huge improvement.

When Many Squares Change

If average changes = 5000, efficient encoding size = 1 + 5000*3 = 15,001 bytes ≈ 15 KB, which is larger than original 10 KB. So original is better. Insight: choose serialization based on expected amount of change. Adaptive approaches could switch between full state and delta encoding.

Endianness Issue After Hardware Upgrade

Intel uses little-endian; non-Intel chips (e.g., ARM) may use big-endian. The serialized round number (int32_t) is stored in little-endian; if the new machine interprets it as big-endian, the value will be wrong, corrupting game state. Fix: use network byte order (big-endian) via htonl/ntohl for multi-byte fields, or use a portable serialization library like Protocol Buffers.

Part 2: Network Protocols – Stop-and-Go vs. Blast for Boba Recipes

Scenario A: Cafe (Low Latency, Moderate Bandwidth)

One-way latency = 10 ms, bandwidth = B bps. Stop-and-go: for N recipes, each recipe requires a round trip (2*latency + transmission + ACK). Blast: send all at once, wait for one ACK. If N is large, blast is better because it avoids per-recipe round trips. But if packet loss is high, stop-and-go may be more reliable (only retransmit lost ones).

Scenario B: Mars (High Latency)

Latency L seconds (e.g., 4-24 minutes). Blast is far better because it avoids many round trips. Stop-and-go would be disastrous.

Scenario C: High Packet Loss

With high loss, stop-and-go becomes inefficient due to many retransmissions. Blast may also suffer because losing the single ACK forces retransmission of all recipes. A better approach: use a sliding window protocol like TCP, or send recipes in batches with cumulative ACKs.

Part 3: End-to-End Pipeline for ChatGPT Queries

Pipeline Components

Wi-Fi: latency 10 ms, bandwidth 10 Mbps
Ethernet to cloudlet: latency 1 ms, bandwidth 100 Mbps
Cloudlet to ChatGPT: latency 500 ms, bandwidth 50 Mbps
ChatGPT processing: 200 ms

Query and Response Delay

Query 10 KB, response 5 KB. Transmission times: Wi-Fi: 10 KB / 10 Mbps = 8 ms; Ethernet: 10 KB / 100 Mbps = 0.8 ms; long link: 10 KB / 50 Mbps = 1.6 ms. Total transmission = 8+0.8+1.6 = 10.4 ms. Latency sum = 10+1+500 = 511 ms. Processing = 200 ms. So total one-way delay = 10.4 + 511 + 200 = 721.4 ms. For response, similar calculation: 5 KB transmission times: Wi-Fi: 4 ms, Ethernet: 0.4 ms, long link: 0.8 ms => 5.2 ms. Same latency 511 ms. Total round trip = query delay + response delay = 721.4 + (5.2+511) = 1237.6 ms. But the question likely asks for total end-to-end delay for the query to get response? Clarify: from sending query to receiving response = query transmission + latencies + processing + response transmission + latencies = (10.4+511) + 200 + (5.2+511) = 1237.6 ms.

File Upload Case

30 MB file, 1000 packets of 30 KB each. Transmit over Wi-Fi: 30 MB / 10 Mbps = 24 s; Ethernet: 30 MB / 100 Mbps = 2.4 s; long link: 30 MB / 50 Mbps = 4.8 s. Total transmission = 31.2 s. Latency sum = 511 ms. Then cloudlet forwards to ChatGPT: same transmission again? Actually after receiving entire file, cloudlet transmits to ChatGPT over the same long link? That would double the long link transmission. But the question says cloudlet transmits to ChatGPT as a back-to-back stream. So total delay = initial transmission (31.2 s) + latencies (0.511 s) + cloudlet processing (assume minimal) + second transmission (31.2 s) + latencies (0.511 s) + ChatGPT processing (200 ms) = 63.4 s. This is slow; optimizing with pipelining or compression would help.

Conclusion: Key Takeaways for Efficient Distributed Systems

Efficient data serialization and protocol choice depend on context. Use delta encoding when changes are small, consider endianness, choose blast for high-latency links, and break large transfers into smaller chunks to avoid head-of-line blocking. These principles apply to modern apps like cloud gaming, AI inference, and distributed databases.