Programming lesson
Building a File-Caching Proxy with LRU Eviction: A Java Distributed Systems Tutorial
Learn how to design and implement a file-caching proxy in Java using LRU cache eviction, RMI, and concurrency control. This tutorial covers whole-file caching, open-close session semantics, and handling concurrent clients for a distributed file system.
Introduction to File-Caching Proxies in Distributed Systems
In modern distributed systems, caching is a critical technique for reducing latency and network load. This tutorial walks you through building a file-caching proxy in Java, similar to what you might encounter in a systems programming course like 15-440. The proxy sits between clients and a remote file server, caching whole files locally to speed up repeated access. We'll focus on LRU (Least Recently Used) cache eviction, concurrency control, and ensuring open-close session semantics.
Think of it like a trending AI app that caches user queries to avoid repeated API calls—except here, we're caching file reads and writes. By the end, you'll understand how to build a robust caching layer that handles multiple concurrent clients without data corruption.
Core Concepts: Whole-File Caching and Session Semantics
Unlike traditional file systems that operate on byte ranges, our proxy caches entire files. When a client opens a file, the proxy fetches the whole file from the server (if not already cached) and serves subsequent read/write operations locally. On close, the proxy writes the file back to the server if modified. This approach simplifies consistency: while a client has a file open, it sees a stable snapshot unaffected by other clients.
This is analogous to how a gaming platform caches entire game assets (levels, textures) to avoid stuttering during gameplay. The session ensures that a player's modifications (e.g., saved game state) are only pushed when they exit.
Designing the Protocol Between Proxy and Server
You are free to design your own protocol, but it must support whole-file operations. Using Java RMI, define remote interfaces like FileServer with methods: byte[] fetchFile(String path), void storeFile(String path, byte[] data), void deleteFile(String path). The proxy calls these over RMI, while the server persists files in its local storage.
For concurrent access, the server must handle multiple proxies. Use thread-safe collections and synchronized blocks to avoid race conditions. A simple approach: lock the file entry during fetch/store operations.
Implementing LRU Cache Eviction
Your proxy has a fixed cache size (in bytes). When adding a new file, if the cache is full, evict the least recently used file(s) until enough space is free. Use a LinkedHashMap with access-order=true to track LRU order, or implement a custom doubly-linked list with a HashMap for O(1) access.
Here's a basic LRU cache skeleton:
public class LRUCache {
private final long maxSize;
private long currentSize;
private final LinkedHashMap map;
public LRUCache(long maxSize) {
this.maxSize = maxSize;
this.map = new LinkedHashMap<>(16, 0.75f, true);
}
public synchronized byte[] get(String key) {
CacheEntry entry = map.get(key);
if (entry == null) return null;
entry.lastAccess = System.nanoTime();
return entry.data;
}
public synchronized void put(String key, byte[] data) {
long size = data.length;
while (currentSize + size > maxSize && !map.isEmpty()) {
evictLRU();
}
CacheEntry entry = new CacheEntry(data, System.nanoTime());
map.put(key, entry);
currentSize += size;
}
private void evictLRU() {
Map.Entry<String, CacheEntry> eldest = map.entrySet().iterator().next();
currentSize -= eldest.getValue().data.length;
map.remove(eldest.getKey());
}
}Note: In a real implementation, you'd also handle concurrent I/O and dirty flags for write-back.
Concurrency Control: Handling Multiple Clients
Clients can open the same file concurrently. Use read-write locks: multiple readers allowed, but exclusive access for writers. However, open-close semantics require that once a client opens a file, it sees a consistent version regardless of other clients' writes. One strategy: on open, the proxy creates a private copy of the cached file for that client. On close, if modified, the proxy writes back to the cache (and later to server).
For cross-proxy consistency, when a proxy fetches a file from the server, it must check if the cached version is stale. You can use version numbers or timestamps. The server returns a version with each file; the proxy compares and fetches if outdated.
Emulating C File Operations
Your proxy must emulate open, close, read, write, lseek, unlink. For each client connection, maintain a file descriptor table mapping integer FDs to file objects (containing byte array, position, mode, dirty flag). The RPCreceiver class provided handles serialization; your job is to implement the actual operations.
Example read implementation:
public int read(int fd, byte[] buf, int count) {
FileObject file = fdTable.get(fd);
if (file == null) return -1;
int available = file.data.length - file.position;
int toRead = Math.min(count, available);
System.arraycopy(file.data, file.position, buf, 0, toRead);
file.position += toRead;
return toRead;
}Putting It All Together: Proxy Architecture
Your proxy starts with command-line arguments: server IP, server port (RMI registry port), cache directory, and cache size. On startup, it connects to the server via RMI, initializes the LRU cache, and starts listening for client connections (via the provided RPCreceiver). Each client request (open, read, etc.) is handled by a thread pool. The proxy fetches files from the server on cache miss, serves from cache on hit, and writes back on close or eviction.
For checkpoint 1, you may run without a server—just serve files from a local directory. This tests the client-facing operations.
Testing and Debugging Tips
Use the provided binary tools (e.g., 440read) to test your proxy. Run multiple concurrent clients to verify consistency. Log cache hits/misses and evictions. Use jstack to detect deadlocks. Remember to secure your AFS space.
Conclusion
Building a file-caching proxy teaches you essential distributed systems skills: caching protocols, RMI, concurrency, and LRU eviction. This design is applicable to modern cloud storage, CDNs, and even AI model caching. By following this tutorial, you'll have a solid foundation for your assignment and beyond.