Building a Transparent RPC File System: A Step-by-Step C Tutorial for Remote File Operations

Learn how to implement a remote procedure call (RPC) system in C for transparent remote file operations. This tutorial covers server-client architecture, marshalling, interposition, and handling concurrent clients—perfect for distributed systems projects.

Introduction to Transparent Remote File Operations

In distributed computing, accessing files on a remote machine should feel as seamless as accessing local files. This is the core idea behind transparent remote file operations—a key concept in systems programming. In this tutorial, we'll build an RPC-based system that allows a client to perform file operations (open, read, write, lseek, stat, unlink, getdirentries, getdirtree) on a remote server, just as if they were local. This is exactly what you need for assignments like 15.094 project 1, where you implement a server and a client stub library for remote file access.

We'll focus on the essential components: interposition, marshalling, networking, and concurrent client handling. By the end, you'll understand how to design a protocol, serialize complex data structures, and make your RPC layer robust. A useful analogy: think of your file server as a cloud storage service like Google Drive. Your local app makes API calls that are handled remotely, but you never see the network complexity.

Understanding the Project Requirements

Your task is to create two main components:

  • Server process: Listens for TCP connections and performs actual file operations on behalf of clients.
  • Client stub library: Intercepts standard C library calls (open, close, read, write, lseek, stat, unlink, getdirentries) and converts them into RPC messages sent to the server.

Additionally, you must handle two non-standard calls: getdirtree and freedirtree, which are provided as a shared library. Your system must support multiple concurrent clients and handle multiple open files per client. Error reporting (e.g., file not found) must be accurate.

For Checkpoint 1, you start by interposing all required functions and logging each call to a remote server. No actual RPCs yet: the log message only needs to carry the function name. This validates your interposition setup.

Step 1: Setting Up Interposition with LD_PRELOAD

Interposition allows your library to override standard C functions. Using LD_PRELOAD, you can load your stub library before libc. Here's a minimal example for open:

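#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

int open(const char *pathname, int flags, ...) {
    // Look up libc's open once; RTLD_NEXT skips this library's own definition
    static int (*real_open)(const char *, int, ...) = NULL;
    if (!real_open) {
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    }
    // The third argument exists only when the caller passes O_CREAT,
    // so reading it unconditionally would be undefined behavior
    mode_t mode = 0;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    // Log to server (Checkpoint 1)
    // For now, just call the real function
    return real_open(pathname, flags, mode);
}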

You'll need to implement similar wrappers for close, read, write, lseek, stat, unlink, getdirentries, getdirtree, and freedirtree. For getdirtree, you'll call the provided library function (via dlopen or linking).
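For the Checkpoint 1 logging, each wrapper needs a socket to the server and a way to send its name. The sketch below is one possible shape, not the required one: connect_to_server and log_call are hypothetical helper names, and taking the host and port as plain parameters is an assumption (your assignment may specify how the server address is supplied).

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect to host:port over TCP. Returns a connected socket, or -1 on failure. */
int connect_to_server(const char *host, uint16_t port) {
    char service[6];                      /* "65535" plus terminating NUL */
    snprintf(service, sizeof service, "%u", (unsigned)port);

    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;          /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;      /* TCP */
    if (getaddrinfo(host, service, &hints, &res) != 0) return -1;

    int fd = -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;
        close(fd);                        /* this address failed; try the next */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Send the function name followed by a newline. Returns 0 on success, -1 on error. */
int log_call(int fd, const char *name) {
    size_t len = strlen(name);
    if (send(fd, name, len, 0) != (ssize_t)len) return -1;
    if (send(fd, "\n", 1, 0) != 1) return -1;
    return 0;
}
```

One caution when this code lives inside the stub library itself: close here would resolve to your own interposed wrapper, so helper code in the library should call the saved real functions (the dlsym results) instead.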

Step 2: Designing the RPC Protocol

Your protocol defines how messages are serialized over TCP. Keep it simple: use a fixed-length header followed by variable-length payload. Example header:

#include <stdint.h>

// Send every field in network byte order (htonl on send, ntohl on receive)
struct rpc_header {
    uint32_t function_id;   // e.g., 1=open, 2=close, ...
    uint32_t payload_len;   // length of payload in bytes
    uint32_t request_id;    // to match responses
};

For marshalling, convert function arguments into a byte stream. For example, for open: send path string length + path string + flags + mode. For read: send fd + count. For stat: send path string and expect a struct stat in return.

On the server, unmarshalling rebuilds the arguments. The server then calls the real function, marshals the return value (plus errno on failure), and sends the response back.
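To make the open layout concrete (path string length + path string + flags + mode), here is a minimal marshalling sketch. The helper names (put_u32, get_u32, marshal_open, unmarshal_open), the FN_OPEN constant, and the choice of 32-bit network-byte-order integers for every field are assumptions made for illustration; any consistent encoding works as long as client and server agree.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Three-field header; each field travels as a 32-bit network-order integer. */
struct rpc_header {
    uint32_t function_id;
    uint32_t payload_len;
    uint32_t request_id;
};

enum { FN_OPEN = 1 };   /* hypothetical id, following "1=open" */

/* Append a 32-bit value in network byte order; return the advanced cursor. */
static unsigned char *put_u32(unsigned char *p, uint32_t v) {
    uint32_t be = htonl(v);
    memcpy(p, &be, sizeof be);
    return p + sizeof be;
}

/* Read a 32-bit network-order value and advance the cursor. */
static uint32_t get_u32(const unsigned char **p) {
    uint32_t be;
    memcpy(&be, *p, sizeof be);
    *p += sizeof be;
    return ntohl(be);
}

/* Build a complete open() request. Caller frees the result; *out_len gets its size. */
unsigned char *marshal_open(const char *path, int flags, mode_t mode,
                            uint32_t request_id, size_t *out_len) {
    uint32_t path_len = (uint32_t)strlen(path);
    uint32_t payload_len = 4 + path_len + 4 + 4;
    size_t total = sizeof(struct rpc_header) + payload_len;
    unsigned char *buf = malloc(total);
    if (!buf) return NULL;
    unsigned char *p = buf;
    p = put_u32(p, FN_OPEN);              /* header */
    p = put_u32(p, payload_len);
    p = put_u32(p, request_id);
    p = put_u32(p, path_len);             /* payload */
    memcpy(p, path, path_len);
    p += path_len;
    p = put_u32(p, (uint32_t)flags);
    p = put_u32(p, (uint32_t)mode);
    *out_len = total;
    return buf;
}

/* Decode the payload (the bytes after the header). Caller frees *path.
   Returns 0 on success, -1 on a malformed payload or allocation failure. */
int unmarshal_open(const unsigned char *payload, uint32_t payload_len,
                   char **path, int *flags, mode_t *mode) {
    if (payload_len < 12) return -1;
    const unsigned char *p = payload;
    uint32_t path_len = get_u32(&p);
    if (path_len > payload_len - 12) return -1;   /* length would overrun the payload */
    *path = malloc((size_t)path_len + 1);
    if (!*path) return -1;
    memcpy(*path, p, path_len);
    (*path)[path_len] = '\0';
    p += path_len;
    *flags = (int)get_u32(&p);
    *mode = (mode_t)get_u32(&p);
    return 0;
}
```

Sending the path length explicitly means the receiver never has to scan for a terminator, and the length check keeps a corrupt or hostile message from reading past the buffer.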

Step 3: Implementing the Server

Your server should listen on a TCP port (e.g., 15440). Use fork() or threads to handle concurrent clients. For each connection, read the header, then the payload, dispatch to the appropriate handler, and send the response.
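The "read the header, then the payload" step has one trap: a single recv on a TCP socket may return fewer bytes than you asked for. A sketch of the framing code, assuming the three-field header in network byte order; recv_all, read_message, and the MAX_PAYLOAD cap are hypothetical names and values chosen for this example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

struct rpc_header {
    uint32_t function_id;
    uint32_t payload_len;
    uint32_t request_id;
};

#define MAX_PAYLOAD (1u << 20)   /* hypothetical sanity cap on one message */

/* Keep calling recv() until exactly len bytes have arrived.
   Returns 0 on success, -1 on error or if the peer closes early. */
int recv_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR) continue;   /* interrupted: retry */
        if (n <= 0) return -1;                   /* error or connection closed */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read one framed request. On success, *hdr holds host-order fields and
   *payload points to a malloc'd buffer (caller frees). Returns 0 or -1. */
int read_message(int fd, struct rpc_header *hdr, unsigned char **payload) {
    struct rpc_header wire;
    if (recv_all(fd, &wire, sizeof wire) < 0) return -1;
    hdr->function_id = ntohl(wire.function_id);
    hdr->payload_len = ntohl(wire.payload_len);
    hdr->request_id = ntohl(wire.request_id);
    if (hdr->payload_len > MAX_PAYLOAD) return -1;
    *payload = malloc(hdr->payload_len ? hdr->payload_len : 1);
    if (!*payload) return -1;
    if (recv_all(fd, *payload, hdr->payload_len) < 0) {
        free(*payload);
        *payload = NULL;
        return -1;
    }
    return 0;
}
```

After read_message succeeds, the per-connection loop can switch on hdr.function_id to pick a handler, then free the payload before reading the next request.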

Example handler for open:

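// Needs <arpa/inet.h>, <errno.h>, <fcntl.h>, <stdint.h>, <stdlib.h>, <string.h>
void handle_open(int client_fd, const char *payload) {
    // Unmarshal path length, path, flags, mode (the layout from Step 2;
    // 32-bit network-order integers are an assumption of this sketch)
    uint32_t len, flags, mode;
    memcpy(&len, payload, 4);
    len = ntohl(len);
    char *path = strndup(payload + 4, len);
    memcpy(&flags, payload + 4 + len, 4);
    memcpy(&mode, payload + 8 + len, 4);
    // The server is not interposed, so it calls the ordinary open()
    int fd = open(path, (int)ntohl(flags), (mode_t)ntohl(mode));
    int err = (fd < 0) ? errno : 0;   // save errno before any other call
    free(path);
    // Marshal fd and err, send them back on client_fd
}

Remember to handle errors: if open returns -1, send errno so the client can set it appropriately. Capture errno immediately after the failing call, because later library calls may overwrite it.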

Step 4: Client Stub Implementation

In your interposition functions, instead of calling the real function locally, you'll send an RPC and wait for the response. Example for open:

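int open(const char *pathname, int flags, ...) {
    // HEADER_SIZE and server_fd are assumed to be defined elsewhere in the stub
    // The mode argument exists only when flags include O_CREAT
    mode_t mode = 0;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    // Marshal request: header, then path length + path + flags + mode
    uint32_t path_len = (uint32_t)strlen(pathname);
    uint32_t payload_len = 4 + path_len + 4 + 4;
    size_t total_len = HEADER_SIZE + payload_len;
    char *buf = malloc(total_len);
    if (!buf) { errno = ENOMEM; return -1; }
    struct rpc_header *hdr = (struct rpc_header *)buf;
    hdr->function_id = htonl(1);
    hdr->payload_len = htonl(payload_len);
    // ... fill request_id and payload (path_len, pathname, flags, mode)
    ssize_t sent = send(server_fd, buf, total_len, 0);
    free(buf);
    if (sent != (ssize_t)total_len) { errno = EIO; return -1; }

    // Receive response: header, then fd and errno (assumed: two 32-bit fields)
    struct rpc_header resp;
    uint32_t reply[2];
    if (recv(server_fd, &resp, HEADER_SIZE, MSG_WAITALL) != (ssize_t)HEADER_SIZE ||
        recv(server_fd, reply, sizeof reply, MSG_WAITALL) != (ssize_t)sizeof reply) {
        errno = EIO;
        return -1;
    }
    // Unmarshal fd and errno
    int fd = (int)(int32_t)ntohl(reply[0]);
    if (fd < 0) errno = (int)ntohl(reply[1]);
    return fd;
}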

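For file descriptors, you need a mapping between local fd (returned to the caller) and remote fd (used by server). Maintain a table: local_fd -> remote_fd. The application never sees the server's fd; only the client stub knows it. The stub must also be able to tell its own handles apart from genuinely local descriptors (stdin, stdout, its own socket), so that calls on those still go to the real libc functions.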
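One way to build that table is a fixed array whose local handles start at a high base value, so a handle from the stub can never be confused with a descriptor the process really owns. This is a sketch under that assumption; FD_BASE, FD_SLOTS, and the fdmap_* names are hypothetical, and the table is not thread-safe as written.

```c
#define FD_BASE  10000   /* hypothetical: first local handle handed to the caller */
#define FD_SLOTS 1024    /* hypothetical: maximum simultaneously open remote files */

static int remote_fds[FD_SLOTS];   /* server-side fd for each slot */
static int slot_used[FD_SLOTS];    /* 1 if the slot is occupied */

/* Record a server-side fd; return the local handle for the caller, or -1 if full. */
int fdmap_add(int remote_fd) {
    for (int i = 0; i < FD_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            remote_fds[i] = remote_fd;
            return FD_BASE + i;
        }
    }
    return -1;
}

/* Translate a local handle to the server's fd. A result of -1 means the
   descriptor is not one of ours, so the wrapper should call the real function. */
int fdmap_lookup(int local_fd) {
    if (local_fd < FD_BASE || local_fd >= FD_BASE + FD_SLOTS) return -1;
    int i = local_fd - FD_BASE;
    return slot_used[i] ? remote_fds[i] : -1;
}

/* Forget a handle after a successful remote close. Returns 0, or -1 if unmapped. */
int fdmap_remove(int local_fd) {
    if (local_fd < FD_BASE || local_fd >= FD_BASE + FD_SLOTS) return -1;
    int i = local_fd - FD_BASE;
    if (!slot_used[i]) return -1;
    slot_used[i] = 0;
    return 0;
}
```

In a read wrapper, for example, a lookup that returns -1 means "pass this call straight to the real read", which keeps stdin, stdout, and the RPC socket working normally.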

Step 5: Handling Concurrent Clients and Multiple Files

Your server must handle multiple clients simultaneously. Using fork() per connection is straightforward: each child process has its own file descriptor table. For threads, ensure thread-safe access to shared resources (like a global mapping).
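A fork-per-connection server can be sketched as follows. The function names are hypothetical, echo_session is only a stand-in for your real request loop, and ignoring SIGCHLD is one of several ways to avoid zombie children (calling waitpid from a handler is another).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a TCP socket listening on port (0 lets the kernel pick one). Returns fd or -1. */
int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one client and hand it to a child process.
   Returns the child's pid in the parent, or -1 on failure. */
pid_t accept_and_fork(int listen_fd, void (*handler)(int)) {
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0) return -1;
    pid_t pid = fork();
    if (pid == 0) {
        close(listen_fd);   /* child does not accept new clients */
        handler(conn);      /* serve this client until it disconnects */
        close(conn);
        _exit(0);
    }
    close(conn);            /* parent drops its copy (also if fork failed) */
    return pid;
}

/* Stand-in session handler: echoes bytes back until the client closes. */
void echo_session(int conn) {
    char buf[256];
    ssize_t n;
    while ((n = read(conn, buf, sizeof buf)) > 0) {
        if (write(conn, buf, (size_t)n) != n) break;
    }
}

/* Main loop: one child per connection, children reaped automatically. */
void serve_forever(int listen_fd, void (*handler)(int)) {
    signal(SIGCHLD, SIG_IGN);
    for (;;) accept_and_fork(listen_fd, handler);
}
```

Because each child has its own descriptor table, the fds it opens for one client are invisible to every other client, which is what makes this design simple.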

On the client side, multiple threads might call file operations concurrently. Use mutexes around socket sends/receives to avoid interleaving.

Step 6: Testing with Provided Tools

Use the provided tools (440cat, 440ls, 440read, 440write, 440tree) to test your system. Run your server in the background, then set LD_PRELOAD and execute a command:

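# Set LD_PRELOAD for this one command only; exporting it would also interpose the shell and every other program you run
LD_PRELOAD=./mylib.so ./tools/440cat remote_test.txt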

If everything works, it should print the contents of the file on the server. For debugging, use stderr for logs so they don't interfere with stdout.

Marshalling Complex Data: struct stat and getdirtree

Serializing struct stat requires care: it's platform-dependent. Use fixed-width types (e.g., int64_t) to send fields like st_size, st_mode, etc. For getdirtree, which returns a tree of dirent-like structures, define a simple linked list or array format.

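Example serialization for a getdirtree response: send the nodes in a fixed order (for example, each directory before its contents), and for each node send name length + name + number of children. A flat list of names and file/dir types alone does not say which directory an entry belongs to, so the child count (or a depth) is what lets the client rebuild the hierarchy. The client reconstructs the tree using malloc.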
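One format that keeps the parent/child structure recoverable is a preorder walk in which every node carries its own child count, a variation on the entry list described above. The struct tree_node type here is hypothetical and only for illustration; in the project you would use the node definition from the provided getdirtree header. The sketch aborts on allocation failure and trusts its input, so real code should add bounds checks.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for this sketch. */
struct tree_node {
    char *name;
    uint32_t num_children;
    struct tree_node **children;
};

/* malloc that aborts on failure, to keep the example short. */
static void *xmalloc(size_t n) {
    void *p = malloc(n ? n : 1);
    if (!p) abort();
    return p;
}

/* Bytes needed to serialize the subtree rooted at n. */
size_t tree_wire_size(const struct tree_node *n) {
    size_t total = 4 + strlen(n->name) + 4;
    for (uint32_t i = 0; i < n->num_children; i++)
        total += tree_wire_size(n->children[i]);
    return total;
}

/* Preorder write: name length, name, child count, then each child.
   Returns the advanced cursor. */
unsigned char *tree_pack(const struct tree_node *n, unsigned char *p) {
    uint32_t len = (uint32_t)strlen(n->name);
    uint32_t be = htonl(len);
    memcpy(p, &be, 4); p += 4;
    memcpy(p, n->name, len); p += len;
    be = htonl(n->num_children);
    memcpy(p, &be, 4); p += 4;
    for (uint32_t i = 0; i < n->num_children; i++)
        p = tree_pack(n->children[i], p);
    return p;
}

/* Rebuild a subtree from preorder bytes; *p advances past what was consumed. */
struct tree_node *tree_unpack(const unsigned char **p) {
    uint32_t be;
    struct tree_node *n = xmalloc(sizeof *n);
    memcpy(&be, *p, 4); *p += 4;
    uint32_t len = ntohl(be);
    n->name = xmalloc((size_t)len + 1);
    memcpy(n->name, *p, len);
    n->name[len] = '\0';
    *p += len;
    memcpy(&be, *p, 4); *p += 4;
    n->num_children = ntohl(be);
    n->children = n->num_children
        ? xmalloc(n->num_children * sizeof *n->children) : NULL;
    for (uint32_t i = 0; i < n->num_children; i++)
        n->children[i] = tree_unpack(p);
    return n;
}

/* Release a tree built by tree_unpack. */
void tree_free(struct tree_node *n) {
    for (uint32_t i = 0; i < n->num_children; i++) tree_free(n->children[i]);
    free(n->children);
    free(n->name);
    free(n);
}
```

Because each node states how many children follow it, the receiver needs no end markers or depth fields: the recursion in tree_unpack mirrors the recursion in tree_pack exactly.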

Error Handling and Reporting

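When a server operation fails, send back the errno value. The client stub should set errno accordingly before returning -1. Keep two kinds of short transfer apart: a TCP recv may deliver only part of a response, so loop until the whole message has arrived; but if the remote read itself returns fewer bytes than requested, pass that count through unchanged, exactly as a local read would.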
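The errno hand-off can be sketched as a pair of helpers, one for each side. The 8-byte reply body (a 32-bit result followed by a 32-bit errno) and the names pack_reply and unpack_reply are assumptions for illustration; calls such as read and write that can return large counts may need a wider result field.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define REPLY_SIZE 8   /* hypothetical reply body: int32 result + int32 errno */

/* Server side: pass the errno value captured right after the real call.
   The errno field is sent as 0 when the call succeeded. */
void pack_reply(int result, int saved_errno, unsigned char out[REPLY_SIZE]) {
    uint32_t r = htonl((uint32_t)result);
    uint32_t e = htonl((uint32_t)(result < 0 ? saved_errno : 0));
    memcpy(out, &r, 4);
    memcpy(out + 4, &e, 4);
}

/* Client side: return the remote result; on failure, set errno to the
   value the server reported so the caller sees the usual libc behavior. */
int unpack_reply(const unsigned char in[REPLY_SIZE]) {
    uint32_t r, e;
    memcpy(&r, in, 4);
    memcpy(&e, in + 4, 4);
    int result = (int)(int32_t)ntohl(r);
    if (result < 0) errno = (int)ntohl(e);
    return result;
}
```

Leaving errno untouched on success matches what libc functions do, so a program that checks errno only after a -1 return behaves the same locally and remotely.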

Design Document Tips

Your 1-page design document should describe your serialization protocol and key decisions. Mention why you chose a particular header format, how you handle concurrent clients, and any trade-offs (e.g., simplicity vs. performance).

Conclusion

Building a transparent RPC file system is a classic distributed systems exercise. By following this tutorial, you'll gain hands-on experience with interposition, socket programming, marshalling, and concurrent server design. These skills are directly applicable to modern cloud storage and microservices architectures. Remember to test incrementally and use the autograder's feedback. Good luck with your project!