Assignment Chef

Programming lesson

Build a Simplified Find Command (sfind) in C: A Step-by-Step Guide

Learn to implement a simplified version of the Unix find command in C. This tutorial covers directory traversal, fnmatch, getopt, and error handling, with code examples and explanations tailored for the Csci493.66 assignment.

Introduction to sfind: A Simplified Find Command

The Unix find command is a powerhouse for locating files. In this tutorial, you'll build sfind, a stripped-down version that implements two tests: matching filenames with glob patterns (-m) and checking hard links (-s). This assignment is a rite of passage for systems programming students. By the end, you'll understand directory traversal, pattern matching, and command-line parsing in C.

Why This Matters in 2026

File management remains critical even in the age of AI and cloud storage. Think of sfind as a custom search engine for your filesystem. The techniques you practice here (recursive directory traversal, lstat(), and getopt()) carry over to many other command-line tools.

Understanding the Assignment Requirements

Your program, named sfind, must:

  • Accept zero or more directories and exactly one test.
  • Default to the current working directory if none given.
  • Print relative paths of matching files.
  • Not follow symbolic links (test the link itself).
  • Handle errors gracefully.

The Two Tests

  1. -s filename: Check if the file is a hard link to filename (same inode and filesystem).
  2. -m fileglob: Check if the file's basename matches a shell glob pattern (use fnmatch()).

This mirrors real-world scenarios: finding duplicate files or locating files by naming conventions.

Setting Up Your Project Structure

You'll write a single C source file: sfind.c. All helper functions must precede main(). Use the following includes:

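#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>     // PATH_MAX
#include <unistd.h>     // getopt(), optind, optarg, opterr
#include <dirent.h>
#include <sys/stat.h>
#include <fnmatch.h>
#include <libgen.h>
#include <getopt.h>     // optional: only getopt_long() needs it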

Step 1: Parsing Command-Line Arguments with getopt

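The getopt() function handles option parsing. Both test options (-s and -m) require an argument, so the option string is "s:m:". Directories are non-option arguments, and the tricky part is that they come before the options. POSIX getopt() stops at the first non-option argument, so starting it at argv[1] would return -1 as soon as it reached the first directory. (glibc's getopt() reorders argv by default and would still find the options, but portable code shouldn't rely on that.) So collect the leading directories yourself, then set optind to the index of the first option and let getopt() parse the test from there. Set opterr = 0 to suppress getopt()'s default error messages and print your own usage message instead.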

Here's a skeleton:

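int main(int argc, char *argv[]) {
    int opt, i = 1, dir_count = 0, bad = 0;
    char *test_type = NULL, *test_arg = NULL;  // "-s" or "-m", and its argument
    char *dirs[argc];                          // C99 VLA: at most argc - 1 directories
    // First pass: collect directories (leading non-option args)
    while (i < argc && argv[i][0] != '-')
        dirs[dir_count++] = argv[i++];
    // Then parse the test after them: silence getopt and start it at argv[i]
    opterr = 0; optind = i;
    while ((opt = getopt(argc, argv, "s:m:")) != -1) {
        bad |= (opt == '?' || test_type != NULL);  // unknown option, missing argument, or second test
        test_type = (opt == 's') ? "-s" : "-m";
        test_arg = optarg;
    }
    if (bad || test_type == NULL || optind != argc) {
        fprintf(stderr, "Usage: sfind [dir ...] [-s file | -m pattern]\n");
        exit(1);
    }
    if (dir_count == 0) dirs[dir_count++] = ".";  // default: current directory
    for (int d = 0; d < dir_count; d++) search_dir(dirs[d], test_type, test_arg);
    return 0;
}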

Collecting the directories first and the single test afterward mirrors find's own syntax, in which starting points come before the expression.

Step 2: Traversing Directories Recursively

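Write a function void search_dir(const char *dirpath, const char *test_type, const char *test_arg). Use opendir(), readdir(), and closedir(). For each entry except . and .., build the full path with snprintf() and check its return value for truncation. Use lstat() to get the file's metadata (not stat(), because sfind must not follow symbolic links). Apply the test to every entry, directories included, as find does; if the entry is a directory, also recurse into it.

void search_dir(const char *dirpath, const char *test_type, const char *test_arg) {
    DIR *dir = opendir(dirpath);
    if (!dir) {
        fprintf(stderr, "sfind: cannot open directory %s\n", dirpath);
        return;                       // report and carry on with other directories
    }
    struct dirent *entry;
    char fullpath[PATH_MAX];          // PATH_MAX comes from <limits.h>
    while ((entry = readdir(dir)) != NULL) {
        // Skip "." and ".." so the recursion does not loop forever
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        int n = snprintf(fullpath, sizeof(fullpath), "%s/%s", dirpath, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof(fullpath)) {
            fprintf(stderr, "sfind: path too long: %s/%s\n", dirpath, entry->d_name);
            continue;
        }
        struct stat st;
        // lstat() describes a symlink itself, not the file it points to
        if (lstat(fullpath, &st) == -1) {
            perror(fullpath);         // e.g. "testdir/x: Permission denied"
            continue;
        }
        // Test every entry (directories included), then descend into directories
        apply_test(fullpath, entry->d_name, test_type, test_arg, dirpath);
        if (S_ISDIR(st.st_mode))
            search_dir(fullpath, test_type, test_arg);
    }
    closedir(dir);
}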

Each subdirectory is handled by a recursive call, so the traversal is a depth-first walk of the directory tree. Because lstat() reports a symbolic link to a directory as a link, S_ISDIR() is false for it and the recursion never follows it.
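To see the lstat() rule in action, here is a small self-contained sketch, not part of the assignment, that reuses search_dir()'s loop to count entries instead of testing them. The helper names count_entries and build_demo_tree, and the scratch directory /tmp/sfind_walk_demo used with them, are illustrative assumptions. The fixture includes a symlink that points back up the tree; because lstat() reports it as a link, the walk counts it without entering it.

```c
#define _POSIX_C_SOURCE 200809L   /* expose lstat() and symlink() under strict -std=c99 */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count every entry below dirpath using the same rule
   as search_dir(): lstat() each entry, recurse only when S_ISDIR() is true. */
long count_entries(const char *dirpath) {
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return 0;
    long total = 0;
    struct dirent *entry;
    char fullpath[PATH_MAX];
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        int n = snprintf(fullpath, sizeof fullpath, "%s/%s", dirpath, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof fullpath)
            continue;                    /* path would be truncated: skip it */
        struct stat st;
        if (lstat(fullpath, &st) == -1)
            continue;
        total++;                         /* symlinks are counted but never entered */
        if (S_ISDIR(st.st_mode))
            total += count_entries(fullpath);
    }
    closedir(dir);
    return total;
}

/* Hypothetical fixture: root/sub/f.txt plus root/sub/up, a symlink to ".."
   (that is, back to root). Failures, such as entries left over from an
   earlier run, are ignored. */
void build_demo_tree(const char *root) {
    char p[PATH_MAX];
    mkdir(root, 0700);
    snprintf(p, sizeof p, "%s/sub", root);
    mkdir(p, 0700);
    snprintf(p, sizeof p, "%s/sub/f.txt", root);
    FILE *f = fopen(p, "w");
    if (f != NULL)
        fclose(f);
    snprintf(p, sizeof p, "%s/sub/up", root);
    if (symlink("..", p) == -1) {
        /* most likely the link already exists; nothing to do */
    }
}
```

After build_demo_tree("/tmp/sfind_walk_demo"), count_entries() on the same path should return 3: sub, sub/f.txt, and the link sub/up. If you replaced lstat() with stat(), sub/up would look like a directory and the walk would descend through it again and again, re-counting the same entries.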

Step 3: Implementing the Tests

The -s Test (Hard Link Check)

For each file, compare its inode and device with the reference file. Use stat() on the reference (since we need its real inode) and lstat() on the candidate. If both st_ino and st_dev match, it's a hard link.

int test_same_file(const char *candidate_path, const char *ref_path) {
    struct stat cand_stat, ref_stat;
    if (lstat(candidate_path, &cand_stat) == -1 || stat(ref_path, &ref_stat) == -1)
        return 0;
    return (cand_stat.st_ino == ref_stat.st_ino && cand_stat.st_dev == ref_stat.st_dev);
}
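test_same_file() works, but it calls stat() on the reference once per candidate and silently returns 0 if the reference is missing, so a typo in the -s argument just produces no output. One possible restructuring, sketched below under the assumption that you're free to change the helper signatures: stat() the reference once in main(), report a failure there, and compare each candidate against the saved result. load_reference, is_same_file, and ref_ident are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L   /* expose lstat() under strict -std=c99 */
#include <stdio.h>
#include <sys/stat.h>

/* Identity of the -s reference, filled in once before the traversal. */
static struct stat ref_ident;

/* Call once from main() after parsing -s. Returns 0 on success, or -1
   after printing why the reference can't be used. */
int load_reference(const char *ref_path) {
    if (stat(ref_path, &ref_ident) == -1) {
        perror(ref_path);
        return -1;
    }
    return 0;
}

/* Per-candidate check: lstat() the candidate (symlinks are not followed)
   and compare its (device, inode) pair with the saved reference. */
int is_same_file(const char *candidate_path) {
    struct stat cand;
    if (lstat(candidate_path, &cand) == -1)
        return 0;
    return cand.st_dev == ref_ident.st_dev && cand.st_ino == ref_ident.st_ino;
}
```

If load_reference() returns -1, main() can exit with status 1 as it does for other usage errors. And since search_dir() already holds each entry's lstat() result in st, you could compare st directly and skip the second lstat() call.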

The -m Test (Glob Matching)

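Should you pass FNM_PATHNAME to fnmatch()? No: the spec says to match the filename (basename), not the whole path, so call fnmatch(pattern, base, 0). If you extract the basename with basename() from <libgen.h>, be careful: POSIX allows it to modify its argument, so pass it a copy. Inside search_dir() you already have the basename in entry->d_name, which avoids the problem entirely.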

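int test_glob(const char *filename, const char *pattern) {
    // basename() may modify its argument, so work on a private copy
    char *copy = strdup(filename);
    if (copy == NULL)
        return 0;
    int matched = fnmatch(pattern, basename(copy), 0) == 0;
    free(copy);
    return matched;
}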

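Glob patterns are the same wildcards the shell expands, so quote the pattern on the command line (for example, "*.txt"); otherwise the shell may expand it into matching file names in the current directory before sfind ever sees it.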
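Because fnmatch() is called with flags set to 0, its behavior differs from the shell's in one visible way. The tiny wrapper below (glob_matches is a hypothetical name) makes the cases easy to try: *.txt matches file1.txt but not file1.txt.bak, file?.txt matches file2.txt, [a-c]* matches beta but not delta, and * matches .hidden, because without FNM_PERIOD a leading dot gets no special treatment.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: 1 if name (a basename such as entry->d_name)
   matches the glob pattern, else 0. Flags are 0: no FNM_PATHNAME and
   no FNM_PERIOD, so wildcards may match a leading dot. */
int glob_matches(const char *pattern, const char *name) {
    return fnmatch(pattern, name, 0) == 0;
}
```

This agrees with GNU find's -name, whose wildcards also match a leading dot; pass FNM_PERIOD instead if you want the shell's convention.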

Step 4: Printing Relative Paths

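Matches are printed as relative paths. If you build each path as dirpath + "/" + name, starting from the directory argument, a relative starting directory such as testdir already yields paths like testdir/file1.txt; that is what find prints and what the sample runs in the testing section show. If your spec instead wants paths relative to the starting directory itself (for example, with starting directory /home/user, the file /home/user/docs/file.txt prints as docs/file.txt), pass the starting directory's path to your search function and strip that prefix from the full path. Alternatively, build paths relative to the starting directory as you recurse.

One clean method: record the length of the starting directory's path and, when printing, print the full path starting just past that prefix and the slash that follows it.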

void print_relative(const char *fullpath, const char *startdir) {
    size_t len = strlen(startdir);
    if (fullpath[len] == '/')
        printf("%s\n", fullpath + len + 1);
    else
        printf("%s\n", fullpath + len);
}
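If you'd rather check the prefix stripping with assertions than by reading printed output, the same idea can return a pointer instead of printing. relative_part is a hypothetical name; unlike print_relative(), this sketch also skips repeated slashes (as produced when the starting directory is given with a trailing slash, such as testdir/) and returns fullpath unchanged if it doesn't start with startdir.

```c
#include <string.h>

/* Return the part of fullpath after the startdir prefix and any slashes
   that follow it. Assumes fullpath was built as startdir + "/" + ...;
   if fullpath does not begin with startdir, it is returned unchanged. */
const char *relative_part(const char *fullpath, const char *startdir) {
    size_t len = strlen(startdir);
    if (strncmp(fullpath, startdir, len) != 0)
        return fullpath;
    const char *rest = fullpath + len;
    while (*rest == '/')
        rest++;                  /* skip one or more separators */
    return rest;
}
```

With this helper, print_relative() reduces to printf("%s\n", relative_part(fullpath, startdir)); for example, relative_part("/home/user/docs/file.txt", "/home/user") returns "docs/file.txt".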

Step 5: Error Handling and Usage

If the user provides invalid options or no test, print an error message to stderr and exit with status 1. Example:

fprintf(stderr, "Usage: sfind [dir ...] [-s file | -m pattern]\n");
exit(1);

Also handle cases where directories can't be opened or files can't be accessed—print to stderr but continue.
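One way to keep "print to stderr but continue" consistent is to route every non-fatal failure through one helper that includes the reason from errno. The names report_error, open_dir_or_report, and had_error are hypothetical, and using had_error for the final exit status is an assumption to check against your spec; GNU find itself keeps going after such errors and exits with a non-zero status at the end.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag: set whenever a non-fatal error is reported, so main()
   can finish the traversal and still exit with a non-zero status. */
static int had_error = 0;

/* Print "sfind: <path>: <reason>" using the current errno, and remember it. */
void report_error(const char *path) {
    fprintf(stderr, "sfind: %s: %s\n", path, strerror(errno));
    had_error = 1;
}

/* Hypothetical wrapper around opendir(): report the failure and return
   NULL so the caller can skip this directory instead of exiting. */
DIR *open_dir_or_report(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL)
        report_error(path);
    return dir;
}
```

In search_dir(), you would call open_dir_or_report(dirpath) in place of opendir() and return if it gives NULL; main() could then end with return had_error ? 1 : 0;.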

Complete Program Structure

Place all helper functions before main(). A typical order:

  1. test_same_file()
  2. test_glob()
  3. apply_test()
  4. search_dir()
  5. main()
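search_dir() calls apply_test(), but its body isn't shown above. Here is a minimal sketch matching the five-argument call in search_dir(), assuming main() stores the test as the string "-s" or "-m". It prints fullpath as built by search_dir(), so paths start with the directory argument, as in the sample runs in the testing section; if your spec wants paths relative to the starting directory, call print_relative() instead and pass the starting directory down rather than dirpath. test_same_file() is repeated only so the block compiles on its own, the int return value exists only to make the sketch testable (search_dir() ignores it), and the -m branch passes d_name straight to fnmatch(), which sidesteps basename().

```c
#define _POSIX_C_SOURCE 200809L   /* expose lstat() under strict -std=c99 */
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Repeated from the -s section so this sketch compiles on its own. */
static int test_same_file(const char *candidate_path, const char *ref_path) {
    struct stat cand_stat, ref_stat;
    if (lstat(candidate_path, &cand_stat) == -1 || stat(ref_path, &ref_stat) == -1)
        return 0;
    return cand_stat.st_ino == ref_stat.st_ino && cand_stat.st_dev == ref_stat.st_dev;
}

/* Run the selected test on one directory entry and print it on a match.
   fullpath: path built by search_dir(); name: entry->d_name, already a
   basename; dirpath: the directory being scanned (unused in this sketch).
   Returns 1 on a match, 0 otherwise (including an unknown test_type). */
int apply_test(const char *fullpath, const char *name, const char *test_type,
               const char *test_arg, const char *dirpath) {
    (void)dirpath;
    int matched = 0;
    if (strcmp(test_type, "-s") == 0)
        matched = test_same_file(fullpath, test_arg);
    else if (strcmp(test_type, "-m") == 0)
        matched = fnmatch(test_arg, name, 0) == 0;   /* no basename() needed */
    if (matched)
        printf("%s\n", fullpath);
    return matched;
}
```

In your sfind.c, keep a single copy of test_same_file() and place apply_test() after it, following the order listed above.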

Use comments to document each function, especially if you reuse code from the class repository (cite the source).

Testing Your sfind

Create a test directory structure:

$ mkdir -p testdir/subdir
$ touch testdir/file1.txt testdir/subdir/file2.txt
$ ln testdir/file1.txt testdir/hardlink.txt

Run your program (lines appear in the order readdir() returns entries, so your order may differ):

$ ./sfind testdir -m "*.txt"
testdir/file1.txt
testdir/subdir/file2.txt
testdir/hardlink.txt
$ ./sfind testdir -s testdir/file1.txt
testdir/file1.txt
testdir/hardlink.txt

Compare with the real find:

$ find testdir -samefile testdir/file1.txt  # similar to -s
$ find testdir -name "*.txt"                # similar to -m

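Also try the edge cases: no directory argument (the search should start at .), a directory you can't read, a missing reference file for -s, and a symbolic link to a directory, which must not be followed.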

Common Pitfalls

  • Not handling the default directory: If no directories are given, search the current directory (".").
  • Symlink handling: Use lstat() not stat().
  • Memory safety: Use snprintf() and avoid buffer overflows.
  • Basename modification: basename() may modify its input; pass it a copy, or use entry->d_name directly.

Conclusion

You've built a functional subset of find. This assignment teaches core systems programming skills: directory traversal, file metadata, pattern matching, and robust argument parsing. These concepts are foundational for building tools used in DevOps, data science pipelines, and even AI training data management. Submit your sfind.c using the provided submithwk_cs49366 command, and remember: thorough documentation and error handling will earn you top marks.