Build a Simplified Find Command (sfind) in C: A Step-by-Step Guide
Learn to implement a simplified version of the Unix find command in C. This tutorial covers directory traversal, fnmatch, getopt, and error handling, with code examples and explanations tailored for the Csci493.66 assignment.
Introduction to sfind: A Simplified Find Command
The Unix find command is a powerhouse for locating files. In this tutorial, you'll build sfind, a stripped-down version that implements two tests: matching filenames with glob patterns (-m) and checking hard links (-s). This assignment is a rite of passage for systems programming students. By the end, you'll understand directory traversal, pattern matching, and command-line parsing in C.
Why This Matters in 2026
File management remains critical even in the age of AI and cloud storage. Think of sfind as a custom search engine for your filesystem. As students juggle multiple projects and datasets, knowing how to build such tools is invaluable. Plus, the techniques you learn—recursion, stat, and getopt—are reusable in countless other applications.
Understanding the Assignment Requirements
Your program, named sfind, must:
- Accept zero or more directories and exactly one test.
- Default to the current working directory if none given.
- Print relative paths of matching files.
- Not follow symbolic links (test the link itself).
- Handle errors gracefully.
The Two Tests
- -s filename: Check if the file is a hard link to filename (same inode and filesystem).
- -m fileglob: Check if the file's basename matches a shell glob pattern (use fnmatch()).
This mirrors real-world scenarios: finding duplicate files or locating files by naming conventions.
Setting Up Your Project Structure
You'll write a single C source file: sfind.c. All helper functions must precede main(). Use the following includes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>
#include <fnmatch.h>
#include <libgen.h>
#include <getopt.h>
#include <limits.h> /* PATH_MAX */
Step 1: Parsing Command-Line Arguments with getopt
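The getopt() function handles option parsing. Both test options (-s and -m) require an argument, and directories are non-option arguments. The tricky part is that the options come after the directories. A strictly POSIX getopt() stops scanning at the first non-option argument, while glibc's getopt() reorders argv by default so that later options are still found; collecting the leading directories yourself before calling getopt() avoids depending on either behavior. Set opterr = 0 to suppress getopt()'s default error messages and print your own.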
Here's a skeleton:
int main(int argc, char *argv[]) {
    int opt;
    char *test_type = NULL;
    char *test_arg = NULL;
    char *dirs[argc];
    int dir_count = 0;
    // First pass: collect directories (non-option args)
    // Then parse options after directories
    // ...
}
Separating operands from options this way is a pattern you'll see in many systems utilities.
Step 2: Traversing Directories Recursively
Write a function void search_dir(const char *dirpath, const char *test_type, const char *test_arg). Use opendir(), readdir(), and closedir(). For each entry except . and .., construct the full path using snprintf(). Use lstat() to get file info (not stat(), because we don't follow symlinks); with lstat(), a symlink to a directory is reported as a link rather than a directory, so the recursion never descends through it. If the entry is a directory, recurse; otherwise, apply the test.
void search_dir(const char *dirpath, const char *test_type, const char *test_arg) {
    DIR *dir = opendir(dirpath);
    if (!dir) {
        fprintf(stderr, "sfind: cannot open directory %s\n", dirpath);
        return;
    }
    struct dirent *entry;
    char fullpath[PATH_MAX];
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        snprintf(fullpath, sizeof(fullpath), "%s/%s", dirpath, entry->d_name);
        struct stat st;
        if (lstat(fullpath, &st) == -1) {
            perror("lstat");
            continue;
        }
        if (S_ISDIR(st.st_mode)) {
            search_dir(fullpath, test_type, test_arg);
        } else {
            apply_test(fullpath, entry->d_name, test_type, test_arg, dirpath);
        }
    }
    closedir(dir);
}
This recursive traversal is like exploring a tree data structure, a concept that's also used in game development for scene graphs or AI decision trees.
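One detail worth guarding against when building the full path: snprintf() truncates silently if the result doesn't fit, and a truncated path could name a different file. The sketch below shows one way to detect that using snprintf()'s return value. The helper name join_path() is hypothetical, not something the assignment requires.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into buf.
 * Returns 0 on success, -1 if the result would not fit in size bytes
 * (including the terminating NUL) or snprintf() reports an error. */
int join_path(char *buf, size_t size, const char *dir, const char *name) {
    int n = snprintf(buf, size, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* output was truncated or formatting failed */
    return 0;
}
```

Inside the readdir() loop, a -1 result would be a reason to print a message to stderr and continue with the next entry instead of calling lstat() on a shortened path.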
Step 3: Implementing the Tests
The -s Test (Hard Link Check)
For each file, compare its inode and device with the reference file. Use stat() on the reference (since we need its real inode) and lstat() on the candidate. If both st_ino and st_dev match, it's a hard link.
int test_same_file(const char *candidate_path, const char *ref_path) {
    struct stat cand_stat, ref_stat;
    if (lstat(candidate_path, &cand_stat) == -1 || stat(ref_path, &ref_stat) == -1)
        return 0;
    return (cand_stat.st_ino == ref_stat.st_ino && cand_stat.st_dev == ref_stat.st_dev);
}
The -m Test (Glob Matching)
Should you pass FNM_PATHNAME to fnmatch()? No: the spec says to match the filename (basename), not the path, so call fnmatch(pattern, basename, 0). Extract the basename with basename(), but be careful: it may modify its argument, so pass it a copy.
int test_glob(const char *filename, const char *pattern) {
    char copy[PATH_MAX];
    /* basename() may modify its argument, so work on a private copy. */
    snprintf(copy, sizeof(copy), "%s", filename);
    return fnmatch(pattern, basename(copy), 0) == 0;
}
Glob patterns are used everywhere: in shell commands, in .gitignore files, and even in AI data pipelines for filtering datasets.
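If you'd rather not copy the string at all, a possible alternative is to find the last path component with strrchr(), which never modifies its input. The sketch below uses a hypothetical helper name, glob_matches(), and takes the fnmatch() flags as a parameter so you can see how they change the result. It assumes the path doesn't end in a slash, which holds for paths built as "dir/name".

```c
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the last component of path matches
 * pattern under the given fnmatch() flags, 0 otherwise. The input string
 * is only read, never modified. */
int glob_matches(const char *path, const char *pattern, int flags) {
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;  /* text after the last '/' */
    return fnmatch(pattern, base, flags) == 0;
}
```

With flags set to 0, a pattern such as *.txt also matches a name like .hidden.txt. Passing FNM_PERIOD instead requires a leading dot to be matched explicitly, as the shell does. Which behavior the grader expects is a question for the assignment spec.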
Step 4: Printing Relative Paths
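The output must be relative to the starting directory. For example, if the starting directory is /home/user and a file is /home/user/docs/file.txt, print docs/file.txt. To compute this, pass the starting directory's path to your search function and strip that prefix from the full path. Alternatively, build paths relative to the starting directory as you recurse. Note that the sample run in Testing Your sfind prints paths that still begin with the starting directory (testdir/...), as the real find does, so confirm against the assignment spec which form is expected.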
One clean method: store the length of the starting directory and, when printing, output fullpath + start_len, skipping the slash that follows the prefix.
void print_relative(const char *fullpath, const char *startdir) {
    size_t len = strlen(startdir);
    if (fullpath[len] == '/')
        printf("%s\n", fullpath + len + 1);
    else
        printf("%s\n", fullpath + len);
}
Step 5: Error Handling and Usage
If the user provides invalid options or no test, print an error message to stderr and exit with status 1. Example:
fprintf(stderr, "Usage: sfind [dir ...] [-s file | -m pattern]\n");
exit(1);
Also handle cases where directories can't be opened or files can't be accessed: print a message to stderr but continue.
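For the "print but continue" cases, it helps to say why the call failed. Here is a small sketch of that idea for opendir(). The wrapper name open_dir_or_warn() is hypothetical, and the message format is just one reasonable choice, not something the assignment prescribes.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: opens a directory, and on failure prints the
 * reason to stderr. Returns the DIR pointer, or NULL so the caller can
 * skip this directory and keep going. */
DIR *open_dir_or_warn(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        int saved = errno;  /* save it before another call can overwrite it */
        fprintf(stderr, "sfind: cannot open directory %s: %s\n",
                path, strerror(saved));
    }
    return dir;
}
```

The caller checks for NULL and returns without exiting, so one unreadable directory doesn't stop the rest of the search.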
Complete Program Structure
Place all helper functions before main(). A typical order:
- test_same_file()
- test_glob()
- apply_test()
- search_dir()
- main()
Use comments to document each function, especially if you reuse code from the class repository (cite the source).
Testing Your sfind
Create a test directory structure:
$ mkdir -p testdir/subdir
$ touch testdir/file1.txt testdir/subdir/file2.txt
$ ln testdir/file1.txt testdir/hardlink.txt
Run your program:
$ ./sfind testdir -m "*.txt"
testdir/file1.txt
testdir/subdir/file2.txt
testdir/hardlink.txt
$ ./sfind testdir -s testdir/file1.txt
testdir/file1.txt
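testdir/hardlink.txt
The order of the output lines may differ on your system, because readdir() returns entries in no guaranteed order. Compare with the real find:
$ find testdir -samefile testdir/file1.txt # similar to -s
This kind of testing is like debugging a game level: you need to check all edge cases.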
Common Pitfalls
- Not handling the default directory: If no directories are given, use . (the current directory).
- Symlink handling: Use lstat(), not stat().
- Memory safety: Use snprintf() and avoid buffer overflows.
- Basename modification: basename() may modify its input; make a copy.
Conclusion
You've built a functional subset of find. This assignment teaches core systems programming skills: directory traversal, file metadata, pattern matching, and robust argument parsing. These concepts are foundational for building tools used in DevOps, data science pipelines, and even AI training data management. Submit your sfind.c using the provided submithwk_cs49366 command, and remember: thorough documentation and error handling will earn you top marks.