Home

Let's Build a File Watcher

Estimated Reading Time: 7.52 minutes.

So, last night I wanted a tool to execute a command when a file changed, but for lazy reasons, didn't particularly want to install the sensible solution, inotify.

But, Linux being Linux, it has all the tools to make it dead easy to implement one from scratch. In point of fact, half the time of my five minutes to implementation was spent with the man-pages, and the rest on the actual code.

There's nothing special here, and taking the principle and making it more robust is easy.

In point of fact, though my implementation will be in C, you should find equivalent ways of doing all the below in every major programming language, and should be able to port the concept across to whatever you're most familiar in. This isn't a weekend project. This is a lunch-break project.


The Code

The code for this particular blog post can be found here, under a permissive license.


The Goal

For this, my goal is to be able to run a command of the form:

./miniwatch THING_TO_WATCH -c ARGS

We have a filename to watch, and then some command and arguments to execute when that file updates its modified time.

(Incidentally, this won't work that well on folders. Their modified time updates with new/removed files, but not if any child files are modified. So, folders are "out of scope").

An example of one of the commands I wanted to run would be:

miniwatch docs/00020.texi -c bookwriter render

The Tools

There's really only two sets of tools you need to accomplish this in the most basic fashion, like we're about to:


Let's Begin!

#include <unistd.h>
#include <time.h>
#include <sys/stat.h>


int main(int argc, char* argv[]) {
    // Extract filename to watch
    char* filename = argv[1];

    // TODO: Extract commands

    // Where we'll store our stat...
    struct stat sfile;

    // We need some time values to compare
    // Init both so we don't execute on first loop, but when changes actually happen.
    time_t old_time;
    time_t current_time;
    time(&old_time);
    time(&current_time);

    while(true) {

        // Get the file's modified time field (a time_t).
        stat(filename, &sfile);
        current_time = sfile.st_mtime;

        if(difftime(old_time, current_time) < 0) {
            // Change detected!
            old_time = current_time;

            // TODO: Respond to file changes...
        }
    }
}

This is the basic loop structure that we'll be using.

There really shouldn't be anything intimidating here. We're just building some structures and comparing them.

We just need to add two small things, and we'll be done.

  1. We need to parse out and construct an array containing the command we want to run.

  2. We need to execute that array.


Rough CLI Parsing

#include <stdlib.h>
#include <string.h>

...

char** array = calloc(argc + 2, sizeof(char*));
int pos = 0;

bool found = false;

for(int i = 0; i < argc; i++) {
    if(found) {
        array[pos] = argv[i];
        pos++;
    }

    if(strcmp(argv[i], "-c") == 0) {
        found = true;
    }
}

if(found == false) {
    return 1;
}

This is really, really rough and ready, but I hate CLI parsing in C and won't spend any time thinking about it. Something like getopt is probably the right choice, but is overkill for such a small project. And in other languages you probably have better tools to use.

A few things to note:


Execution

Before we launch into the code it is helpful to know a bit more about execvp, which is the function I'm planning to use for this particular task.

Inside the exec family of functions, there are a number, with varying behaviour and tradeoffs.

I've chosen execvp, for two reasons:

However, before you're tempted to just call the function there is a quirk of the exec family you need to be aware of.

They replace the current process with the command that they're calling. Which means our loop will disappear and the program will simply exit. Which is fine for debugging, but quite useless as-is for us.

So, before executing execvp, we'll need to fork.

The parent process will containing our loop that checks the file, and the child will run the command we've assembled.

#include <sys/wait.h>

...

struct stat sfile;
time_t old_time;
time_t current_time;
time(&old_time);
time(&current_time);

pid_t pid;
int status;

while(true) {
    // Check if file/dir modified...
    stat(filename, &sfile);
    current_time = sfile.st_mtime;

    if(difftime(old_time, current_time) < 0) {
        old_time = current_time;

        if((pid = fork()) < 0) {
            // Fork fail.
            return 1;
        }
        else if(pid == 0) {
            // Run our command!
            execvp(*array, array);
        }
        else {
            // Wait until our child (execvp) terminates...
            while(wait(&status) != pid) {
                old_time = current_time;
            }
        }       
    }

    // Calling stat in an infinite loop will peg the CPU.
    sleep(1);
}

This isn't nearly as estoric as it might appear to be at first glance. It is actually pretty typical forking boilerplate.

Basically, you just need to check if you're in the fork or not, and either execute the command or wait until the command has completed.

Execution might look a little odd, execvp(*array, array);, but that's because the convention says that the first argument should be the program to run. So we're running our command the same way that your shell would, on some level. The first array argument is just a short-hand to get the first element, because that's how pointers work and this is not an article on pointers.

We do also add a sleep to our loop, because stat is a function that crosses the kernel/runtime divide, which makes it somewhat expensive to use. There's not really an alternative so it is heavily optimised, but you'll still peg the first core of a CPU to 100% if you have no sleep. A second seems reasonable, and will make our watcher disappear from your task manager.


We're done.

Seriously. That's it, we're done.

There's no libraries to link, apart from libc, so just go ahead and compile it:

cc -o miniwatch src/main.c

Wait! What about that array we pre-allocated, shouldn't we free it?

The answer to that is actually... No. We only want to free that at program exit, and the OS is far better at releasing memory than we can be, so we can leave it to Linux to clear that up. We don't want to catch exit signals and clean it ourselves. We'll be slower in every way, and more error prone.

What about all the missing error handling? Execution failure, memory allocation failure, and so on?

If you care about that, then you should probably be using inotify and not doing this yourself. But, sure, add in the half-dozen missing if statements to make it more robust. The manpages have everything you need to know.

Pretty much every function we used has a failure condition, so go ahead and wrap them.

Time is a fickle beast.

We've made a couple of assumptions that don't hold in the real world. This introduces edge-cases into our code, and if they matter to you, then working around them should be easy enough, but they don't matter for my use case.

  1. Time always moves forward. Time might, as a concept, generally move forward. Filesystems and system clocks, however, really don't. There's all sorts of ways that modified time stamps might actually move backwards, and you might want to respond to that. (Daylight savings, filesystems losing track and resetting, etc.)

  2. Time functions can fail. You'll need to be checking and responding to errno appropriately.

How robust is the CLI handling?

Not at all. There's no checking of argc, but we do reach into argv and just assume that things will work.

As it stands, a badly formed command will probably result in a segmentation fault. And make any half-decent linter scream at you.

Infinite Loop

We rely on an infinite loop to get our behaviour, because that's what makes the most sense. And most sensible compilers will actually do what we want, here.

However, until the introduction of C11, the compiler wasn't required to check if an I/O operation occurred within an infinite loop before assuming the loop would terminate, and thus eliminate it. Basically, if you're compiling under C99... Here be dragons. Thankfully, C11 fixed that gap in the specification.


Comments

Submit comment...

Subscribe to this comment thread.