Estimated Reading Time: 7.52 minutes.
So, last night I wanted a tool to execute a command when a file changed, but for lazy reasons, didn't particularly want to install the sensible solution, inotify.
But, Linux being Linux, it has all the tools to make it dead easy to implement one from scratch. In point of fact, half the time of my five minutes to implementation was spent with the man-pages, and the rest on the actual code.
There's nothing special here, and taking the principle and making it more robust is easy.
In point of fact, though my implementation will be in C, you should find equivalent ways of doing all the below in every major programming language, and should be able to port the concept across to whatever you're most familiar in. This isn't a weekend project. This is a lunch-break project.
The Code
The code for this particular blog post can be found here, under a permissive license.
The Goal
For this, my goal is to be able to run a command of the form:
./miniwatch THING_TO_WATCH -c ARGS
We have a filename to watch, and then some command and arguments to execute when that file updates its modified time.
(Incidentally, this won't work that well on folders. Their modified time updates with new/removed files, but not if any child files are modified. So, folders are "out of scope").
An example of one of the commands I wanted to run would be:
miniwatch docs/00020.texi -c bookwriter render
The Tools
There's really only two sets of tools you need to accomplish this in the most basic fashion, like we're about to:
stat - This is the call you want to use to check file metadata.
- And well documented under
man 3 stat
- And well documented under
execvp - This is the call we'll use to execute the given command.
- It is somewhat documented under
man 3 exec
, but you'll need a little extra experience to know how to use it correctly.
- It is somewhat documented under
Let's Begin!
#include <unistd.h>
#include <time.h>
#include <sys/stat.h>
int main(int argc, char* argv[]) {
// Extract filename to watch
char* filename = argv[1];
// TODO: Extract commands
// Where we'll store our stat...
struct stat sfile;
// We need some time values to compare
// Init both so we don't execute on first loop, but when changes actually happen.
time_t old_time;
time_t current_time;
time(&old_time);
time(¤t_time);
while(true) {
// Get the file's modified time field (a time_t).
stat(filename, &sfile);
current_time = sfile.st_mtime;
if(difftime(old_time, current_time) < 0) {
// Change detected!
old_time = current_time;
// TODO: Respond to file changes...
}
}
}
This is the basic loop structure that we'll be using.
There really shouldn't be anything intimidating here. We're just building some structures and comparing them.
time
creates a time_t from a given pointer, returning the number of seconds since Epoch. (man 2 time
).- Technically we should be checking errno as well, but this tool is rough.
stat
creates astruct stat
, from a filepath and a pointer where to store it. (man 3 stat
).- The
st_mtime
attribute of the struct contains our file modified time.
- The
difftime
returns adouble
allowing us to compare the number of seconds between twotime_t
values.
We just need to add two small things, and we'll be done.
We need to parse out and construct an array containing the command we want to run.
We need to execute that array.
Rough CLI Parsing
#include <stdlib.h>
#include <string.h>
...
char** array = calloc(argc + 2, sizeof(char*));
int pos = 0;
bool found = false;
for(int i = 0; i < argc; i++) {
if(found) {
array[pos] = argv[i];
pos++;
}
if(strcmp(argv[i], "-c") == 0) {
found = true;
}
}
if(found == false) {
return 1;
}
This is really, really rough and ready, but I hate CLI parsing in C and won't spend any time thinking about it. Something like getopt
is probably the right choice, but is overkill for such a small project. And in other languages you probably have better tools to use.
A few things to note:
We use
calloc
to create our array, because we want the array of strings to beNULL
-terminated. Conveniently,calloc
0-fills the memory it claims. (We are ignoring all error handling, but that wouldn't be too difficult to toss together).We're not copying the strings stored in
argv
. We don't need to, so why bother with extra memory management? The OS has already done it for us. If you're dealing with something other thanargv
for some reason... You may need to allocate and copy the individual strings.The strings in
argv
are already well formed 0-terminated strings. Which means we can usestrcmp
without much of a care, but usually you will want to use a different comparison function from that family.
Execution
Before we launch into the code it is helpful to know a bit more about execvp
, which is the function I'm planning to use for this particular task.
Inside the exec
family of functions, there are a number, with varying behaviour and tradeoffs.
I've chosen execvp
, for two reasons:
It lets me pass an array of arguments to execute.
It is aware of
$PATH
, and will follow it.
However, before you're tempted to just call the function there is a quirk of the exec
family you need to be aware of.
They replace the current process with the command that they're calling. Which means our loop will disappear and the program will simply exit. Which is fine for debugging, but quite useless as-is for us.
So, before executing execvp
, we'll need to fork
.
The parent process will containing our loop that checks the file, and the child will run the command we've assembled.
#include <sys/wait.h>
...
struct stat sfile;
time_t old_time;
time_t current_time;
time(&old_time);
time(¤t_time);
pid_t pid;
int status;
while(true) {
// Check if file/dir modified...
stat(filename, &sfile);
current_time = sfile.st_mtime;
if(difftime(old_time, current_time) < 0) {
old_time = current_time;
if((pid = fork()) < 0) {
// Fork fail.
return 1;
}
else if(pid == 0) {
// Run our command!
execvp(*array, array);
}
else {
// Wait until our child (execvp) terminates...
while(wait(&status) != pid) {
old_time = current_time;
}
}
}
// Calling stat in an infinite loop will peg the CPU.
sleep(1);
}
This isn't nearly as estoric as it might appear to be at first glance. It is actually pretty typical forking boilerplate.
Basically, you just need to check if you're in the fork or not, and either execute the command or wait until the command has completed.
Execution might look a little odd, execvp(*array, array);
, but that's because the convention says that the first argument should be the program to run. So we're running our command the same way that your shell would, on some level. The first array argument is just a short-hand to get the first element, because that's how pointers work and this is not an article on pointers.
We do also add a sleep
to our loop, because stat
is a function that crosses the kernel/runtime divide, which makes it somewhat expensive to use. There's not really an alternative so it is heavily optimised, but you'll still peg the first core of a CPU to 100% if you have no sleep. A second seems reasonable, and will make our watcher disappear from your task manager.
We're done.
Seriously. That's it, we're done.
There's no libraries to link, apart from libc, so just go ahead and compile it:
cc -o miniwatch src/main.c
Wait! What about that array we pre-allocated, shouldn't we free it?
The answer to that is actually... No. We only want to free that at program exit, and the OS is far better at releasing memory than we can be, so we can leave it to Linux to clear that up. We don't want to catch exit signals and clean it ourselves. We'll be slower in every way, and more error prone.
What about all the missing error handling? Execution failure, memory allocation failure, and so on?
If you care about that, then you should probably be using inotify
and not doing this yourself. But, sure, add in the half-dozen missing if
statements to make it more robust. The manpages have everything you need to know.
Pretty much every function we used has a failure condition, so go ahead and wrap them.
Time is a fickle beast.
We've made a couple of assumptions that don't hold in the real world. This introduces edge-cases into our code, and if they matter to you, then working around them should be easy enough, but they don't matter for my use case.
Time always moves forward. Time might, as a concept, generally move forward. Filesystems and system clocks, however, really don't. There's all sorts of ways that modified time stamps might actually move backwards, and you might want to respond to that. (Daylight savings, filesystems losing track and resetting, etc.)
Time functions can fail. You'll need to be checking and responding to
errno
appropriately.
How robust is the CLI handling?
Not at all. There's no checking of argc, but we do reach into argv and just assume that things will work.
As it stands, a badly formed command will probably result in a segmentation fault. And make any half-decent linter scream at you.
Infinite Loop
We rely on an infinite loop to get our behaviour, because that's what makes the most sense. And most sensible compilers will actually do what we want, here.
However, until the introduction of C11, the compiler wasn't required to check if an I/O operation occurred within an infinite loop before assuming the loop would terminate, and thus eliminate it. Basically, if you're compiling under C99... Here be dragons. Thankfully, C11 fixed that gap in the specification.