fauxtrots

Automating blog post timestamps

A regex journey

I've been laying down the tracks on this blog as it's rolling along, so I thought I'd write up a little improvement I made recently. This will also serve as its first real test, so fingers crossed that it actually works and doesn't produce an embarrassing mistake.

Like pretty much any blog, my posts are dated. However, previously those dates were hand-entered by me as I wrote the markdown files. That left the annoyingly likely possibility that a typo, or forgetting to update the frontmatter, would leave me with posts time-traveling from the past or future. It was also highly imprecise - only recording the date. While it's unlikely that I'll ever post anything where precise timestamps are required, it still bugged me. Being of a technical nature, I decided to automate this, and learned something along the way!

Since my blog is hosted as a GitHub repo, I decided to try and hook this into the git commit process. I already had pre-commit set up with a hook that strips metadata from pictures[1] with exiftool, so that seemed like the right place to start. Here's my final .pre-commit-config.yaml file:

repos:
  - repo: local
    hooks:
      - id: no-spicy-exif
        name: Ban spicy exif data
        description: Ensures that there is no sensitive exif data committed
        language: system
        entry: exiftool -all= --icc_profile:all -tagsfromfile @ -orientation -overwrite_original
        exclude_types: ['svg']
        types: ['image']
      - id: timestamp-posts
        name: Add createdAt and modifiedAt timestamps to posts automatically
        types: ['file', 'markdown']
        language: script
        entry: ./timestamp.sh

The way it works is that if there are any markdown files in the git diff, it runs the script timestamp.sh. What's timestamp.sh?

#!/usr/bin/perl
# this is a pre-commit hook for setting a timestamp
# in the frontmatter of markdown files
use strict;
use warnings;

my $date = time();
my @files = `git diff-index --cached --name-only HEAD`;
foreach (@files) {
    chomp;  # drop the trailing newline that git prints after each name
    if (/\.md$/) {
      # read the whole file into a single string
      open my $in, '<', $_ or die $!;
      my @lines = <$in>;
      close $in or die $!;
      my $file = join('', @lines);
      # $1 is the frontmatter between the --- lines, $2 is the rest of the post
      if ($file =~ m/---\n(.*?)---\n(.*)/s) {
        my $frontmatter = $1;
        my $body = $2;
        if (!($frontmatter =~ /^createdAt: .*$/m)) {
          $frontmatter = $frontmatter . "createdAt: $date\n";
        } elsif (!($frontmatter =~ s/^modifiedAt: .*$/modifiedAt: $date/m)) {
          $frontmatter = $frontmatter . "modifiedAt: $date\n";
        }
        # overwrite the file and re-stage it so the commit includes the change
        open my $out, '>', $_ or die $!;
        print $out "---\n$frontmatter---\n$body";
        close $out or die $!;
        system 'git', 'add', $_;
      }
    }
}

There you go, easy! Well, maybe if you know Perl. I... don't. The last time I touched a Perl script would have to be over a decade ago in high school, and I didn't do much with it then. This script was heavily based on a gist by bensteinberg I found from a quick search. However, there were some changes I wanted to make, which meant I needed to dust off those skills and dive in.

The first couple of lines are pretty simple, just defining some variables. I store a unix timestamp[2] and the array of changed files[3]. From there we loop over each of those files and check whether they're markdown files. If not, we move on to the next item in the array; if so, we continue. There are a couple of lines of file manipulation that I don't fully understand the syntax of, but they work so it's fine.[4]
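For anyone else puzzled by the file manipulation lines, here is a minimal sketch of the same read-then-overwrite pattern in isolation. It is not taken from the script: the temporary file, its contents, and the variable names are assumptions made so the example can run on its own, and it uses the three-argument form of open with lexical filehandles.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print $tmp_fh "---\ntitle: Hello\n---\nBody text\n";
close $tmp_fh or die $!;

# Read step: open for reading ('<'), pull every line into an array, close.
open my $in, '<', $path or die "Cannot read $path: $!";
my @lines = <$in>;
close $in or die $!;

# Joining the lines gives the whole file as one string, which is what
# lets a single regex look at the frontmatter and the body together.
my $file = join('', @lines);

# Write step: '>' truncates the file, so printing replaces its contents.
open my $out, '>', $path or die "Cannot write $path: $!";
print $out $file . "One more line\n";
close $out or die $!;

print "Read ", scalar(@lines), " lines from $path\n";
```

The mode argument ('<' or '>') is what decides between reading and overwriting, and each `or die $!` stops the script with the system's error message if that step fails.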

After that we enter the regex zone. This was the trickiest part of the script for me. I'm familiar enough with regex, but the Perl-specific tricks (like =~ checking for matches) took me a while to figure out. First we check whether the markdown file has frontmatter[5] and extract it if so. Then we check whether it already has a createdAt timestamp, and if not we add one by concatenating it to the frontmatter string.[6] If it does already have a createdAt property then we move on to checking whether it has the modifiedAt property. This was the most confusing step for me, so I'm gonna break it down.

} elsif (!($frontmatter =~ s/^modifiedAt: .*$/modifiedAt: $date/m)) {
  $frontmatter = $frontmatter . "modifiedAt: $date\n";
}

This is saying that if $frontmatter does not contain a match for that regex, then we add the modifiedAt property to the end. But what's that regex doing?

s/^modifiedAt: .*$/modifiedAt: $date/m

Starting from the beginning, the s before the slash means that we're performing a find-and-replace. The first match of ^modifiedAt: .*$ will be replaced by modifiedAt: $date. The search pattern starts with a caret (^) and ends with a $, which by default anchor it to the start and end of the string being tested, in this case $frontmatter. Naively this might seem to match only if "modifiedAt: ..." were the only entry in the frontmatter, but the m at the end of the whole expression makes ^ and $ match at the start and end of every line instead! So putting it all together, this regex tests whether a modifiedAt property is present, and if so it gets updated to $date, the current[7] unix timestamp. The substitution gives back a true value when it replaced something and a false one when it didn't, which is why it can double as the condition. The fact that the replacement happens inside the conditional surprised me to begin with, and honestly it's still a bit unintuitive, but I appreciate the conciseness.
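The two behaviours described above, the effect of the m flag and the substitution acting as its own test, can be checked in a few lines. This is a minimal sketch rather than part of the hook: the sample frontmatter strings and the fixed timestamp are made up for illustration.

```perl
use strict;
use warnings;

my $date = 1700000000;  # fixed example value standing in for time()

# modifiedAt is one line among several, not the whole string.
my $frontmatter = "title: Hello\nmodifiedAt: 123\ncreatedAt: 100\n";

# Without /m, ^ only matches at the very start of the string.
my $without_m = ($frontmatter =~ /^modifiedAt: .*$/) ? 1 : 0;

# With /m, ^ and $ also match at the start and end of each line.
my $with_m = ($frontmatter =~ /^modifiedAt: .*$/m) ? 1 : 0;

# s/// is true when it replaced something, and it edits
# $frontmatter in place as a side effect.
my $replaced = ($frontmatter =~ s/^modifiedAt: .*$/modifiedAt: $date/m) ? 1 : 0;

# No modifiedAt line here: s/// is false and the string is left alone.
my $fresh = "title: Hello\ncreatedAt: 100\n";
my $untouched = ($fresh =~ s/^modifiedAt: .*$/modifiedAt: $date/m) ? 1 : 0;

print "matched without /m: $without_m, with /m: $with_m\n";
print "replaced existing line: $replaced, replaced in fresh post: $untouched\n";
print $frontmatter;
```

In the hook, the false case is the one that falls through to the branch appending a brand new modifiedAt line.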

Once that's done all that's left is cleanup. We open the file, replace its contents with our new updated ones, and git add our changes so they are included in our still-in-progress commit. And that's it! I'm sure there are improvements that could be made - for one thing, I think that the script gets called for every markdown file included in the diff, which means that if I try to add two posts at once they might both get createdAt tags when the script is run for the first time, and then both get nearly identical modifiedAt timestamps when it gets run again. I'm not so prolific a writer that I expect to publish multiple blog posts at once, but if anyone sees an easy fix please write in! I'm glad that I was able to see a problem, fix it, and learn something from the process all at once.
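One possible fix, offered as a sketch rather than a tested replacement: as far as I understand it, pre-commit passes the matching filenames to the hook as command-line arguments by default, so the script could process only the files it is handed instead of asking git for every staged file. Then, even if the hook were invoked more than once, each post would be stamped by a single invocation. The helper name stamp_frontmatter is hypothetical, and the \A anchor is an added assumption that the frontmatter always starts on the first line of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a post's text and a timestamp, returns the
# updated text, or nothing if the post has no frontmatter.
sub stamp_frontmatter {
    my ($text, $date) = @_;
    # \A pins the opening --- to the very start of the file, so a
    # horizontal rule later in the post is not mistaken for frontmatter.
    return unless $text =~ m/\A---\n(.*?)---\n(.*)/s;
    my ($frontmatter, $body) = ($1, $2);
    if ($frontmatter !~ /^createdAt: .*$/m) {
        $frontmatter .= "createdAt: $date\n";
    } elsif (!($frontmatter =~ s/^modifiedAt: .*$/modifiedAt: $date/m)) {
        $frontmatter .= "modifiedAt: $date\n";
    }
    return "---\n$frontmatter---\n$body";
}

my $date = time();

# Assumption: the filenames to process arrive as arguments.
foreach my $path (@ARGV) {
    next unless $path =~ /\.md$/;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };  # read the whole file at once
    close $in or die $!;

    my $updated = stamp_frontmatter($text, $date);
    next unless defined $updated;

    open my $out, '>', $path or die "Cannot write $path: $!";
    print $out $updated;
    close $out or die $!;

    # List form avoids the shell, so odd filenames are passed through safely.
    system('git', 'add', '--', $path) == 0 or die "git add failed for $path";
}
```

Keeping the string handling in its own function also makes it possible to try the stamping logic on sample text without touching any files or the git index.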

Update: This post got a sequel! It's about adding timestamps to the links page and the hubris of document parsing.


  1. So I don't dox myself.

  2. my $date = time();

  3. my @files = `git diff-index --cached --name-only HEAD`

  4. Side note, I love the or die syntax for crashing if an operation fails and that should be universal. Maybe an alternative could be then perish.

  5. A block at the beginning of the file surrounded by ---.

  6. if (!($frontmatter =~ /^createdAt: .*$/m)) {
      $frontmatter = $frontmatter . "createdAt: $date\n";
    }
    
  7. I guess potentially very slightly past, depending on your execution speed.
