Automating link timestamps: Part 2
How hard could it be to parse YAML?
This is a follow-up to my previous post about Automating Blog Timestamps and will make much more sense if you read it first.
Last time I wrote a Perl script to add timestamps to new blog posts as they're committed to the git repo for this site. But I also have a links page[1] that's set up differently, so the previous method of adding timestamps wouldn't work. Instead of each entry having its own file, links are stored as an array in a YAML file[2], so we can't just add the timestamp to the top of a file when it's added to the repo. Instead, we need to actually parse the document to see which entries are new vs pre-existing. This began my descent into hell.
I didn't want to write a document parser myself, which meant it was time to learn how to find and
use Perl modules. After a bit of searching, it looked like CPAN was the tool for the job. For some
reason setting that up and establishing my environment took a long time[3], but whatever. First up,
I actually wanted to use JSON, not YAML. I went down a rabbit hole trying to use JSON::PP, but I
wasn't able to read the data and go through it in a loop.[4] At the time it seemed like my desired
data structure (an array of hashes) wasn't commonly used in Perl-land; the real hurdle is that Perl
builds nested structures out of references, which I hadn't figured out yet. I probably could have
made it work eventually, but at this point I switched to trying out YAML.
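For reference, here's a minimal sketch of the JSON route that didn't pan out above: decoding an array of hashes with JSON::PP, looping over it, and printing the structure with Data::Dumper, a core module that does a similar job to Data::Dump. The sample data and field names are made up for illustration, not taken from the real links file.

```perl
use strict;
use warnings;
use JSON::PP;        # ships with Perl since 5.14
use Data::Dumper;    # core module for printing nested structures

# Hypothetical sample data: a JSON array of link objects.
my $json = '[{"title":"First","url":"https://example.com/a"},'
         . '{"title":"Second","url":"https://example.com/b","createdAt":1700000000}]';

# decode_json returns a reference to an array of hash references.
my $links = decode_json($json);

# Collect the titles of links that have no createdAt key yet.
my @new_titles;
foreach my $link (@$links) {
    push @new_titles, $link->{title} unless exists $link->{createdAt};
}

print "New links: @new_titles\n";

# Dump the whole structure instead of getting ARRAY(0x...).
local $Data::Dumper::Sortkeys = 1;
print Dumper($links);
```

The trick is the same one the YAML version ends up needing: the decoded data arrives as a reference, so it has to be dereferenced with @ before it can be looped over.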
Perl has several YAML parsing libraries[5], but I ended up going with YAML::PP. First I created an instance of the library:
use YAML::PP;
my $ypp = YAML::PP->new;
Then I added an elsif to the filetype conditional: elsif ($file =~ /\.yaml$/). This is where the
trouble really started. Perl has three fundamental variable types, each with its own symbol
prefix (sigil): scalar ($foo), array (@bar), and hash (%baz). Figuring out how to use these symbols in
combination with each other was mostly achieved by trial and error, and I'm still not 100% sure
how it all works. Let's go through it line by line.
my $links = $ypp->load_file($file);
This seems fine to begin with, but naively I would expect links to be an array, and thus be prefixed
with an @. However, if you do that you then run into trouble on the next line...
foreach my $link (@$links) {
We're setting up a for loop here over the items in $links. Wait, that's... @$links? My
understanding is that $links is actually an array reference, which is a scalar (hence the $), and
by adding the @ prefix we're dereferencing it so that we can loop over the values. The current
value in each iteration of the loop is stored in $link. This should be a hash, but Perl arrays can
only hold scalars, so each element is really a hash reference, and the loop variable has to be a
scalar too.
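To make the sigil juggling a bit more concrete, here's a small sketch with hand-written data in place of the parsed file. The variable names and values are invented for illustration; the point is only to show where the $, @ and % prefixes end up when everything is a reference.

```perl
use strict;
use warnings;

# An array reference lives in a scalar, so it takes the $ sigil.
my $links = [
    { title => 'First' },                              # each element is a hash reference
    { title => 'Second', createdAt => 1700000000 },
];

my $count = scalar @$links;    # @$links dereferences to the underlying array

# Assigning the reference to an array just wraps it in a one-element array.
my @wrapped = ($links);

my @missing;
foreach my $link (@$links) {
    # ref() reports what kind of reference a scalar holds.
    die "unexpected element" unless ref($link) eq 'HASH';
    # The arrow reaches into the hash; exists checks for the key itself.
    push @missing, $link->{title} unless exists $link->{createdAt};
}

# %{ ... } dereferences a hash reference into a full hash.
my @keys = sort keys %{ $links->[1] };

print "count: $count, wrapped: ", scalar(@wrapped), "\n";
print "missing createdAt: @missing\n";
print "keys of second link: @keys\n";
```

The @wrapped line is the trouble hinted at earlier: storing the result in an array doesn't unpack it, it just gives you an array whose only element is the reference.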
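if (!exists $link->{createdAt}) {
Now we're checking if the current link hash contains a createdAt key. Since $link is a hash
reference, we use the arrow to dereference it and look up the key; exists tests for the key itself
rather than whether its value is truthy. If the key isn't present we assume the link is new and move on to: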
$link->{'createdAt'} = $date;
Assigning the current timestamp to the hash. After that we dump the array out into the file again.
The first time I got this to work it created a YAML file composed of multiple documents; that is,
each link was separated by a ---. Unfortunately that broke the import in the JavaScript YAML
parsing library I'm using (@modyfi/vite-plugin-yaml), so I had to futz around with it until I got
to the current state. Here's the final result:
use YAML::PP;

my $ypp  = YAML::PP->new;
my $date = time();    # current Unix timestamp

# Files staged for this commit
my @files = `git diff-index --cached --name-only HEAD`;

foreach my $file (@files) {
    chomp $file;
    if ($file =~ /\.md$/) {
        ... code from previous post ...
    } elsif ($file =~ /\.yaml$/) {
        my $links = $ypp->load_file($file);    # reference to the array of links in the file
        foreach my $link (@$links) {
            # Only stamp links that don't have a timestamp yet
            if (!exists $link->{createdAt}) {
                $link->{createdAt} = $date;
            }
        }
        $ypp->dump_file($file, $links);    # one argument, so one YAML document
        system('git', 'add', $file);       # list form avoids shell quoting problems
    }
}
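As for the multi-document output mentioned earlier, one way it can happen is passing the dereferenced list to the dumper instead of the reference. That's an assumption; the post doesn't record what the first attempt looked like. The sketch below uses made-up sample data and dump_string rather than dump_file so that it doesn't touch any files.

```perl
use strict;
use warnings;
use YAML::PP;    # CPAN module, not part of core Perl

my $ypp = YAML::PP->new;

# Hypothetical sample data standing in for the links file.
my $links = [
    { title => 'First',  url => 'https://example.com/a' },
    { title => 'Second', url => 'https://example.com/b' },
];

# Passing the dereferenced list hands the dumper two separate documents.
my $multi = $ypp->dump_string(@$links);

# Passing the reference itself produces one document holding a sequence.
my $single = $ypp->dump_string($links);

# Count the document start markers in each result.
my $multi_docs  = () = $multi  =~ /^---/mg;
my $single_docs = () = $single =~ /^---/mg;

print "multi: $multi_docs document(s), single: $single_docs document(s)\n";

# Loading the single-document form gives back an array reference.
my $round_trip = $ypp->load_string($single);
print "round trip has ", scalar(@$round_trip), " links\n";
```

Every argument after the first becomes its own document, which is why handing over $links as a single reference keeps the file in the one-document shape the JavaScript side expects.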
I also had to make a couple changes to my .pre-commit-config.yaml file. First, I added
exclude: (.pre-commit-config.yaml) at the top level so that the script wouldn't try to add a
timestamp to the config file itself. Next, I changed the types key for this hook to
types_or: [markdown, yaml].
Overall I'm happy with how this turned out, but I hope that I won't have to touch this script (or Perl in general tbh) again for a long time.
Footnotes
1. Check it out, there's some cool stuff!
2. Technically before this they were stored in an array in a JavaScript file, but don't nitpick.
3. Just leaves more time for swordfights.
4. This whole process was made more annoying by the fact that Perl apparently doesn't have a built-in way to print (the contents of) arrays and hashes. I ended up using Data::Dump in order to actually see what a hash contained instead of HASH(0x5584948e7788).
5. Apparently the first YAML implementation was in Perl!