I Hosting photos with rokka.io. i
For the past few years I’ve used Flickr to host most of the photos for this site, with the help of Christian Nunciato’s jekyll-flickr plugin. But with Flickr on life support as Yahoo circles the drain, I’ve been looking for an alternative. I’ve even considered rolling my own. But this weekend I stumbled across rokka.io, and I’m using it for my photo hosting as of today.
Rationale
I had two main issues with Flickr:
-
It supports a very limited set of image sizes (described here)
-
Its JPEG compression settings (which you can’t control) don’t reflect current best practice, and it doesn’t serve progressive JPEGs
The latter issue is clearly shown using profiling tools like WebPageTest, which issued this report for Flickr version of this site.
rokka.io
Rokka is a slick, developer-oriented service which allows you to store images at their original quality and then define composable “stacks,” essentially sets of transformations (e.g., resize) to your images. The stack outputs are cached and reused for subsequent requests and it’s all served on Amazon’s CDN, just like the HTML for this blog. There’s built-in support for device pixel ratios and forward-looking formats like WebM and even HEIF (despite the latter not yet being supported by any browser). In my opinion it’s a great fit for static-site-generation tools like Jekyll.
Migrating
Moving the site over involved some shell gymnastics but it was a lot of fun. (Note: all of the shell snippets below use fish syntax, since that’s what I use.)
Copying the photos themselves
Since Rokka bills for storage, I didn’t want to migrate everything from Flickr, just the photos I’d actually used in this blog. My first step was to monkeypatch the Jekyll Flickr plugin to build the site (locally) with the original1 (i.e., full-sized) images. This was a one-line change:
- selected_size = @photo[:sizes][@photo[:size]]
+ selected_size = @photo[:sizes]["Original"]
Then, after running jekyll build
to generate the site locally, I
used htmlq to pull out the
URLs and photo titles (rendered as the images’ alt
text in the
blog):
$ grep -h flickr.com */*.html | htmlq --filter img @src > photo_urls
# e.g. https://farm4.staticflickr.com/3877/14973573047_028f312c85_o.jpg
$ grep -h flickr.com */*.html | htmlq --filter img @alt > photo_descriptions
# e.g. At Illini State Park
Then I grabbed all the original-size photos with wget -i
and uploaded
them to Rokka using its web interface (there is, of course, also API
access to image uploading).
Creating the stack
Originally I tried doing this through the Rokka UI, but that resulted
in the jpeg.quality
and webm.quality
stack options being set,
which the documentation warns
against,
because setting these explicitly disables Rokka’s adaptive
optimization. So I used the API instead (I’m omitting the API key
headers for simplicity; blog
is the stack name):
$ curl -X PUT 'https://api.rokka.io/stacks/eli-naeher/blog' -d '
{
"operations":
[
{
"name": "resize",
"options": {
"width": 800,
"height": 800,
"upscale": false
}
}
],
"options": {"autoformat": true}
}'
Updating the posts
The source files for my blog posts (as you can see here) had used snippets like this to embed photos from Flickr:
{% flickr_photo 40494515924 "Medium 800" %}
In this case 40494515924
is the Flickr ID for the photo and "Medium
800"
indicates the size I want. The Jekyll Flickr plugin converts
this into a URL during rendering, and pulls the Flickr photo title
into the alt
and title
attributes.
I decided I wanted the Rokka plugin I was going to have to write2 to expect something like this:
{% rokka_photo 0ccc2a "Connecticut back road" %}
0ccc2a
is the Rokka ID (called the “short hash” in Rokka’s
documentation and API). And while I could have used Rokka’s metadata
to store the Flickr image titles, I’ve always felt that the
descriptions are really properties of a specific use of an image in a
specific blog post rather than intrinsic properties of the photos
themselves, so I wanted to have them right in the post source. (Since
the size is now an attribute of the Rokka stack, no need to specify
every time we embed an image.)
To do this I needed to create a tripartite mapping between Flickr IDs, Rokka short hashes, and photo descriptions. The Flickr IDs were easily extracted from the list of URLs I already had:3
$ cat photo_urls | cut -d/ -f 5 | cut -d_ -f 1 > flickr_ids
With paste
I could associate those with the photo descriptions (since
photo_urls
and photo_descriptions
are in the same order):
$ paste flickr_ids photo_descriptions > flickr_id_to_photo_description
# result looks like this (tab-delimited)
# 14973573047 At Illini State Park
# 22878423785 IMG_1986
# 10420650534 Wooden structure along the North Shore Channel Trail
When I’d uploaded the images to Rokka, the filenames had been the same
as those in the Flickr URLs (e.g. 14973573047_028f312c85_o.jpg
). The
first part of these (before the first underscore) is the Flickr ID
(the second part is something called the “secret” that we don’t need.)
I was able to use the Rokka API and
jq to associate these IDs with Rokka
short hashes:
$ curl 'https://api.rokka.io/sourceimages/eli-naeher?limit=2000' \
| jq -r '.items[] | (.name | split("_")[0]), .short_hash' \
| paste -s -d '\t\n' - \
> flickr_id_to_rokka_id
# result looks like this (tab-delimited):
# 38966684561 de6d56
# 25095434478 a4eb2a
# 38966685101 b2c40f
Given these two mappings (Flickr ID to description and Flickr ID to
Rokka short hash), join
can generate a file with all three columns:
$ join -t\t (sort flickr_id_to_rokka_id | psub) \
(sort flickr_id_to_photo_description | psub) \
> flickr_id_rokka_id_photo_description
# result looks like this (tab-delimited)
# 17261233894 95567e Larsen Prairie, McHenry County, Illinois
# 17263337813 95bfb8 Crossing the Fox River
# 17263346523 b04db3 Downtown Elgin, Illinois (looking south)
The last step is to do the search and replace across all of the
existing pages on the site (mostly the posts in _posts
). Perhaps a
true shell guru could have managed this with the shell alone; I turned
to awk.
FNR == NR {
rokka_ids[$1] = $2
descs[$1] = $3
next
}
{
for (k in rokka_ids) {
gsub(k,rokka_ids[k] " \"" descs[k] "\"")
gsub("\"Medium 800\" ","")
gsub("flickr_photo", "rokka_photo")
}
print
}
This replaces each Flickr ID with the corresponding Rokka short hash
and the photo description, changes the flickr_photo
tag to
rokka_photo
, and removes the "Medium 800"
specifier.
I ran it like so:
$ for file in ../_posts/*;
mv $file $file.backup;
and awk -F\t -f migrate.awk flickr_id_rokka_id_photo_description \
$file.backup > $file;
and rm $file.backup;
end
This left me with the correct post source but I still had to write the plugin to render the links. Since doing so was trivial and this blog post is too long already, I’ll just link to it.
Conclusion
WebPageTest now seems much happier with the images served on my site. In particular, the total bandwidth used by my last post went from 887 KB to 344 KB and the load time from a bit over two seconds to a bit over one.
While I regret that readers can no longer click through to Flickr to easily browse my photostream, I’d already stopped using Flickr as my canonical photo library. iCloud Photo Library has replaced it for that purpose, since it has much better integration with the various devices I use. It’s sad to see such a key piece of the “old web” fade into irrelevance–but it has.
-
Actually, the images Flickr returns are not quite the same images that were originally uploaded; for some large JPEGs, Flickr’s “original” images had much smaller file sizes than the true originals (though the dimensions were the same). I suspect there is JPEG recompression taking place and if I were more fastidious I’d have figured out a way to replace these images with the true originals before uploading to Rokka. ↩
-
Rokka supplies plugins for a few platforms (WordPress, Drupal, etc.), but Jekyll isn’t among them. ↩
-
I am compelled to note here that this is not–as it might appear to a bash user to be–a useless use of cat. The fish shell does not support the
< input-file command
syntax that would obviate the need for cat here. ↩