Hosting photos with rokka.io.

For the past few years I’ve used Flickr to host most of the photos for this site, with the help of Christian Nunciato’s jekyll-flickr plugin. But with Flickr on life support as Yahoo circles the drain, I’ve been looking for an alternative. I’ve even considered rolling my own. But this weekend I stumbled across rokka.io, and I’m using it for my photo hosting as of today.

Rationale

I had two main issues with Flickr:

The latter issue is clearly shown using profiling tools like WebPageTest, which issued this report for Flickr version of this site.

rokka.io

Rokka is a slick, developer-oriented service which allows you to store images at their original quality and then define composable “stacks,” essentially sets of transformations (e.g., resize) to your images. The stack outputs are cached and reused for subsequent requests and it’s all served on Amazon’s CDN, just like the HTML for this blog. There’s built-in support for device pixel ratios and forward-looking formats like WebM and even HEIF (despite the latter not yet being supported by any browser). In my opinion it’s a great fit for static-site-generation tools like Jekyll.

Migrating

Moving the site over involved some shell gymnastics but it was a lot of fun. (Note: all of the shell snippets below use fish syntax, since that’s what I use.)

Copying the photos themselves

Since Rokka bills for storage, I didn’t want to migrate everything from Flickr, just the photos I’d actually used in this blog. My first step was to monkeypatch the Jekyll Flickr plugin to build the site (locally) with the original1 (i.e., full-sized) images. This was a one-line change:

-        selected_size = @photo[:sizes][@photo[:size]]
+        selected_size = @photo[:sizes]["Original"]

Then, after running jekyll build to generate the site locally, I used htmlq to pull out the URLs and photo titles (rendered as the images’ alt text in the blog):

$ grep -h flickr.com */*.html | htmlq --filter img @src > photo_urls
# e.g. https://farm4.staticflickr.com/3877/14973573047_028f312c85_o.jpg
$ grep -h flickr.com */*.html | htmlq --filter img @alt > photo_descriptions
# e.g. At Illini State Park

Then I grabbed all the original-size photos with wget -i and uploaded them to Rokka using its web interface (there is, of course, also API access to image uploading).

Creating the stack

Originally I tried doing this through the Rokka UI, but that resulted in the jpeg.quality and webm.quality stack options being set, which the documentation warns against, because setting these explicitly disables Rokka’s adaptive optimization. So I used the API instead (I’m omitting the API key headers for simplicity; blog is the stack name):

$ curl -X PUT 'https://api.rokka.io/stacks/eli-naeher/blog' -d '
{
  "operations":
    [
      {
        "name": "resize",
        "options": {
          "width": 800,
          "height": 800,
          "upscale": false
        }
      }
    ],
  "options": {"autoformat": true}
}'

Updating the posts

The source files for my blog posts (as you can see here) had used snippets like this to embed photos from Flickr:

{% flickr_photo 40494515924 "Medium 800" %}

In this case 40494515924 is the Flickr ID for the photo and "Medium 800" indicates the size I want. The Jekyll Flickr plugin converts this into a URL during rendering, and pulls the Flickr photo title into the alt and title attributes.

I decided I wanted the Rokka plugin I was going to have to write2 to expect something like this:

{% rokka_photo 0ccc2a "Connecticut back road" %}

0ccc2a is the Rokka ID (called the “short hash” in Rokka’s documentation and API). And while I could have used Rokka’s metadata to store the Flickr image titles, I’ve always felt that the descriptions are really properties of a specific use of an image in a specific blog post rather than intrinsic properties of the photos themselves, so I wanted to have them right in the post source. (Since the size is now an attribute of the Rokka stack, no need to specify every time we embed an image.)

To do this I needed to create a tripartite mapping between Flickr IDs, Rokka short hashes, and photo descriptions. The Flickr IDs were easily extracted from the list of URLs I already had:3

$ cat photo_urls | cut -d/ -f 5 | cut -d_ -f 1 > flickr_ids

With paste I could associate those with the photo descriptions (since photo_urls and photo_descriptions are in the same order):

$ paste flickr_ids photo_descriptions > flickr_id_to_photo_description

# result looks like this (tab-delimited)
# 14973573047     At Illini State Park
# 22878423785     IMG_1986
# 10420650534     Wooden structure along the North Shore Channel Trail

When I’d uploaded the images to Rokka, the filenames had been the same as those in the Flickr URLs (e.g. 14973573047_028f312c85_o.jpg). The first part of these (before the first underscore) is the Flickr ID (the second part is something called the “secret” that we don’t need.) I was able to use the Rokka API and jq to associate these IDs with Rokka short hashes:

$ curl 'https://api.rokka.io/sourceimages/eli-naeher?limit=2000' \
  | jq -r '.items[] | (.name | split("_")[0]), .short_hash' \
  | paste -s -d '\t\n' - \
  > flickr_id_to_rokka_id

# result looks like this (tab-delimited):
# 38966684561     de6d56
# 25095434478     a4eb2a
# 38966685101     b2c40f

Given these two mappings (Flickr ID to description and Flickr ID to Rokka short hash), join can generate a file with all three columns:

$ join -t\t (sort flickr_id_to_rokka_id | psub) \
            (sort flickr_id_to_photo_description | psub) \
            > flickr_id_rokka_id_photo_description

# result looks like this (tab-delimited)
# 17261233894     95567e  Larsen Prairie, McHenry County, Illinois
# 17263337813     95bfb8  Crossing the Fox River
# 17263346523     b04db3  Downtown Elgin, Illinois (looking south)

The last step is to do the search and replace across all of the existing pages on the site (mostly the posts in _posts). Perhaps a true shell guru could have managed this with the shell alone; I turned to awk.

FNR == NR {
    rokka_ids[$1] = $2
    descs[$1] = $3
    next
}

{
    for (k in rokka_ids) {
        gsub(k,rokka_ids[k] " \"" descs[k] "\"")
        gsub("\"Medium 800\" ","")
        gsub("flickr_photo", "rokka_photo")
    }
    print
}

This replaces each Flickr ID with the corresponding Rokka short hash and the photo description, changes the flickr_photo tag to rokka_photo, and removes the "Medium 800" specifier.

I ran it like so:

$ for file in ../_posts/*;
      mv $file $file.backup;
      and awk -F\t -f migrate.awk flickr_id_rokka_id_photo_description \
        $file.backup > $file;
      and rm $file.backup;
  end

This left me with the correct post source but I still had to write the plugin to render the links. Since doing so was trivial and this blog post is too long already, I’ll just link to it.

Conclusion

WebPageTest now seems much happier with the images served on my site. In particular, the total bandwidth used by my last post went from 887 KB to 344 KB and the load time from a bit over two seconds to a bit over one.

While I regret that readers can no longer click through to Flickr to easily browse my photostream, I’d already stopped using Flickr as my canonical photo library. iCloud Photo Library has replaced it for that purpose, since it has much better integration with the various devices I use. It’s sad to see such a key piece of the “old web” fade into irrelevance–but it has.

  1. Actually, the images Flickr returns are not quite the same images that were originally uploaded; for some large JPEGs, Flickr’s “original” images had much smaller file sizes than the true originals (though the dimensions were the same). I suspect there is JPEG recompression taking place and if I were more fastidious I’d have figured out a way to replace these images with the true originals before uploading to Rokka. 

  2. Rokka supplies plugins for a few platforms (WordPress, Drupal, etc.), but Jekyll isn’t among them. 

  3. I am compelled to note here that this is not–as it might appear to a bash user to be–a useless use of cat. The fish shell does not support the < input-file command syntax that would obviate the need for cat here. 

April 8, 2018