Adventures in Dynamic Subsetting

A deep dive on web font performance.

03/07/2014 · Penarth, Wales · An initial load on nytimes.com, today, will serve 480kb of font files, ign.com will throw another 240kb at you, and the new Gotham goodness on twitter.com will net you 155kb*. Compression can help, but to truly optimize web font usage, web designers need to get familiar with subsetting. Or even better: font services could do it for us.

Font file transfers, 06/13-06/14 — Across the most popular sites on the web, transfer sizes and requests for font files have more than tripled over the past year. Chart via httparchive.org.

The growing trend

Web pages continue to get heavier. A scroll through the trends on httparchive.org shows file sizes for most assets (images, JavaScript) are on the rise, but none have grown as fast as font files over the last year. Recent conversations among front-end developers (and me) have surfaced two trends that threaten to undermine the web font gains we’ve seen:

Font choices are being rolled back during development to help meet performance budgets, meaning not only are typefaces eliminated, but even bold or italic variants are removed. This limits the expressiveness of the type and also leads to nasty, default browser bolding or obliquing.
Fonts are being removed entirely for smaller viewports. This sets a regressive precedent that stunts the small screen experience – which is increasingly our primary experience. (Think of the state of HTML Email, and then tell me that curbing capabilities is a good idea.)

My well-intentioned dev friends are thinking of what’s best for the user, and I applaud them for handling a problem that designers are just not bothered about. A warning, designers: whether we think about performance or not, these typographic choices are being made for us.

Dynamic subsetting FTW

Beyond other standard file optimization techniques, such as gzipping and setting long caches, font files have the unique ability to be subset to dramatically reduce file size. Subsetting is the act of removing glyphs (or characters) from a font file, keeping only the glyphs you may need for a site’s content.

Typically, subsets are used to add or remove support for other languages, as their glyphs can differ from Latin characters, and services such as Typekit make this fairly easy. A Latin-only subset can dramatically reduce the file size, but even a compressed woff for a common font can still weigh in at 30–50kb. Add a bold and italic variant and you’re back up to around 100kb in file transfers.

Other services, like Google Fonts and MyFonts, allow you to take subsetting further, creating subsets based on a list (or range) of individual glyphs. But the trick to subsetting at the glyph level is you need to know, up front, exactly which glyphs will be used.

To truly get the best performance from our font files, we need Dynamic Subsetting: reading the unique glyphs on a page, then downloading only what is needed. Dynamic subsetting services are only just getting started, which led me to ask:

What would a client-side process for dynamically subsetting web fonts look like?

Let the adventures begin

To answer this question, I tried a few different methods for dividing up my font file, each with varying levels of success. For me, a successful dynamic subsetting process would balance the smallest font file size, the shortest delay before content is displayed, and the fewest requests to the server.

I tested using a full article of content from this site. In the example pages, I called Clear Sans Regular (which does allow for modification in its license) as the font-family for the body, and set the browser’s default monospace font as a fallback to make it more apparent which glyphs have or have not been loaded.

We’ll focus on three of the methods attempted:

Loading groups of characters as subsetted font files (upper, lower, numerals & punctuation)
Loading individual characters as subsetted font files (uppercase characters only)
Loading only the glyphs used on the page, utilizing Google Fonts’ “text” parameter.

All tests performed in Firefox 29.0.1 on an iMac, running Mavericks.

body { 
  font-family: "ClearSans", monospace; 
}

CSS font-family declaration for Clear Sans, with default monospace fallback.

Groups – View the demo

I generated four subsetted versions of Clear Sans using FontPrep: an uppercase alphabet, a lowercase alphabet, numerals, and a set of punctuation marks. I wrote a script to capture the text content of the page, then used some simple regular expressions to determine which groups are needed. A @font-face call is then written into the head of the page.
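A minimal sketch of how the group detection could work. The post does not show the actual script, so everything here is an assumption: the family names, the file paths, the exact punctuation set, and the choice to give each group its own family and rely on the browser's per-glyph fallback through the font stack.

```javascript
// Hypothetical group definitions: family names and file paths are placeholders,
// not the files used in the original demo.
var GROUPS = [
  { family: "ClearSansUpper", pattern: /[A-Z]/, url: "fonts/clearsans-upper.woff" },
  { family: "ClearSansLower", pattern: /[a-z]/, url: "fonts/clearsans-lower.woff" },
  { family: "ClearSansNumerals", pattern: /[0-9]/, url: "fonts/clearsans-numerals.woff" },
  // Assumed punctuation set; the space is included so word gaps use the web font too.
  { family: "ClearSansPunctuation", pattern: /[ .,;:!?'"()-]/, url: "fonts/clearsans-punctuation.woff" }
];

// Return only the groups that have at least one matching character in the text.
function neededGroups(text) {
  return GROUPS.filter(function (group) {
    return group.pattern.test(text);
  });
}

// Build the CSS: one @font-face rule per needed group, then a body rule whose
// font stack lists every needed family before the monospace fallback.
function buildCss(text) {
  var groups = neededGroups(text);
  var faces = groups.map(function (group) {
    return '@font-face { font-family: "' + group.family + '"; ' +
           'src: url("' + group.url + '") format("woff"); }';
  });
  var stack = groups.map(function (group) {
    return '"' + group.family + '"';
  }).concat("monospace").join(", ");
  return faces.concat("body { font-family: " + stack + "; }").join("\n");
}

// In a browser, inject the rules into the head (skipped when there is no DOM).
if (typeof document !== "undefined" && document.body) {
  var style = document.createElement("style");
  style.textContent = buildCss(document.body.textContent);
  document.head.appendChild(style);
}
```

Because each subset file is missing most glyphs, the browser falls through the stack one family at a time until it finds the character, and anything not covered lands on the monospace fallback.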

Group subsets waterfall — Smaller transfer, more requests.

Characters – View the demo

I took the subsetting a bit further and created a set of font files for each of the capitals in the font. I stopped short of subsetting every character because I was doing this by hand in FontPrep and I happen to not be a machine.

I then modified my script to establish a string of unique capitals from the content, and add a @font-face call in the head for that particular character’s file.
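The per-character variant might look something like the sketch below. The naming scheme (one file per capital under a `fonts/caps/` folder, one family per letter) is hypothetical and only there to make the example concrete.

```javascript
// Collect each distinct A-Z capital in the text, in order of first appearance.
function uniqueCapitals(text) {
  var seen = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (ch >= "A" && ch <= "Z" && seen.indexOf(ch) === -1) seen += ch;
  }
  return seen;
}

// Hypothetical naming scheme: one subset file per capital, e.g. fonts/caps/A.woff.
function fontFaceForCapital(ch) {
  return '@font-face { font-family: "ClearSans-' + ch + '"; ' +
         'src: url("fonts/caps/' + ch + '.woff") format("woff"); }';
}

// One @font-face rule, and therefore one request, per distinct capital.
function buildCapitalRules(text) {
  return uniqueCapitals(text).split("").map(fontFaceForCapital);
}
```

For example, `uniqueCapitals("Hello World, Hi")` returns `"HW"`, so two rules and two font requests are generated. Each generated family name would also need to appear in the body's font-family stack, and an article that uses most of the alphabet quickly approaches 26 separate requests.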

Character subsets waterfall — Larger transfer, too many requests.

Google Fonts – View the demo

I used Google Fonts as a template for how a subsetter-as-a-service (you know, SaaS) should work. You can bring back a specified set of glyphs very easily. One file, served on two requests, containing only the glyphs you need. Sounds like a perfect solution for dynamic subsets.

I modified my script to place all unique characters into a string and tack it onto the end of the Google Fonts href. Since Google does not provide Clear Sans, I used Open Sans for this test. To compare, I also rendered the page with the full Open Sans font.

Google Fonts subset waterfall — Smallest transfer, only two added requests.

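// Collect every unique character used in the page's text content.
// Note: this walks UTF-16 code units, which is fine for the Latin text tested here.
var head = document.getElementsByTagName('head')[0],
    body = document.getElementsByTagName('body')[0],
    content = body.textContent,
    subset = '';

for (var i = 0; i < content.length; i++) {
  if (subset.indexOf(content[i]) === -1) subset += content[i];
}

// Request only those glyphs through the Google Fonts "text" parameter.
// Appending a link element avoids re-parsing the whole head via innerHTML.
var link = document.createElement('link');
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = 'http://fonts.googleapis.com/css?family=Open+Sans&text=' + encodeURIComponent(subset);
head.appendChild(link);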

JavaScript for reading unique glyphs on the page and adding them to the Google Fonts call. By far the most useful thing on this page.

CSS `unicode-range` – View the demo

Updated 10/07/2014 · After reading an article from Jake Archibald on web font optimization, I was curious about the possibilities here using unicode-range. I ran one more test, utilizing the Groups I had created before (uppercase, lowercase, numerals, punctuation), but instead of relying on JavaScript to determine which file to load, I let the browser decide based on the unicode-range set for each @font-face call.

Note: any performance boost seen here only applies in Chrome, as it’s the only browser so far that downloads just the font files whose unicode ranges match characters on the page.

Unicode-range subsets waterfall — Same as the Groups test, but no JavaScript, and only works in Chrome (shown here).
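For reference, here is roughly what the @font-face rules for this test could look like. The test itself needs no JavaScript at runtime; the small script below is only a convenience for printing the static CSS. The file paths are placeholders, and the choice to fold the space character into the punctuation range is my assumption, not something taken from the demo.

```javascript
// Format a character range as a CSS unicode-range value, e.g. "U+0041-005A".
function toRange(first, last) {
  function hex(ch) {
    return ("0000" + ch.charCodeAt(0).toString(16).toUpperCase()).slice(-4);
  }
  return "U+" + hex(first) + "-" + hex(last);
}

// Hypothetical subset files; the paths are placeholders, not the demo's actual files.
var SUBSETS = [
  { url: "fonts/clearsans-upper.woff", range: toRange("A", "Z") },
  { url: "fonts/clearsans-lower.woff", range: toRange("a", "z") },
  { url: "fonts/clearsans-numerals.woff", range: toRange("0", "9") },
  // ASCII punctuation sits in four separate blocks around the digits and letters.
  { url: "fonts/clearsans-punctuation.woff",
    range: [toRange(" ", "/"), toRange(":", "@"), toRange("[", "`"), toRange("{", "~")].join(", ") }
];

// Every rule shares one family name; unicode-range tells the browser which
// file covers which characters, so it can skip files the page never needs.
function unicodeRangeCss(family, subsets) {
  return subsets.map(function (subset) {
    return [
      "@font-face {",
      '  font-family: "' + family + '";',
      '  src: url("' + subset.url + '") format("woff");',
      "  unicode-range: " + subset.range + ";",
      "}"
    ].join("\n");
  }).join("\n\n");
}

// Print the static CSS so it can be pasted into a stylesheet.
console.log(unicodeRangeCss("ClearSans", SUBSETS));
```

Since all four rules use the same family name, the body declaration stays `font-family: "ClearSans", monospace;` and no script has to inspect the page content.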

Further notes

Font rendering comparisons on Art=Work — A – Regular font rendering.
B – Subset font rendering without extra font data.

As of right now, FontPrep will strip out important font data, like kerning and hinting tables. This can lead to poor rendering, so watch for that when using apps to modify font files. Google Fonts seems to redraw these tables when rendering the subset file, as do MyFonts and fonts.com, though the latter two require using their site’s interface and downloading the font (rather than linking directly with a subset variable).
I’m not sure base64 encoding is worth it, given that base64 is not as compressed as woff. In my tests, base64 added 36% more file size over a woff file. Running it through an encoder also adds an extra step in preparing the subsetted font.
On learning that you can use the HTML5 canvas element to draw fonts using JavaScript, I thought this might afford some new opportunities for subsetting over plain CSS. It doesn’t. The entire font is still loaded, though it is interesting that the drawn font is not selectable.
To improve latency, add any font loading scripts at the bottom of the page. In my experience, this started the fonts loading sooner than waiting for DOMContentLoaded or using async. The latter two waited for the event, ran the logic, then loaded the font.
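On the base64 note above, the size penalty can be estimated with a little arithmetic: base64 turns every 3 bytes into 4 characters, so an unwrapped encoding is about 33% larger than its input. If the encoder also wraps lines at 76 characters with a CRLF after each, the figure rises to roughly 37%. The sketch below just works through that arithmetic; whether line wrapping accounts for the 36% I measured is a guess, and the 30,000-byte input is an arbitrary example size.

```javascript
// Unwrapped base64: every 3 input bytes become 4 output characters (padded).
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

// Wrapped base64, assuming 76-character lines each followed by CRLF (2 bytes).
function base64LengthWrapped(byteCount) {
  var chars = base64Length(byteCount);
  var lines = Math.ceil(chars / 76);
  return chars + lines * 2;
}

// Size increase of the encoded form over the original, in percent.
function overheadPercent(originalBytes, encodedBytes) {
  return (encodedBytes - originalBytes) / originalBytes * 100;
}

var woffBytes = 30000; // arbitrary example size, not a measured file
console.log(overheadPercent(woffBytes, base64Length(woffBytes)).toFixed(1));        // unwrapped
console.log(overheadPercent(woffBytes, base64LengthWrapped(woffBytes)).toFixed(1)); // wrapped
```

Either way, the encoded font starts from a larger payload than the binary woff it replaces.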

Bringing it home

Creating my own subset groupings was the best method I tested that did not require a third-party, server-side service. I believe this is the best we can do, on the client side, for now. Thankfully, web font providers such as MyFonts and FontFont are starting to give more subsetting control, and even smaller foundries such as Grilli Type are amending their licenses to allow custom subsets, but these solutions only provide for static, self-hosted font files.

Playing with Google Fonts’ “text” parameter, though, shows what the future could look like for dynamic subsetting. If more web font services started providing glyph-level control when calling fonts from their servers, the rest is a snap.

So there you go, web workers. Let’s make it happen.