O’Reilly’s journey to EPUB 3

Upgrading to EPUB 3 is not a trivial undertaking

We at O’Reilly are very pleased to announce that we have officially upgraded to EPUB 3, and ebook bundles purchased from oreilly.com will now include EPUB 3 files, in addition to Mobi and PDF files. All O’Reilly ebooks released in 2013 are now available in EPUB 3 format, and in the coming weeks, we will be updating and rereleasing our backlist ebooks in EPUB 3 as well.

But while we’re excited to share this news, this article is not merely a press release. The decision of when and how to upgrade to EPUB 3 has been challenging for many in the publishing community, and it has been a long journey for O’Reilly as well. I’d like to talk more about why we chose to take this step now, what additional value we believe EPUB 3 provides to our customers, and the challenges and tradeoffs we’ve tackled in making our EPUBs backward compatible with EPUB 2 platforms.

EPUB 3: Why Now?

It’s been more than a year since the EPUB 3 specification was first approved by the IDPF (it was released in October of 2011), which raises the question, “Why the one-year delay to adopt EPUB 3?” or less politely, “What took you so long?” The answer is that upgrading to EPUB 3 is not a trivial undertaking, nor is it one that can reasonably be undertaken unilaterally. To successfully produce and deliver EPUB 3 as part of an ebook program, there are two key prerequisites: having the necessary workflows and tools in place to create EPUB files compliant with the EPUB 3.0 specification, and having ereader platforms available that formally support the 3.0 format. As of early 2012, neither of these preconditions was met, but in the past year, there has been much progress on both fronts. Here are some key milestones:

  • December 2011: Azardi launches one of the first desktop readers for EPUB 3
  • February 2012: Launch of the Readium project, an open source EPUB 3 reader for the Google Chrome browser
  • October 2012: Apple releases iBooks 3.0, with formal support for EPUB 3 and accompanying documentation
  • December 2012: After beta updates throughout the year, Epubcheck 3.0 (validator for EPUB 3 content) is officially released

Over the past year and a half, O’Reilly has sponsored the DocBook project’s development of open source XSL stylesheets for transforming DocBook XML content to EPUB 3, which we’ve used to update our own toolchain to produce EPUB 3 output. With the release of iBooks 3.0 in late 2012, a critical mass of O’Reilly’s readers had devices that supported EPUB 3 content. We felt it was time to upgrade our content to EPUB 3 to provide people using 3.0-compliant platforms the best quality reading experience.

Additionally, we believed it was important to further throw our support behind the latest version of the EPUB standard to encourage vendors to upgrade their ereading platforms to support HTML5 and EPUB 3. Since 2011, there has been a chicken-or-egg attitude that’s pervaded much of the hand-wringing around EPUB 3, where publishers felt justified in holding back from producing EPUB 3 content until there was widespread ereader support, and ereader vendors felt no sense of urgency in adding EPUB 3 support to their products because there was no significant influx of EPUB 3 content from publishers. We’d like to think that by releasing our content in EPUB 3, we’re doing our part to help break this impasse and push the industry forward.

A Look Inside Our New and Improved EPUBs

Under the hood, EPUB 3 ebooks use HTML5 documents to represent the book content, whereas EPUB 2 ebooks use XHTML 1.1. This change in HTML version doesn’t inherently translate into a visually perceptible difference in how content is presented onscreen; that’s what CSS is for. The value in the upgrade to HTML5 lies both in the rich semantics it affords for marking up content and in the multimedia/interactivity features it enables in ebook content.

Rich semantics in HTML5 and EPUB 3

The HTML5 specification adds a handful of new elements that are especially useful for marking up ebook content in a more semantic fashion. Notable new structural elements include <section>, <aside>, and <figure>, which can be used to tag sections, sidebars/footnotes/endnotes, and photos/illustrations, respectively. This helps decrease the need to rely heavily on <div>s with custom classes to block off content for styling. In addition to the new HTML5 structure elements, EPUB 3 introduces the epub:type attribute, which permits further inflection of HTML5 elements in line with book semantics:

<aside epub:type="sidebar">
<p>Fascinating digression on the history of the Web…</p>
</aside>

epub:type accepts a set of values defined in the EPUB 3 Structural Semantics Vocabulary (and can also accept custom values from other vocabularies through custom prefixes).
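
One detail the snippet above leaves implicit: for epub:type to be valid, the epub prefix has to be bound to the EPUB namespace on an ancestor element, typically the root <html>. Here is a minimal sketch of a complete content document, assuming a single chapter with one sidebar; the title and body text are placeholders, not taken from an O’Reilly book:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>Placeholder chapter title</title>
  </head>
  <body>
    <!-- "chapter" and "sidebar" come from the EPUB 3 Structural Semantics Vocabulary -->
    <section epub:type="chapter">
      <h1>Placeholder chapter title</h1>
      <p>Main text of the chapter…</p>
      <!-- epub:type refines the generic <aside> into a sidebar -->
      <aside epub:type="sidebar">
        <p>Fascinating digression on the history of the Web…</p>
      </aside>
    </section>
  </body>
</html>
```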

Using semantic HTML5 and epub:type inflections is more than just a best-practices exercise or a way to flex your geek cred (“I use <figcaption> and <hgroup> in my EPUBs; how about you?”). Rich tagging is a key component of making EPUB content as accessible as possible to people with visual disabilities or other reading impairments. Screen readers that encounter an <aside> tag can accurately convey to readers that the text within is tangential to the main book flow. But they will not be able to do the same if that same content is tagged as, say, <div class="rectangular_border">.

Ereading platforms are already taking advantage of EPUB 3 rich semantics to add features to their software for readers. As an example, Apple’s iBooks reader now keys special functionality to footnote markers inflected with epub:type="noteref" and corresponding footnote text with epub:type="footnote", and transforms them into pop-up footnotes for the reader. O’Reilly has implemented these semantics in its EPUB 3 files; see the screenshot below for an example from EPUB 3 Best Practices:

Pop-up footnote in iBooks Reader for iPad
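
For reference, markup along these lines is what the noteref/footnote pairing looks like. This is an illustrative fragment rather than an excerpt from our files: the id value fn1 and the text are placeholders, and the fragment assumes it sits inside a content document that declares xmlns:epub on its root element.

```xml
<!-- Fragment: assumes xmlns:epub="http://www.idpf.org/2007/ops" is declared on <html> -->
<p>EPUB 3 content documents are written in HTML5.<a epub:type="noteref" href="#fn1">1</a></p>

<!-- The marker's href points at the id of the footnote body -->
<aside epub:type="footnote" id="fn1">
  <p>Placeholder footnote text.</p>
</aside>
```

On reading systems that don’t key any behavior to these values, the marker should simply act as an ordinary hyperlink to the note.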


Multimedia/Interactivity in HTML5 and EPUB 3

Because EPUB 3 supports the full HTML5 element set, it’s now possible to create fully valid EPUB content that includes <audio>, <video>, and <canvas> content, which opens the door to true interactive, multimedia experiences within ebook content. We’ve already begun to experiment with these possibilities in EPUB 3. Here’s a screenshot from HTML5 for Publishers, featuring a coloring book implemented using SVG and JavaScript:

coloring book image of a cat

Interactive SVG+JavaScript coloring book from HTML5 for Publishers

Backward Compatibility with EPUB 2 ereaders

While the iBooks reader fully supports EPUB 3 content, the same cannot be said of most of the major EPUB ereaders on the market, and this poses a challenge for publishers. How do you deliver an EPUB 3 experience for iBooks, but also make your content available on EPUB 2 devices like the NOOK or Sony Reader? One option is to produce two versions of your EPUB content: an EPUB 3 version for devices that can support that spec, and an EPUB 2 version for all other platforms. Another choice is to make backward-compatible EPUB 3 files, which are compliant with the 3.0 spec but also meet EPUB 2 requirements and contain the necessary legacy EPUB 2 metadata to be processed and rendered properly on EPUB 2 platforms.

For the ebook bundles we sell on oreilly.com, we have taken the latter approach and are releasing a single backward-compatible EPUB 3 file for download,* rather than offering both an EPUB 2 and an EPUB 3. We believe there’s real value in providing a “universal” file that can be used on all EPUB ereaders, both in terms of keeping things simple for customers (who shouldn’t have to concern themselves with whether their ereader is EPUB 3-compliant or not) and eliminating the hassle inherent in juggling two sets of EPUB files should folks want to load their ebooks onto both their iPad and NOOK.

Developing “universal” EPUB 3 files is a more challenging undertaking for publishers, however. If you love working with EPUB metadata, you’re in luck: producing backward-compatible EPUB 3 basically means double the metadata and double the fun. Because the EPUB 3 standard modifies the spec requirements for some key metadata points, you’ll need to provide this metadata twice: once in EPUB 2 format and once in EPUB 3 format. For example, the EPUB 3 spec requires the ebook cover image to be identified in the OPF file by putting properties="cover-image" on the corresponding manifest <item>, like this:

<item id="cover-image" properties="cover-image" href="orm_front_cover.jpg" media-type="image/jpeg"/>

But many EPUB 2 ereaders are expecting the cover image to be specified as follows:

<meta name="cover" content="cover-image"/>

So to ensure that cover thumbnails appear properly on all virtual bookshelves across reading systems, you’ll need to include this metadata both ways.
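
Put together, the two declarations might sit in one OPF file roughly as follows. This is a trimmed sketch rather than one of our production files: the identifier, title, and modification date are placeholders, and the other manifest and spine entries a real package needs in order to validate are elided.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- Placeholder values for illustration only -->
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Placeholder Title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2013-01-01T00:00:00Z</meta>
    <!-- EPUB 2 style: content holds the id of the manifest item below -->
    <meta name="cover" content="cover-image"/>
  </metadata>
  <manifest>
    <!-- EPUB 3 style: properties="cover-image" marks this item as the cover -->
    <item id="cover-image" properties="cover-image" href="orm_front_cover.jpg" media-type="image/jpeg"/>
    <!-- remaining manifest items elided -->
  </manifest>
  <spine>
    <!-- itemrefs elided -->
  </spine>
</package>
```

Note that the content value of the EPUB 2 <meta> and the id of the manifest <item> have to match; that shared id is what ties the two declarations to the same image.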

If you were excited about upgrading to EPUB 3 but were heartbroken by the idea of abandoning the NCX Table of Contents, you’re also in luck, because you’ll need to continue including an NCX file in order to ensure the table of contents renders on EPUB 2 platforms. Since the EPUB 3 specification mandates a Navigation Document including a table of contents tagged in XHTML5 (using <nav> and <ol>/<li> markup), that means you need two separate TOCs in your backward-compatible EPUB 3s.
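
In the OPF file, carrying both TOCs looks something like the sketch below. The filenames nav.xhtml, toc.ncx, and ch01.xhtml and the id values are hypothetical, and the metadata section is elided.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <!-- metadata elided -->
  <manifest>
    <!-- EPUB 3 Navigation Document, flagged with properties="nav" -->
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <!-- Legacy NCX for EPUB 2 reading systems -->
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <!-- toc="ncx" points EPUB 2 reading systems at the NCX item above -->
  <spine toc="ncx">
    <itemref idref="ch01"/>
  </spine>
</package>
```

The Navigation Document itself would hold a <nav epub:type="toc"> element wrapping the <ol>/<li> list, while toc.ncx repeats the same entries in NCX format, so the two need to be kept in sync whenever the book’s structure changes.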

While embedding this extra legacy EPUB 2 data will ensure that your metadata problems are solved, it won’t address a thornier problem, which is that most EPUB 2 ereaders don’t parse HTML5-specific markup. In other words, if you put a <section> tag in your EPUB file, ereaders like Adobe Digital Editions will just ignore it. This isn’t a catastrophic problem for HTML5 semantic tags like <section> and <aside>, as they don’t carry implicit formatting expectations in the same way that an <ol> does. However, it will introduce problems if you’re implementing cross-references to these HTML5 elements. For example, if you have a <section> as follows:

<section id="important_instructions">
<h2>Very Important Instructions</h2>
<!-- And so on... -->
</section>

which you then intend to cross-reference later in your ebook like this:

<a href="#important_instructions">Click here to see Very Important Instructions</a>

then you’re in trouble on many EPUB 2 readers, because when they fail to parse <section>, they also fail to parse the id attribute on the tag, which means hyperlinks to it will not work.

To work around this limitation, you may find yourself having to modify the placement of your id attributes (e.g., putting them on the <h2> instead of the <section>) or compromise your semantic tagging (e.g., using <div> instead of <section>), both of which we have done in different circumstances, depending on the results of testing on different EPUB 3 and EPUB 2 readers.
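
As a sketch of the first workaround, the id can move from the <section> to the heading inside it, which is an element EPUB 2 readers already know from XHTML 1.1. Whether this is the right fix for a given reader is something to confirm by testing; the fragment below is illustrative only.

```xml
<section>
  <!-- id moved from <section> to the heading, an XHTML 1.1 element -->
  <h2 id="important_instructions">Very Important Instructions</h2>
  <p>And so on…</p>
</section>

<!-- The cross-reference is unchanged; it now resolves to the <h2> -->
<p><a href="#important_instructions">Click here to see Very Important Instructions</a></p>
```

The second workaround keeps the id where it was and swaps <section> for <div>, trading some semantics for broader link support.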

Dealing with multimedia elements like <video> is another area that presents both challenges and tradeoffs. If you’re adding a <video> tag to your markup, the desired behavior is for a corresponding video to be embedded that the reader can watch. But for ereaders that don’t support <video>, you’ll need to provide adequate fallbacks to mitigate the degradation of the experience as much as possible. This may mean providing a poster image in place of the video, or a hyperlink to the same video content on the Web. Here’s an example of a <video> tag that falls back to a hyperlink:

<video id="asteroids_video_mp4" width="480" height="270" controls="controls">
<source src="examples/html5_asteroids.mp4" type="video/mp4"/>
<p><em style="color: red">Sorry, the &lt;video&gt; element is not 
supported in your ereader, so you will not be able to watch this video here.</em> 
If you have Web access, <a href="http://examples.oreilly.com/0636920022473/video/video.html">
click here</a> to try running it in your browser.</p>
</video>

And here’s a side-by-side comparison of how this video element renders on the iBooks reader (which supports embedded <video>), and Adobe Digital Editions 1.8, which does not.

Embedded video of Asteroids game being displayed in iBooks reader on the left, but fallback URL displayed in Adobe Digital Editions on right

Embedded video displayed in the iBooks reader for iPad (left), and video fallback displayed on Adobe Digital Editions (right)
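
The poster-image fallback mentioned earlier can be combined with the hyperlink approach. The variation below is a sketch, not markup from our files: the image filename html5_asteroids_poster.jpg is hypothetical, and how the fallback renders will vary by reader.

```xml
<video id="asteroids_video_mp4" width="480" height="270" controls="controls">
  <source src="examples/html5_asteroids.mp4" type="video/mp4"/>
  <!-- Fallback content: shown only by ereaders that do not support <video> -->
  <p>
    <a href="http://examples.oreilly.com/0636920022473/video/video.html">
      <img src="examples/html5_asteroids_poster.jpg" width="480" height="270"
           alt="Still frame from the Asteroids video; follow the link to watch it on the Web"/>
    </a>
  </p>
</video>
```

Readers without <video> support would then show a still frame in the video’s place, linked to the Web version, instead of a text-only notice.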

Conclusion

In the past year, we’ve seen much progress in the ebook landscape toward greater adoption of EPUB 3. Yet even so, in 2013 we’re still in the incipient stages of EPUB 3 support, which makes early adoption of the standard a challenge. We hope that the upcoming year will see more widespread adoption of the EPUB 3 standard by publishers and ereader vendors.

* For now, O’Reilly is continuing to deliver EPUB 2 files to other digital retailers, as many ebook stores will not accept EPUB 3 files for sale, even if these files are backward compatible with EPUB 2 readers.

UPDATE: On a related note, be sure to read this follow-up article from my colleague Nellie McKesson on how we’re working to simplify and eliminate competing visual distractions for readers.
