O’Reilly’s journey to EPUB 3

Upgrading to EPUB 3 is not a trivial undertaking

We at O’Reilly are very pleased to announce that we have officially upgraded to EPUB 3, and ebook bundles purchased from oreilly.com will now include EPUB 3 files, in addition to Mobi and PDF files. All O’Reilly ebooks released in 2013 are now available in EPUB 3 format, and in the coming weeks, we will be updating and rereleasing our backlist ebooks in EPUB 3 as well.

But while we’re excited to share this news, this article is not merely a press release. The decision of when and how to upgrade to EPUB 3 has been challenging for many in the publishing community, and it has been a long journey for O’Reilly as well. I’d like to talk more about why we chose to take this step now, what additional value we believe EPUB 3 provides to our customers, and the challenges and tradeoffs we’ve tackled in making our EPUBs backward compatible with EPUB 2 platforms.

EPUB 3: Why Now?

It’s been more than a year since the IDPF approved the EPUB 3 specification in October 2011, which raises the question, “Why the one-year delay to adopt EPUB 3?” or, less politely, “What took you so long?” The answer is that upgrading to EPUB 3 is not a trivial undertaking, nor is it one that can reasonably be undertaken unilaterally. To successfully produce and deliver EPUB 3 as part of an ebook program, there are two key prerequisites: having the necessary workflows and tools in place to create EPUB files compliant with the EPUB 3.0 specification, and having ereader platforms available that formally support the 3.0 format. As of early 2012, neither of these preconditions was met, but in the past year, there has been much progress on both fronts. Here are some key milestones:

  • December 2011: Azardi launches one of the first desktop readers for EPUB 3
  • February 2012: Launch of Readium project, an open source EPUB 3 reader for Google Chrome browser
  • October 2012: Apple releases iBooks 3.0, with formal support for EPUB 3 and accompanying documentation
  • December 2012: After beta updates throughout the year, Epubcheck 3.0 (validator for EPUB 3 content) is officially released

Over the past year and a half, O’Reilly has sponsored the DocBook project’s development of open source XSL stylesheets for transforming DocBook XML content to EPUB 3, which we’ve used to update our own toolchain to produce EPUB 3 output. With the release of iBooks 3.0 in late 2012, a critical mass of O’Reilly’s readers had devices that supported EPUB 3 content. We felt it was time to upgrade our content to EPUB 3 to provide people using 3.0-compliant platforms the best quality reading experience.

Additionally, we believed it was important to further throw our support behind the latest version of the EPUB standard to encourage vendors to upgrade their ereading platforms to support HTML5 and EPUB 3. Since 2011, there has been a chicken-or-egg attitude that’s pervaded much of the hand-wringing around EPUB 3, where publishers felt justified in holding back from producing EPUB 3 content until there was widespread ereader support, and ereader vendors felt no sense of urgency in adding EPUB 3 support to their products because there was no significant influx of EPUB 3 content from publishers. We’d like to think that by releasing our content in EPUB 3, we’re doing our part to help break this impasse and push the industry forward.

A Look Inside Our New and Improved EPUBs

Under the hood, EPUB 3 ebooks use HTML5 documents to represent the book content, whereas EPUB 2 documents use XHTML 1.1 documents, but this change in HTML version doesn’t inherently translate into a visually perceptible difference in how content is presented onscreen; that’s what CSS is for. The value in the upgrade to HTML5 lies both in the rich semantics it affords for marking up content and the multimedia/interactivity features it enables in ebook content.

Rich semantics in HTML5 and EPUB 3

The HTML5 specification adds a handful of new elements that are especially useful for marking up ebook content in a more semantic fashion. Notable new structural elements include <section>, <aside>, and <figure>, which can be used to tag sections, sidebars/footnotes/endnotes, and photos/illustrations, respectively. This helps decrease the need to rely heavily on <div>s with custom classes to block off content for styling. In addition to the new HTML5 structure elements, EPUB 3 introduces the epub:type attribute, which permits further inflection of HTML5 elements in line with book semantics:

<aside epub:type="sidebar">
<p>Fascinating digression on the history of the Web…</p>
</aside>

epub:type accepts a set of values defined in the EPUB 3 Structural Semantics Vocabulary (and can also accept custom values from other vocabularies through custom prefixes).

Using semantic HTML5 and epub:type inflections is more than just a best-practices exercise or a way to flex your geek cred (“I use <figcaption> and <hgroup> in my EPUBs; how about you?”). Rich tagging is a key component of making EPUB content as accessible as possible to people with visual disabilities or other reading impairments. Screen readers that encounter an <aside> tag can accurately convey to readers that the text within is tangential to the main book flow. But they will not be able to do the same if that same content is tagged as, say, <div class="rectangular_border">.

Ereading platforms are already taking advantage of EPUB 3 rich semantics to add features to their software for readers. As an example, Apple’s iBooks reader now keys special functionality to footnote markers inflected with epub:type="noteref" and corresponding footnote text with epub:type="footnote", and transforms them into pop-up footnotes for the reader. O’Reilly has implemented these semantics in its EPUB 3 files; see the screenshot below for an example from EPUB 3 Best Practices:

Pop-up footnote in iBooks Reader for iPad
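
A pop-up footnote only works if each marker resolves to a footnote element, so it can help to check the pairing before shipping a file. The following is a minimal Python sketch of such a check, not part of O’Reilly’s toolchain. The sample markup, the use of <aside> for the footnote body, and the function name unresolved_noterefs are assumptions made for illustration, and the check only follows same-file links of the form href="#id".

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute

# Hypothetical chapter fragment, invented for this illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <p>EPUB 3 was approved in 2011.<a epub:type="noteref" href="#fn1">1</a></p>
    <p>Another claim.<a epub:type="noteref" href="#fn2">2</a></p>
    <aside epub:type="footnote" id="fn1"><p>By the IDPF.</p></aside>
  </body>
</html>"""


def unresolved_noterefs(xhtml_text):
    """Return hrefs of same-file noteref links with no matching footnote."""
    root = ET.fromstring(xhtml_text)
    # Collect the ids of every element inflected as a footnote.
    footnote_ids = {
        el.get("id")
        for el in root.iter()
        if "footnote" in el.get(EPUB_TYPE, "").split() and el.get("id")
    }
    missing = []
    for link in root.iter(f"{{{XHTML}}}a"):
        if "noteref" in link.get(EPUB_TYPE, "").split():
            href = link.get("href", "")
            # Cross-file links (e.g. "notes.xhtml#fn1") are skipped here.
            if href.startswith("#") and href[1:] not in footnote_ids:
                missing.append(href)
    return missing


print(unresolved_noterefs(SAMPLE))  # ['#fn2']
```

In the sample, the second marker points at an id that no footnote carries, which is the kind of mismatch that would leave a reader tapping a marker with nothing to show.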

Multimedia/Interactivity in HTML5 and EPUB 3

Because EPUB 3 supports the full HTML5 element set, it’s now possible to create fully valid EPUB content that includes <audio>, <video>, and <canvas> content, which opens the door to true interactive, multimedia experiences within ebook content. We’ve already begun to experiment with these possibilities in EPUB 3. Here’s a screenshot from HTML5 for Publishers, featuring a coloring book implemented using SVG and JavaScript:

Interactive SVG+JavaScript coloring book from HTML5 for Publishers

Backward Compatibility with EPUB 2 ereaders

While the iBooks reader fully supports EPUB 3 content, the same cannot be said of most of the major EPUB ereaders on the market, and this poses a challenge for publishers. How do you deliver an EPUB 3 experience for iBooks, but also make your content available on EPUB 2 devices like the NOOK or Sony Reader? One option is to produce two versions of your EPUB content: an EPUB 3 version for devices that can support that spec, and an EPUB 2 version for all other platforms. Another choice is to make backward-compatible EPUB 3 files, which are compliant with the 3.0 spec but also meet EPUB 2 requirements and contain the necessary legacy EPUB 2 metadata to be processed and rendered properly on EPUB 2 platforms.

For the ebook bundles we sell on oreilly.com, we have taken the latter approach and are releasing a single backward-compatible EPUB 3 file for download,* rather than offering both an EPUB 2 and an EPUB 3. We believe there’s real value in providing a “universal” file that can be used on all EPUB ereaders, both in terms of keeping things simple for customers (who shouldn’t have to concern themselves with whether their ereader is EPUB 3-compliant or not) and eliminating the hassle inherent in juggling two sets of EPUB files should folks want to load their ebooks to both their iPad and NOOK.

Developing “universal” EPUB 3 files is a more challenging undertaking for publishers, however. If you love working with EPUB metadata, you’re in luck: producing backward-compatible EPUB 3 basically means double the metadata and double the fun. Because the EPUB 3 standard modifies the spec requirements for some key metadata points, you’ll need to provide this metadata twice: once in EPUB 2 format and once in EPUB 3 format. For example, the EPUB 3 spec requires the metadata indicating the ebook cover image to be specified in the OPF file by putting properties="cover-image" on the corresponding manifest <item> like this:

<item id="cover-image" properties="cover-image" href="orm_front_cover.jpg" media-type="image/jpeg"/>

But many EPUB 2 ereaders are expecting the cover image to be specified as follows:

<meta name="cover" content="cover-image"/>

So to ensure that cover thumbnails appear properly on all virtual bookshelves across reading systems, you’ll need to include this metadata both ways.
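
Because it is easy to forget one of the two declarations, a small automated check can be useful. Here is a minimal Python sketch, assuming a heavily trimmed OPF fragment (a real package document needs more metadata and a spine). The function name cover_declarations and the shape of its return value are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"

# Trimmed, illustrative OPF; not a complete package document.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta name="cover" content="cover-image"/>
  </metadata>
  <manifest>
    <item id="cover-image" properties="cover-image"
          href="orm_front_cover.jpg" media-type="image/jpeg"/>
  </manifest>
</package>"""


def cover_declarations(opf_text):
    """Report whether the cover is declared EPUB 3 style and EPUB 2 style."""
    root = ET.fromstring(opf_text)
    items = root.findall(f"{{{OPF}}}manifest/{{{OPF}}}item")
    manifest_ids = {item.get("id") for item in items}
    # EPUB 3 style: properties="cover-image" on a manifest item.
    epub3 = any(
        "cover-image" in item.get("properties", "").split() for item in items
    )
    # EPUB 2 style: <meta name="cover"> whose content names a manifest id.
    epub2 = any(
        meta.get("name") == "cover" and meta.get("content") in manifest_ids
        for meta in root.findall(f"{{{OPF}}}metadata/{{{OPF}}}meta")
    )
    return {"epub3": epub3, "epub2": epub2}


print(cover_declarations(SAMPLE_OPF))  # {'epub3': True, 'epub2': True}
```

Note that the legacy <meta> points at the manifest item by id, so the sketch treats a <meta name="cover"> that names a nonexistent id as missing.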

If you were excited about upgrading to EPUB 3 but were heartbroken by the idea of abandoning the NCX Table of Contents, you’re also in luck, because you’ll need to continue including an NCX file in order to ensure the table of contents renders on EPUB 2 platforms. Since the EPUB 3 specification mandates a Navigation Document including a table of contents tagged in XHTML5 (using <nav> and <ol>/<li> markup), that means you need two separate TOCs in your backward-compatible EPUB 3s.
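
The same kind of check applies to the two tables of contents. The sketch below, again in Python and again only an illustration, looks for a manifest item flagged with properties="nav" (the EPUB 3 Navigation Document) and for an NCX file referenced from the toc attribute on <spine> (the EPUB 2 convention). The file names and the function name toc_declarations are assumptions.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"

# Trimmed, illustrative OPF with hypothetical file names.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="nav" href="nav.xhtml" properties="nav"
          media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch01"/>
  </spine>
</package>"""


def toc_declarations(opf_text):
    """Report whether both the Navigation Document and the NCX are declared."""
    root = ET.fromstring(opf_text)
    items = root.findall(f"{{{OPF}}}manifest/{{{OPF}}}item")
    has_nav = any("nav" in item.get("properties", "").split() for item in items)
    # EPUB 2 readers locate the NCX through the spine's toc attribute.
    spine = root.find(f"{{{OPF}}}spine")
    ncx_id = spine.get("toc") if spine is not None else None
    has_ncx = ncx_id is not None and any(
        item.get("id") == ncx_id and item.get("media-type") == NCX_MEDIA_TYPE
        for item in items
    )
    return {"nav_document": has_nav, "ncx": has_ncx}


print(toc_declarations(SAMPLE_OPF))  # {'nav_document': True, 'ncx': True}
```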

While embedding this extra legacy EPUB 2 data will ensure that your metadata problems are solved, it won’t address a thornier problem, which is that most EPUB 2 ereaders don’t parse HTML5-specific markup. In other words, if you put a <section> tag in your EPUB file, ereaders like Adobe Digital Editions will just ignore it. This isn’t a catastrophic problem for HTML5 semantic tags like <section> and <aside>, as they don’t carry implicit formatting expectations in the same way that an <ol> does. However, it will introduce problems if you’re implementing cross-references to these HTML5 elements. For example, if you have a <section> as follows:

<section id="important_instructions">
<h2>Very Important Instructions</h2>
<!-- And so on... -->
</section>

which you then intend to cross-reference later in your ebook like this:

<a href="#important_instructions">Click here to see Very Important Instructions</a>

then you’re in trouble on many EPUB 2 readers, because when they fail to parse <section>, they also fail to parse the id attribute on the tag, which means hyperlinks to it will not work.

To work around this limitation, you may find yourself having to modify the placement of your id attributes (e.g., putting them on the <h2> instead of the <section>) or compromise your semantic tagging (e.g., using <div> instead of <section>), both of which we have done in different circumstances, depending on the results of testing on different EPUB 3 and EPUB 2 readers.
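
One way to find candidates for this workaround is to list every internal link whose target id sits on an HTML5-only element. The Python sketch below is a rough illustration under stated assumptions: the HTML5_ONLY set is our own shortlist of elements introduced in HTML5, the function name risky_link_targets is hypothetical, and only same-file links are examined.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Assumed shortlist of elements that EPUB 2 readers may not recognize.
HTML5_ONLY = {
    "section", "article", "aside", "nav",
    "header", "footer", "figure", "figcaption",
}

# Hypothetical chapter fragment based on the example above.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <section id="important_instructions">
      <h2 id="important_instructions_heading">Very Important Instructions</h2>
    </section>
    <p><a href="#important_instructions">Link to the section</a></p>
    <p><a href="#important_instructions_heading">Link to the heading</a></p>
  </body>
</html>"""


def risky_link_targets(xhtml_text):
    """Return (href, tag) pairs for links that land on HTML5-only elements."""
    root = ET.fromstring(xhtml_text)
    # Map each id to the local (namespace-free) name of its element.
    id_to_tag = {
        el.get("id"): el.tag.split("}")[-1]
        for el in root.iter()
        if el.get("id")
    }
    risky = []
    for link in root.iter(f"{{{XHTML}}}a"):
        href = link.get("href", "")
        if href.startswith("#"):
            tag = id_to_tag.get(href[1:])
            if tag in HTML5_ONLY:
                risky.append((href, tag))
    return risky


print(risky_link_targets(SAMPLE))  # [('#important_instructions', 'section')]
```

The link to the <h2> is not reported, which matches the first workaround described above: moving the id onto an element that EPUB 2 readers already understand.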

Dealing with multimedia elements like <video> is another area that presents both challenges and tradeoffs. If you’re adding a <video> tag to your markup, the desired behavior is for a corresponding video to be embedded that the reader can watch. But for ereaders that don’t support <video>, you’ll need to provide adequate fallbacks to mitigate the degradation of the experience as much as possible. This may mean providing a poster image in place of the video, or a hyperlink to the same video content on the Web. Here’s an example of a <video> tag that falls back to a hyperlink:

<video id="asteroids_video_mp4" width="480" height="270" controls="controls">
<source src="examples/html5_asteroids.mp4" type="video/mp4"/>
<p><em style="color: red">Sorry, the &lt;video&gt; element is not 
supported in your ereader, so you will not be able to watch this video here.</em> 
If you have Web access, <a href="http://examples.oreilly.com/0636920022473/video/video.html">
click here</a> to try running it in your browser.</p>
</video>

And here’s a side-by-side comparison of how this video element renders on the iBooks reader (which supports embedded <video>), and Adobe Digital Editions 1.8, which does not.

Embedded video of the Asteroids game displayed in the iBooks reader for iPad (left), and the video fallback link displayed in Adobe Digital Editions (right)
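
Fallback content is easy to omit when videos are added by hand, so a quick scan for <video> elements that contain neither a hyperlink nor an image can catch gaps. This is a minimal Python sketch under assumptions: the sample markup, file names, and example.com URL are invented, the function name videos_without_fallback is hypothetical, and "fallback" is defined narrowly as a nested <a> or <img>.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical chapter fragment with one complete and one bare <video>.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <video id="with_fallback" controls="controls">
      <source src="clip_one.mp4" type="video/mp4"/>
      <p>Video is not supported in your ereader.
      <a href="http://example.com/clip_one.html">Watch it on the Web</a>.</p>
    </video>
    <video id="without_fallback" controls="controls">
      <source src="clip_two.mp4" type="video/mp4"/>
    </video>
  </body>
</html>"""


def videos_without_fallback(xhtml_text):
    """Return ids of <video> elements with no nested link or image."""
    root = ET.fromstring(xhtml_text)
    missing = []
    for video in root.iter(f"{{{XHTML}}}video"):
        has_link = video.find(f".//{{{XHTML}}}a") is not None
        has_image = video.find(f".//{{{XHTML}}}img") is not None
        if not (has_link or has_image):
            missing.append(video.get("id"))
    return missing


print(videos_without_fallback(SAMPLE))  # ['without_fallback']
```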

Conclusion

In the past year, we’ve seen much progress in the ebook landscape toward greater adoption of EPUB 3. Yet even so, in 2013 we’re still in the incipient stages of EPUB 3 support, which makes early adoption of the standard a challenge. We hope that the upcoming year will see more widespread adoption of the EPUB 3 standard by publishers and ereader vendors.

* For now, O’Reilly is continuing to deliver EPUB 2 files to other digital retailers, as many ebook stores will not accept EPUB 3 files for sale, even if these files are backward compatible with EPUB 2 readers.

UPDATE: On a related note, be sure to read this follow-up article from my colleague Nellie McKesson on how we’re working to simplify and eliminate competing visual distractions for readers.

  • http://twitter.com/stephanierieger Stephanie Rieger

    My biggest frustration with HTML5 structural elements at the moment is that Amazon says they support all of them but seem to have missed a couple. I’m sure it’s an oversight on their part (ran into the same problem a few years back with web browsers…the engineers were sure all the elements were ok, but someone had forgotten to actually test them :-). 

    I’m sure they’ll show up eventually but that doesn’t make things any easier for us today!

    • Sanders Kleinfeld

      Yes, it’s true that Amazon does not support all HTML5 structural elements, but they do support most of them in their KF8 format, including “section”, “figure”, “figcaption”, “aside”, and “article” (see http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000729511 for a full list). So while Kindle support for rich HTML5 semantics may not be perfect, it’s still way better than pretty much every other major ereader on the market, with the exception of Apple.

      • http://twitter.com/stephanierieger Stephanie Rieger

        Thanks :-) Their list isn’t accurate however…at least not on our 7″ Fire HD. 
        If I recall correctly (don’t have my notes handy), the <nav> and <header> elements were said to be supported but in practice were not, i.e., they were unrecognised by the user agent and could therefore not be targeted/styled using CSS.

        • Sanders Kleinfeld

          Interesting. KF8 doesn’t advertise support for “nav”, but they do claim “header” is supported. We don’t currently use “header” in our ebooks, so I haven’t tested and can’t confirm either way.

          • http://twitter.com/stephanierieger Stephanie Rieger

            Nav is at the very bottom of their spec list. 

            While looking it up, I also just realised that they claim to support the HTML5 which is an awesome little element when used with its intended cohort, the elements. In iBooks (and WebKit in general), combining these two automatically generates an expandable/collapsible component, without the need for additional JavaScript. It’s quite handy and you can even style bits of it.

            I just noticed however that KF8 reports that they support but don’t support …which kind of makes the whole thing a lot less useful :-(

            PS – Is there any way to officially file bugs with Amazon? Would gladly type up a formal bug report for these types of things if there were some place to do it.

  • http://peter.krautzberger.info/ Peter Krautzberger

    How do you deal with mathematical content?

    • Sanders Kleinfeld

      In our source files, we use MathML whenever possible for complex equations, but most of the time, we convert to images for EPUB and Mobi outputs, largely because we don’t typically have a viable alternative. With the exception of iBooks, no major ereader has MathML support, or has the JavaScript support necessary to run MathJax. I hope this changes in the not-too-distant future, because we would like nothing more than to deliver EPUB 3s with embedded MathML.

      • http://peter.krautzberger.info/ Peter Krautzberger

        Great to hear you’re trying to use MathML! [disclaimer: I work for MathJax]

        Do you do something special for the broken situation on iOS5 devices (1st-gen iPads etc)? iOS5’s Mobile Safari has a bug which basically breaks its MathML capabilities.

        • Sanders Kleinfeld

          Our use of MathML in ebooks is very limited right now, so we really haven’t encountered any significant issues in iOS5. But again, we really do hope the climate improves for MathML, so we can eliminate the need to convert to images for EPUB/Mobi.

      • Rotem

        The Helicon Books Reader FULLY supports EPUB3. It supports MathML and JavaScript.

        • OriIdan

          Adding to what Rotem wrote, the Helicon Books reader uses MathJax internally. The book author does not need to include MathJax in the source file, just write standard MathML.

  • Alberto Pettarin

    iBooks seems not to fully support MathML and for sure it does not render Media Overlays if the EPUB 3 eBook is declared “reflowable”. Hence, I would not say it is fully compliant with the EPUB 3 specification.

    In my opinion, the current reading sw/app/devices take no or little advantage of all the niceties introduced by EPUB 3.

  • Chris Rogers

    Could you point me towards some documentation for the DocBook – EPUB3 xsl stylesheets?
    Many thanks.

    • Sanders Kleinfeld

      Hi Chris,

      The best documentation for the DocBook XSL stylesheets is Bob Stayton’s DocBook XSL: The Complete Guide, which is available online for free here (as well as in print):

      http://www.sagehill.net/docbookxsl/

      This book doesn’t cover the EPUB 3 stylesheets specifically, but it does cover the HTML XSL stylesheets that serve as the foundation of the EPUB 3 stylesheets (which handle OPF/NCX generation, etc.)

      There’s not yet much in the way of documentation for the EPUB 3 XSL stylesheets of which I’m aware, but hopefully that will change in the future. The .xsl files themselves are fairly well commented, so if you’re familiar with XSL, you’ll likely be able to get a good idea of how they operate by reading through them:

      http://docbook.sourceforge.net/release/xsl/current/epub3/ 

      Hope this helps!

  • http://twitter.com/CatHaee Haee

    This is great news! I have epub3 books but so far, they are only accepted on iBookstore. These epub3 books are animated with audio functions, but are formatted for iBookstore. Does O’Reilly take these books or do we need to reformat according to your specs? Here are the books:
    https://itunes.apple.com/us/book/haee-cat-crooked-tail/id546014524?mt=11
    https://itunes.apple.com/us/book/unconventional-life-haee/id577954667?mt=11

  • Rotem Segal

    It was just a matter of time until well-known content companies started creating EPUB3 content. The Helicon Books Gyan Reader has already been on the market for half a year and fully supports EPUB3.
    It supports MathML, JavaScript, videos, and interactivity, and even goes beyond EPUB3 with 3D.
    http://www.heliconbooks.com/article/reader

  • flowney

    Do you think that WYSIWYG eBook editors will support ePub 3? If so when?
    To put the question as an example, if you were at Apple and working on this, what would you do?
    a) Build on Pages ePub export to yield ePub 3 documents
    b) Build on iBooks Author adding an ePub 3 export option
    c) Both

  • fred savage

    Is there any way to get rid of the sequential number in the pop-up window (or at least make it smaller)? It only shows up in iBooks, to my knowledge.

    ex:
    ———————————————
    3

    Footnote text

    ——————————————–

    where I’d rather just have

    ——————————————–
    Footnote Text
    ——————————————–