EPUB Creation Just Got Simpler

BookGlutton announced last week that it had developed a Web-based (X)HTML to EPUB conversion form (and API). The form itself accepts HTML or XHTML documents and returns an .epub file (in a couple of seconds) for download. While it doesn’t yet support images or CSS stylesheets, it sounds like these features are coming. My handful of tests of the tool have all “just worked.” I grabbed HTML files I found on the Web and an HTML version of a recent O’Reilly title and all were happily accepted. The resulting .epub file opened fine in Adobe Digital Editions and was readable.

The impact of this sort of easy-to-use form is huge, as so many content creation tools already support (X)HTML output in some way, from Word to OpenOffice.org to DocBook to Dreamweaver. It should be the first step in lowering the barrier to entry for creating EPUB documents. Bob DuCharme had already shown technical experts how to create .epub files with nothing but free tools, and I’m hopeful that the Save as DAISY output from Word will help create more accessible documents, but there’s nothing like a simple Web form to bring a complicated standard to the masses.
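For a sense of what any such converter has to produce, here is a rough sketch of packaging a single XHTML file as an EPUB 2 container using only Python's standard library. This is not BookGlutton's code or API: the function name `build_epub`, the `OEBPS/` layout, and the file names are my own illustrative choices, and the result is not guaranteed to pass epubcheck.

```python
import uuid
import zipfile
from xml.sax.saxutils import escape

# Points the reading system at the package (OPF) file.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Metadata, a manifest of files, and the reading order (spine).
OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter" href="chapter.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter"/>
  </spine>
</package>
"""

# Table of contents with a single entry.
NCX_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="{book_id}"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
    <navPoint id="chapter" playOrder="1">
      <navLabel><text>{title}</text></navLabel>
      <content src="chapter.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""


def build_epub(xhtml, title, out):
    """Write a one-chapter EPUB to `out` (a path or a binary file object).

    `xhtml` is assumed to already be a well-formed XHTML document.
    """
    fields = {"title": escape(title), "book_id": "urn:uuid:" + str(uuid.uuid4())}
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", OPF_TEMPLATE.format(**fields))
        zf.writestr("OEBPS/toc.ncx", NCX_TEMPLATE.format(**fields))
        zf.writestr("OEBPS/chapter.xhtml", xhtml)
```

Calling `build_epub(open("page.xhtml").read(), "My Page", "my_page.epub")` (with a hypothetical `page.xhtml`) would write the file to disk. The fiddly bits, like the ordering of the `mimetype` entry and the three separate XML files, are exactly the sort of thing a Web form hides from you.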

That said, the lack of CSS and image support really makes this more of a proof-of-concept than a real tool today, unless you’re only interested in reading narrative text. With that in mind, let’s give it a shot (in Firefox, on my Mac):

  1. Find Wikipedia’s article on E-book.
  2. Save As: Web Page, HTML only (so you don’t bother with the images or CSS):

    Saving a web page as HTML only
  3. Now take that HTML file from your computer and feed it right back to the
    BookGlutton form:

    Filling out the EPUB from HTML form
  4. Hit convert, then open the resulting .epub in Adobe Digital Editions:

    Opening the resulting .epub file in Digital Editions

Here’s the resulting .epub, for the lazy: wikipedia_on_E-book.epub. I also tried two other samples: the 3rd chapter from Word Hacks (word_hacks_chapter_3.epub) and the Ebook Format Primer from the TOC blog (ebook-format-primer.epub).

So, given our three samples, what are the current drawbacks? Well, as I mentioned before, the lack of image and CSS support are the two obvious ones, especially for the book content (which had images, unlike the blog post). There’s also the all-too-common drawback of HTML from the wild-wild Web being rather funky. You can see an example of that sort of oddness on the first page of the Wikipedia sample in Digital Editions (which is including some JavaScript code meant to be executed by the browser):

// document.writeln("\x3cp\x3e\x3ca href=\"http://wikimania2008.wikimedia.org/wiki/Registration\"
blah blah blah

…but that stuff is ignorable and could be removed from the HTML if one cared. Another concern is that while the internal linking (from the Contents, for example) works, some of the external links back to other parts of Wikipedia don’t. Linking is a major advantage of ebooks, so this is a sad one, though this is a common web problem and not really BookGlutton’s fault. My final complaint has to do with special characters (n spaces), which seem to have gotten messed up in the book content (look around the “Figure” references). That said, the blog post looks pretty nice, once you find it a little later in the document.
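If you did care, a small pre-processing pass before uploading could handle the first two complaints. The sketch below uses Python's standard `html.parser` to drop `<script>` elements and to turn relative links into absolute ones. It rests on two assumptions I haven't verified against the converter: that the stray JavaScript comes from `<script>` elements in the saved page, and that the dead external links are relative URLs such as `/wiki/Book`. The names `clean_html` and `_Cleaner` are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class _Cleaner(HTMLParser):
    """Re-serializes HTML, skipping <script> elements and absolutizing links."""

    def __init__(self, base_url):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a <script> element

    def _render_tag(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # attribute without a value
                parts.append(name)
                continue
            # Leave fragment-only links (#History) alone so that the
            # internal table-of-contents links keep working.
            if tag == "a" and name == "href" and not value.startswith("#"):
                value = urljoin(self.base_url, value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + closing + ">"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self._render_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and not self.skip_depth:
            self.out.append(self._render_tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        if tag == "script":
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_html(html_text, base_url):
    """Return html_text without scripts and with absolute <a href> values."""
    parser = _Cleaner(base_url)
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

For the Wikipedia sample, the base URL would be the address the page was saved from, for example `clean_html(saved_page, "http://en.wikipedia.org/wiki/E-book")`. Note that this re-serializes tags in a normalized form, so the output won't be byte-for-byte identical to the input even where nothing was removed.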

Although at this stage it’s just a prototype, BookGlutton’s work might encourage people to repackage existing Web content as ebooks. That should significantly increase the number of .epub files ready to go onto (format-friendly) ebook devices and put more pressure on ebook device manufacturers to support EPUB.

It’s time for the “regular” folks to step out of the woodwork and give this EPUB thing a try!

  • http://www.threepress.org/ Liza Daly

    In addition to the above-mentioned problems, its output also does not pass the validation provided by the epubcheck tool. Supplying valid epub documents will encourage the development of e-readers that produce uniform, expected results. End users will only accept the format if they have faith that the books will be legible, well-structured and easy to navigate.

    That said, this is a great step forward towards getting developers and publishers to experiment with the format.

  • http://kfahlgren.com/blog Keith Fahlgren

    Liza is absolutely right and I should have mentioned the invalidity of the resulting .epub documents. I did report a couple of the epubcheck errors to Aaron in the comments to his post, but the onus for .epub validity should always fall on the tools.

  • http://www.threepress.org/ Liza Daly

    There are a lot of nice things about epubcheck, but the most important is that it’s free and can be bundled with other tools. I use it with my TEI to epub converter and it was invaluable in finding small but critical problems that Adobe Digital Editions didn’t flag.

  • http://www.bookglutton.com Aaron

    Update on this: we released a new version today with support for Images. It now accepts Zip files containing collections of HTML, GIF, JPEG, PNG and SWF files. It also supports more control over metadata, and fixes several validation errors (although introducing images has created new errors about unsupported types).

    Thanks for taking me to task on the validation. It is important and we will continue to improve the validity of resulting files. Hopefully right now the convenience of this tool will outweigh the minor complaints of epubcheck. It would be nice to see similar projects try to raise the bar.

  • kiermel

    This is an online ePub validator:
    http://threepress.org/tools/

  • bowerbird

    congratulations! your echo chamber is working! :+)

    -bowerbird

  • http://www.chinaelections.org/printnews.asp?newsid=172446 James Mok

    it’s a valuable article in Chinese though