# ENTRIES TAGGED "html"

## High-quality PDF-to-EPUB conversion

### Newgen's Silk Evolve is a powerful automation platform

How many times have you opened an ebook and noticed awkward hyphenations or other conversion errors? I still see this in the majority of the ebooks I buy and it’s clear these are the result of someone not paying attention during the conversion process. They may be minor annoyances but they reflect poorly on the publishers who produce them.

I recently had a chance to talk about this problem with Patrick Martinent, the CTO at Newgen KnowledgeWorks. They have a terrific platform called Silk Evolve that helps automate the conversion from PDF to EPUB and reduce the errors it introduces. The following Q&A is a preview of what you can expect to hear in Patrick’s session at next month’s TOC NY conference.

## WYSIWYG vs WYSI

### WYSI editors enable a whole new level of interaction

Since HTML is the new paper and the new path to paper, online editing environments are becoming much more important for publishing. Dominant until now has been the WYSIWYG editor we all know and…err…love? However, the current WYSIWYG paradigm has been inadequate for a long time and we need to update and replace it. Producing text with a WYSIWYG editor feels like trying to write a letter while it’s still in the envelope. Let’s face it…these kinds of online text editors are not an extension of yourself; they are a cumbersome hindrance to getting a job done.

## Math typesetting

### Why are we leaving such an important issue to under-resourced volunteers and small organisations?

Typesetting math in HTML was for a long time one of those ‘I can’t believe that hasn’t been solved by now!’ issues. It seemed a bit wrong – wasn’t the Internet more or less invented by math geeks? Did they give up using the web back in 1996 because it didn’t support math? (That would explain the aesthetic of many ‘home pages’ for math professors.)

## InDesign vs. CSS

### Any typesetting engine without Javascript support is going to lose

The explosion in web typesetting has been largely unnoticed by everyone except the typography geeks. One of the first posts that raised my awareness of this phenomenon was From Print to Web: Creating Print-Quality Typography in the Browser by Joshua Gross.

## Ebook problem areas that need standardisation

### A user experience plea for more consistency across platforms


Ebook publishing is full of problem areas, most of which cannot be addressed through standardisation; they can only be resolved by a sea change in the behaviour and nature of the various participants in the ebook industry.

There are, however, several issues that could be addressed, at least partially, via standardisation, that would make everybody’s life easier if implemented.

## Overrides

One of the major issues facing publishers today is the spiralling complexity of dealing with vendor rendering overrides.

Each vendor applies different CSS overrides with differing behaviours, sometimes even only enabling features through server-side manipulation, which means that proper testing of an ebook is not only difficult, but impossible.

If vendors cannot be talked out of requiring these overrides then they need to be standardised and normalised. Any reading system that implements a CSS override is in violation of how the CSS standard defines the cascade and so is in violation of the EPUB 3 standard.

CSS overrides come in four broad types:

• Vendor styles only – The publisher’s styles are completely ignored in favour of the vendor’s.
• Aggressive vendor styles, but publisher styles enabled – Very little is seen of the publisher styles in this scenario. They mainly surface in edge cases that weren’t accounted for in the vendor’s stylesheet.
• Minimal overrides – The vendor only really enforces control over margins, backgrounds, and possibly font styles.
• Publisher styles – The mode that the reading app goes into when the reader deliberately selects ‘publisher styles’. Under ordinary circumstances this would simply disable the overrides but in most reading apps this mode has a unique behaviour.
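As a rough illustration of the ‘minimal overrides’ case, here is one way such an override might be applied. This is a hypothetical sketch: the selectors, values and the use of !important are assumptions for illustration and are not taken from any real reading system.

```css
/* Publisher stylesheet, shipped inside the EPUB */
body {
  margin: 2em;
  font-family: "Minion Pro", serif;
  background: #fdf6e3;
}

/* Hypothetical vendor override, injected by the reading system.
   Because these declarations are marked !important, the publisher's
   normal (non-important) margin and background never take effect. */
body {
  margin: 0 !important;
  background: transparent !important;
}
```

In this sketch the publisher's font-family survives because the vendor rule does not touch it, which is the kind of partial, hard-to-predict result that makes testing so difficult.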

## The new New Typography

### Replacing the book production ecosystem with webpage production tools

In the 1920s and 1930s in Europe there was a movement known as the New Typography. It rejected traditional type set in symmetrical columns and instead treated the printer’s block as a blank canvas to be explored in its entirety. The calling card of the movement was type arranged in harmonious and beautiful asymmetrical compositions. In the last two years there has been another slow-breaking wave of typographical exploration. The printer’s block is now HTML, and CSS and Javascript are fast becoming the new tools of the typographer – not just for the web, but for ebooks and for print, and not just for printed books, but for all printed material.

## Browser as typesetting machine

The change of the book’s basic carrier medium from paper to HTML (the stuff webpages are made of) has meant many changes to what we might still call typesetting. Kindle and other e-ink devices actually move ink on a display to form words, sentences and paragraphs. The moveable type of Gutenberg’s time has become realtime: in a very real sense, each book is typeset as we read it. Content is dynamically re-flowed for each device depending on display dimensions and individualised settings to aid readability. Moving type in ‘read time’ marks a significant paradigm shift from moveable type systems, including digital moveable type manipulated by Desktop Publishing software, to flowable typesetting. We are leaving behind moveable type for flowable type.

The engine for reflowing a page in realtime is something we have seen before. It is the job of the browser. And, since ebooks are webpages, browsers have come to play a central role in digital ereaders. In the case of the iPad, the iBooks reader is built on a fully featured browser engine: Webkit, the very same technology behind the Chrome and Safari browsers. Browsers are the typesetting machines for ebooks.

What is interesting here is that the browser can also reflow content into fixed page formats like PDF, which means that the browser is becoming the typesetting engine for print. CSS and Javascript are the print design tools of our immediate future, and the vast majority of innovations in this area are based on Open Source and Open Standards.

## The power of CSS and Javascript

CSS is the set of rules used by the browser to know where to place type, images and other elements on a webpage and how to style those elements. Typical rules define where an image is placed in relation to text, what fonts are used, the font size, the background color of the page, the maximum width of an image, and so on. Although CSS was originally designed exclusively for webpages, the CSS Working Group, responsible for overseeing the development and direction of CSS, anticipated the intersection of the book and the web some time ago. In the latest drafts of the CSS standards, new additions are almost entirely focused on typography and page control. As a consequence this area is starting to blossom. In particular, the CSS Generated Content for Paged Media Module specification is astonishing for its reframing of flowable text into a fixed page. Cross reference and footnote controls, not needed on the web, are among many book-like structure controls being addressed by CSS. Table of contents creation, figure annotations, page references, page numbers, margin controls, page size, and more are all included. The definition of these rules precedes their adoption in browsers; however, they are being included in browser engines, notably Webkit, at a very fast pace.
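To give a flavour of these book-like controls, here is a minimal sketch based on the Paged Media and Generated Content for Paged Media drafts as I understand them. The class names are hypothetical, the exact syntax has shifted between drafts, and support varies widely between rendering engines, so treat it as an illustration only.

```css
/* Page size, margins, and a page number in the bottom margin */
@page {
  size: 148mm 210mm;   /* A5 */
  margin: 20mm 15mm;

  @bottom-center {
    content: counter(page);
  }
}

/* Start each chapter on a new page (hypothetical class name) */
h1.chapter {
  page-break-before: always;
}

/* Draft syntax: move the element's content into the page's footnote area */
span.footnote {
  float: footnote;
}

/* Draft syntax: print the page number of the element a link points to */
a.pageref::after {
  content: " (page " target-counter(attr(href), page) ")";
}
```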

Coincidentally, there has recently been an explosion of interest in improving browser typography, primarily for the better design of websites. Although these advances have not been made with book production in mind, they can be inherited by the browser for typesetting both electronic and paper books. Of interest is the sharp rise in websites offering tips on CSS typography, an explosion of web fonts, and some very interesting Javascript libraries.

Javascript is the programming language of the web and it can be used to create dynamic content or manipulate objects on a webpage in ways CSS cannot, or cannot yet. Of particular interest is Kerning.js, inspired by the previously available Lettering.js library. These code libraries allow you to change each letter individually in a paragraph or heading and control the spacing between letters (called ‘kerning’). Kerning is essential for printed books and ebooks, but has been missing from browsers for a very long time. Colorfont is another Javascript library, which enables dual-toned glyphs, and the amazing TypeSet Javascript library emulates the sophisticated TeX line-breaking algorithm developed by Donald Knuth. Even the layout of musical notation (which was never effectively mechanised with Gutenberg’s moveable type and was handwritten into books for many decades after the printing press came into the world) has come into focus with the VexFlow Javascript library. With libraries like these it is apparent that Javascript, the programming language of the browser, has a future with typography, and with that Javascript is fast becoming the lingua franca for all typesetting.
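Once a script has wrapped each letter in its own element, the per-letter adjustments themselves can be plain CSS. The sketch below assumes the letters of a heading have been wrapped in spans with positional classes (char1, char2, and so on, which is the naming I understand Lettering.js to use); the heading class and the values are hypothetical.

```css
/* Assumes a script has already wrapped each letter of the heading
   in a span with a positional class (char1, char2, ...). */
h1.title .char1 {
  letter-spacing: -0.06em; /* tighten the gap after the first letter */
}

h1.title .char4 {
  letter-spacing: 0.03em;  /* loosen the gap after the fourth letter */
}
```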

There is a lot of fuel in these developments and, interestingly, most of it is coming from outside the traditional print and publishing industry. It could be said that these industries, built upon the printing press, have lost sight of their very foundation. Instead the IT industry is taking hold on a very deep level. Apple and Google are behind the development of Webkit – the rendering engine behind iBooks, Safari and Chrome – which makes a lot of these typesetting innovations possible. Apple utilises these typographical features not just in its browser, but in the development of its iBooks reader – the ebook reader on iPad, which is itself based on Webkit. Google also fuels these innovations for many reasons other than the browser, better typography in Google Docs being one of them. We can expect the momentum to build, and it is possible to say with some confidence that the browser, together with CSS and Javascript, is to become the most important typesetting engine of our time, as it is fast becoming the typesetting mechanism for digital and paper books and the web.

## Ease and efficiencies

The implications of this are enormous and possibly not yet fully realised. At publishing industry conferences and other book-focused forums the attention has largely been on the ebook’s effect on distribution, ereaders and the demise of the so-called brick-and-mortar book stores. The biggest effects, however, are elsewhere, ‘bubbling under’ in the recasting of the browser as a typesetting engine, and with it the slow realisation that the technical ecosystem surrounding book production can be replaced by tools for producing webpages. We are beginning to turn our attention to the tools for making webpages in order to make books, and this, it turns out, is much easier than with Desktop Word Processing and Publishing software. Additionally, thanks to recent developments, all of this can also be used to design print (more on in-browser print production in a future post). Book production is once again becoming faster and cheaper, and is on its way to achieving another leap of magical efficiency.

The future of book production right now is exploding all around us. These pieces of the puzzle are coming together and coming together fast. We can almost watch in real time as the necessary mechanics get filled in by new release candidates of major browsers and by small ‘out of the blue’ innovations such as Javascript typography controls. It is getting easier and easier to make books in the browser, and consequently there has never been a time when it has been this easy to make books of all kinds. Ease of production is where it all started for Gutenberg and it is starting again for us. If you believe Gutenberg’s efficiencies changed society forever, then what effect will the new new typesetting engines have? It’s a giddy question. Making books in the browser will have an enormous impact on society as a whole and, just like the printing press, it will not revolutionise the old order, but create a new one.

This material is Creative Commons BY-SA 3.0 unported. Attributable to Adam Hyde, 2012.

## Ask the Ebook Experts: Text alignment in Fixed-Layout EPUB for iBooks

### How to mimic flowing text in a non-reflowable format

Q: In a traditional printed book, if a paragraph has not finished when the end of the page is reached, the entire paragraph will be justified. However the [CSS] command ‘text align last’ does not seem to be honoured in the last paragraph of the page in fixed layout for the iPad…What seems to happen is that in [InDesign CS6] it ‘looks’ justified but it doesn’t make it through to the epub version and there is a small gap at the end of the line. If you add text it goes on to a new line. I tried adding whitespace but that didn’t seem to be accepted…Is the problem with ibooks? Is there any workaround?

Fixed-layout paragraph in iBooks with incorrectly justified last line
Image: Andrew Rafferty; The Stones Remain

A: When you load a standard EPUB file into iBooks, the application automatically paginates the HTML content based on screen size and settings set by the user (font and font size). Content flows from page to page, and if a paragraph spans a page break, text alignment will be consistent on both pages.

Fixed-layout EPUBs differ from standard EPUBs in that it is the ebook designer who sets the pagination of the book, not the iBooks application. Each XHTML document in a fixed-layout EPUB file corresponds to a distinct page in the book, and no content is flowed from one page to the next.

If you want to mimic a text flow from page to page in a fixed-layout EPUB, you’ll need to split the text between two separate HTML documents. This poses a challenge if you want your text to be justified, because the text-align: justify CSS declaration does not stretch the final line of a paragraph to the full text-column width.

The good news is that CSS3 offers a solution to this very problem: the text-align-last property, which allows you to indicate how the final line of a text block is aligned. text-align-last: justify specifies that the final line should be fully justified, and span the full text column width.

The bad news about this good news is that text-align-last is not yet fully honored across all major Web browsers. It is supported in Mozilla-based browsers (Firefox), but is not supported in the Webkit engine, which powers Safari, Chrome, and—sadly—the iBooks ereader. Neither text-align-last nor the WebKit-specific -webkit-text-align-last, nor the EPUB3-specific -epub-text-align-last will produce the desired effect in the iBooks reader.
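For reference, the standards-based approach described above would look something like the sketch below. The class name is hypothetical, and the -moz- prefixed form is my assumption about what Mozilla-based browsers of this period expect; per the discussion above, none of these declarations has the desired effect in iBooks.

```css
/* Hypothetical class for a paragraph that continues on the next page */
p.continues {
  text-align: justify;
  -moz-text-align-last: justify;     /* assumed prefixed form for Firefox */
  -webkit-text-align-last: justify;  /* named above; no effect in iBooks */
  -epub-text-align-last: justify;    /* named above; no effect in iBooks */
  text-align-last: justify;          /* unprefixed CSS3 property */
}
```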

But some more good news for the intrepid and patient is that there’s a hack-y HTML/CSS workaround that can achieve the effect of text-align-last: justify in iBooks (your mileage may vary on other ereader platforms).

## Tweak word spacing using CSS

The old-school (dating all the way back to CSS1) word-spacing property allows you to designate a specific amount of extra space to add between words. The following example uses word-spacing: 7px to specify that the last seven words on the page should have an additional seven pixels of whitespace between them:

<p>Everywhere there are mysteries. And the most ancient man-made wonders
of all are the stone monuments erected by our Neolithic and Early
Bronze Age ancestors between 4000 and 1500BC - or, if it is less difficult
to visualize in this way, between 140 and 240 generations ago. Little
England (and smaller Scotland and Wales) are rich in these megalithic
structures. Archaeologists tell us that more than a thousand chambered
tombs and some 700 stone circles have resisted the
smoothing iron of wind and rain, the teeth of the plough, the
<span style="word-spacing: 7px">grasping hands of wave
upon wave of</span></p>

And here’s a screenshot illustrating how this text renders in iBooks.

Last line correctly justified in iBooks with word-spacing property
Image: Andrew Rafferty; The Stones Remain

The main benefit of this approach is that it gives you fine-grained control over the whitespace in a paragraph. The downside is that it can require a fair amount of trial and error to determine the proper word-spacing values to achieve the desired justification effect. If you do decide to use this method, and have a paid iTunes Connect ebooks account, I highly recommend using Apple’s Book Proofer tool, as it eliminates much of the hassle involved in syncing EPUB files between your computer and your iPad/iPhone/iPod.
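If several pages need this treatment, the inline style could be moved into the stylesheet instead. This is only a sketch with hypothetical class names; each value still has to be found by trial and error for the page in question.

```css
/* Hypothetical utility classes, one per spacing value that has been
   found to justify a particular final line */
span.stretch-5 { word-spacing: 5px; }
span.stretch-7 { word-spacing: 7px; }
```

The span in the example above would then carry class="stretch-7" in place of its style attribute.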

O’Reilly’s Ebook Experts want to help you solve your tough digital-publishing problems. Send questions to ebook_experts@oreilly.com, and we will publish submissions and responses on the TOC blog in future editions of “Ask the Ebook Experts.”

## Books should be as easy to create as websites

### Hugh McGuire on his new PressBooks publishing platform.

In this TOC podcast, PressBooks founder Hugh McGuire talks about the current state of, and future plans for, this new book production platform.


## An Open, Webby, Book-Publishing Platform

This short article outlines some ideas about an open source, online platform for making books, based on WordPress.