Recommended Reading on XML and Publishing

While clearing out some old files, I came across a folder of articles culled during research about three years ago, when I was building the case for increasing our use of XML in book production. If you’re looking for a break from the steady stream of terrifying financial news, here are a few hours well spent on angle brackets. Much of this skews fairly technical (including actual math), but it offers useful context for any XML conversation:

  • When Word-to-XML conversions get nasty, from Mike Gross at Data Conversion Laboratory. “Before you begin a conversion, look through your source Word documents to see how well they were formatted but be prepared you may be horrified with what you find.”

  • From the Journal of Digital Information, a paper by Terje Hillesund, Many Outputs — Many Inputs: XML for Publishers and E-book Designers. Hillesund takes a contrarian view on XML, though he specifically calls out the kind of material many trade publishers primarily deal with as well suited to XML: “For many typographically simple genres, like most present fiction, reuse has already proved to be relatively easy … In the future, XML-based workflows will make re-use of many fiction genres even easier, as these visually and navigationally uncomplicated texts can be made into a variety of paper and electronic editions from the same XML document by use of style sheets …”

  • The response to Hillesund from XML guru Norm Walsh, XML: One Input — Many Outputs: A response to Hillesund. “Before considering the flaws in each of [his] arguments, it is interesting, if slightly incongruous to his arguments, to note that Hillesund’s paper includes no less than four examples of the successful use of XML precisely for the publication of multiple output formats from a single input document.”

  • A fascinating paper from 1998, On the Pagination of Complex Documents, which discusses the challenges inherent in automated pagination of the kind found in many XML-based rendering systems (as well as older systems such as LaTeX). “Using competitive analysis we show that, under realistic assumptions, not only first-fit but any online pagination algorithm may produce results that are arbitrarily worse than necessary. This explains why so many people are not satisfied with paginations produced by LaTeX if no manual improvement is done.”

  • It hasn’t been updated since 2005, but Choosing an XML editor, from Thijs van den Broek, offers a nice survey of XML editors. “The study consisted of a literature search, surveys to identify user needs, current usage, existing editors, and (existing and desired) features of editors, as well as an evaluation exercise.”

  • Here at O’Reilly, our workflow is centered around DocBook XML, but DITA (Darwin Information Typing Architecture) is a more recent XML vocabulary, also designed primarily for technical information. IBM developerWorks has a nice overview, Introduction to the Darwin Information Typing Architecture. “This document is a roadmap for the Darwin Information Typing Architecture: what it is and how it applies to technical documentation. It is also a product of the architecture, having been written entirely in XML and produced using the principles described here.”

  • Written from the perspective of a technical documentation group at Cisco, Low-Cost, Flat-File XML for the Masses is an interesting case study from a team committed to finding a way to use XML that was better for writers and didn’t require a large investment in new software: “You can realize the benefits of publishing from modularized XML, without the expense of an enterprise publishing system, by implementing the authoring environment on top of nothing more than your operating system’s file system. Although this environment is not adequate for enterprise publishing needs, it is more than adequate for the needs [of] small writing teams, businesses with a limited number of related products, proof-of-concept demonstrations, and even home users.”
