Why You Should Care About XML

Since we began talking about the StartWithXML project, a few offline comments have come in suggesting that imposing XML on authors (and editors for that matter) won’t work.

When framed that way, I’m in violent agreement. I would never argue that authors and editors should or will become fluent in XML, or be expected to mark up their content by hand. I naively tried fighting that battle before, and was soundly and consistently defeated. It is simply too much “extra” work that gets in the way of the writing process.

But there are several reasons why it’s really important for publishers to start paying attention to XML right now, and across their entire workflow:

  • XML is here to stay, for the reasonably foreseeable future. While it’s always dangerous to attempt to predict expiration dates on technology, I think it’s fair to assume XML will have a shelf life at least as long as ASCII, which has been with us for more than 40 years, and isn’t going anywhere soon.
  • Web publishing and print publishing are converging, and writing and production for print will be much more influenced by the Web than vice versa. It will only get harder to succeed in publishing without putting the Web on par with (or ahead of) print as the primary target. The longer you wait to get your content into Web-friendly and reusable XML, the harder the transition will be.

Many in publishing balk at bringing XML “up the stack” to the production, editing, or even the authoring stage. And with good reason; XML isn’t really meant to be created or edited by hand (though a nice feature is that in a pinch it easily can be). There are two places to look for useful clues about how XML will actually fit into a publisher’s workflow: Web publishing and the “alpha geeks.”

Web Publishing

In the early days (mid ’90s), there were two primary ways that content got from a writer to the Web:

  1. Adventurous authors dove into HTML, learning the code needed to express lists and headings and tables. Most people relied on simple text editors, though HTML-specific tools like BBEdit began to emerge.
  2. For many other writers, the workflow didn’t change much: articles were written with a word processor, then handed off to the production staff, who marked them up as HTML for the Web rather than composing them for print.

Today, the writers behind successful new media and content companies like the Huffington Post, PaidContent, TechCrunch, or Gawker depend on Web-friendly tools like blog platforms, RSS readers, and, more recently, dedicated writing software for bloggers (I’m writing this post with MarsEdit, though Ecto and Windows Live Writer are other popular choices). For most writers, most of the time, there’s little need to know more than minimal tagging (how to fix an errant hyperlink, for example). The substantial complexity of the XML at work is hidden. But no one will become the next Huffington Post by accepting submissions as Word attachments. The tools will evolve, and there’s a real opportunity for publishers and writers willing to experiment on the edge, which brings me to the next place to look for clues about the future:

Alpha Geeks

By “alpha geeks” I mean those experimenting and innovating out on the edge, often doing it as much for the challenge and the learning value as for any specific payback. These early experiments can have a sizable impact on the direction of later effort and innovation. In the context of publishing, I’d say that much of what Harlequin has been doing lately qualifies, as does Bookworm, the Web-based EPUB reader project. Here at O’Reilly we believe in “eating our own dogfood,” and for a large chunk of our frontlist, books are either written directly in XML, or are converted to XML as the first step of production. That’s meant the ability to rapidly prototype new design elements and features, as well as to effectively separate design and content, and to achieve real “single source” publishing for many titles: simultaneously creating on-demand output for print, for Web-friendly PDFs, for ebooks, and for online access via Safari Books Online. What we’re doing might not make sense for a lot of publishers today, but sitting on the sidelines waiting indefinitely for tools that don’t require new knowledge or skills doesn’t make much sense either.
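
For readers who want to see what “single source” means in practice, here is a minimal sketch in Python. It is not O’Reilly’s actual toolchain; the element names (chapter, title, para) and the two output functions are made up for illustration. The point is only that one XML source can feed more than one output format, with the presentation decided at render time.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical chapter markup: these element names are illustrative,
# not taken from any real publishing schema.
SOURCE = """<chapter>
  <title>Why XML</title>
  <para>Content is written once.</para>
  <para>Design is applied later.</para>
</chapter>"""


def to_html(chapter):
    """Render the chapter as a small HTML fragment for the Web."""
    parts = ["<h1>{}</h1>".format(escape(chapter.findtext("title")))]
    parts += ["<p>{}</p>".format(escape(p.text)) for p in chapter.findall("para")]
    return "\n".join(parts)


def to_text(chapter):
    """Render the same chapter as plain text, e.g. for a quick proof."""
    title = chapter.findtext("title")
    lines = [title.upper(), "=" * len(title), ""]
    lines += [p.text for p in chapter.findall("para")]
    return "\n".join(lines)


# Parse once, then produce both outputs from the same parsed source.
chapter = ET.fromstring(SOURCE)
html_output = to_html(chapter)
text_output = to_text(chapter)
print(html_output)
print()
print(text_output)
```

Changing the look of either output means editing one rendering function, not the content, which is the separation of design and content described above. A real workflow would more likely use XSLT or a dedicated toolchain than hand-written functions like these.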

I wouldn’t be surprised at all if publishers start seeing “manuscripts” in the form of a series of blog posts, or a set of Google Docs. In either case, that’s already Web-friendly XML, and if publishers want to spend their time and money pushing it into Quark, then onto PDF, and finally on to a vendor to create an ebook, that’s their choice. But someone more nimble and willing to work natively in a Web-friendly format will be difficult to compete with.

Arguing over whether authors can/should/will “use XML” is not a debate I’m interested in having. Maybe they will, maybe they won’t. But XML is becoming part of the fabric of content creation, production, and distribution, all of which will only become more digital and more networked, and continuing to treat it as just an output format for a vendor or developer to care about means missing substantial opportunities. And as books become more connected to the Web, the collaboration and communication made possible become powerful motivators. As Marc Andreessen said (quoted in The World Is Flat) about the early Web and the technical hurdles it presented:

People will change their habits quickly when they have a strong reason to do so, and people have an innate urge to connect with other people. And when you give people a new way to connect with other people, they will punch through any technical barrier, they will learn new languages — people are wired to want to connect with other people and they find it objectionable not to be able to.

But what authors will (or won’t) do with XML doesn’t change the importance for publishers of understanding and applying XML to their workflow.
