Recently by Laura Dawson

Some Tasty Bits from the StartWithXML UK Survey

We've got some raw results from the StartWithXML survey in the UK, and they are very different in some respects from the US survey we did. Some salient points:

  • 48.7% of the respondents were in the STM market, followed by trade (24.4%) and college (16%).
  • The bulk of respondents were from large houses - 50.4% - and the rest were evenly divided between midsized and small presses.
  • Nearly 55% of the respondents considered themselves "tech-proficient." As most of them were from production or management, this was not surprising. We did have a significant number of editorial respondents, however - 19.3%.
  • To 40.6% of our respondents, digital publishing is "very important - it informs all we do." Meanwhile, 59.4% of respondents are grappling with its impact in their companies. Only 17.8% of respondents say that they do not focus on the downstream uses of their book content, but on the print volume alone.
  • As far as expanded editions are concerned, 53.5% of publishers say they don't offer these. And 69.3% do not offer more than the basic ONIX marketing content (cover image, description, first chapter, table of contents) in their digital marketing efforts.
  • Over 73% of publishers do not have a formalized (formalised, if you're in the UK) DAM system.
  • And over 50% do not maintain files in an XML format.
  • Nearly 69% of respondents have problems retrieving files from storage, and have to institute workarounds. But over 56% look at XML as a way of complementing CMS and DAM tools they have already invested in.


CSS in an XML Workflow

At the StartWithXML Forum in New York in January, Rebecca Goldthwaite of Cengage gave a great demonstration of how Cengage uses CSS in their XML workflow. Many publishers regard style sheets as an invitation to create cookie-cutter book production, with the fear that all their books will look the same. This is emphatically a myth. Have a look at her seventh slide for examples of how one stylesheet can actually create many different looks.

CSS Zen Garden has been up for a while (Liza Daly used this model to create the EPUB Zen Garden a few months ago). It's a sort of CSS sandbox where graphic designers can play with style sheets and render the same content in very different forms. Every design on the site applies a different style sheet to the same underlying HTML, which shows just what CSS can do.

It's well worth checking out and maybe having some graphic designers play around with it.

StartWithXML is Going to London

StartWithXML will be continuing in London! On September 2nd, at the British Library, we'll be conducting a one-day forum similar to the one we held in New York last January, but with a British publishing focus. Our sponsors for this event include Klopotek, MarkLogic, PLS, BIC, Publishers' Association, and of course O'Reilly.

We're still in the process of firming up our speakers, but we do have information posted here. Additionally, if you are a British publisher or service provider, there's a survey for you here.

As we get more news, we'll add it here - meanwhile, we're continuing to research and gather information about where publishers are in the StartWithXML process.

Taxonomies and Starting With XML

This is an excerpt from a blog post I wrote last week on taxonomies and chunking.

Last October, the StartWithXML team wrote a post called "To Chunk or Not To Chunk," where we discussed tagging and infrastructure issues, and a discussion ensued about what happens when you don't know what you'll be using chunks for. How do you tag those?

Later, in our StartWithXML One-Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.

We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes, and the Dewey Decimal System, among others. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. There are proprietary taxonomies at Amazon and B&N.com that enable effective browsing.

But nothing like this exists for sub-book-level content. It's never been traded before. We've never really needed a taxonomy for it before.

Other industries that traditionally distribute "chunks" have their own taxonomies that might prove useful in building a book-chunk schema. These include the IPTC news codes, which identify the content of a particular news story -- that's the closest analogy I can find for small gobbets of content that require organization.

Industries have proprietary taxonomies to identify certain concepts -- culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these do not necessarily identify concepts within a book.

Some might argue that we don't necessarily need taxonomies -- why can't we use natural-language search and the semantic Web to "bubble up" the "right" concepts? I'd argue that words don't always mean what we think they mean. A classic example from my library days is the term "mercury." That could mean the planet, the car or the element. Proponents of semantic search would say that the context in which "mercury" is mentioned should take care of defining that term. I'd say context resolves the ambiguity in about 50 percent of cases, but a search tool needs to get it right far more often than that, closer to 75 to 100 percent of the time.
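A toy illustration of the "mercury" problem, in Python: plain keyword matching returns every sense of the word, while matching on taxonomy concept IDs returns only the sense you meant. The chunks and the concept IDs (`astro:planet-mercury` and so on) are hypothetical, invented for this sketch; they are not drawn from any real taxonomy.

```python
# Hypothetical chunks: free text plus concept IDs assigned by an indexer.
chunks = [
    {"id": "c1", "text": "Mercury completes an orbit in about 88 days.",
     "concepts": {"astro:planet-mercury"}},
    {"id": "c2", "text": "A used Mercury sedan with low mileage.",
     "concepts": {"auto:mercury-brand"}},
    {"id": "c3", "text": "Mercury is a liquid metal at room temperature.",
     "concepts": {"chem:element-hg"}},
]

def keyword_search(term):
    """Return IDs of chunks whose text contains the term (case-insensitive)."""
    return [c["id"] for c in chunks if term.lower() in c["text"].lower()]

def concept_search(concept_id):
    """Return IDs of chunks tagged with one specific taxonomy concept."""
    return [c["id"] for c in chunks if concept_id in c["concepts"]]

print(keyword_search("mercury"))          # ['c1', 'c2', 'c3']: all three senses
print(concept_search("chem:element-hg"))  # ['c3']: only the element
```

The keyword search can't tell the senses apart no matter how clever the ranking; the concept tags carry that distinction because a person (or a well-documented rule) made it once, at tagging time.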

My original post gets into more detail about why taxonomies are important search tools, and how the digitization of books requires a good taxonomy ... and who should do it.

Coverage of StartWithXML

Turns out I was not the only one on Twitter for the StartWithXML Forum on January 13th. Joe Bachana was tweeting as well. Kind of interesting to see the posts side-by-side. David Rothman of Teleread also has some great things to say, as does Richard Curtis over at e-reads.

We also got nice coverage from PW, as well as Publishers Lunch.

Slides will be up soon!

A Correction!

Frank Grazioli, of Wiley, writes in to correct my last post about taxonomies:

Wiley has been exploring taxonomies for its travel content business; the cooking/psych/accounting spaces might be our next logical opportunities because the disciplines are well developed, specific, etc., that content is authored or edited in fairly controlled templates that map to our own XML content models and our belief in content models and XML has evolved that "lighter" and "more agile" are better than taggy and dense. As you so aptly point to the contextuality and "rigor" of taxonomies, these tools would allow our XML to "slip on the right jacket" for the occasion. I apologize if we led you to believe that we already have firm taxonomies in place for the three areas you specify--I wouldn't want readers/event guests to get that impression anyway.

Beyond the Tag Cloud

This is an excerpt from our research paper, which will publish in concert with the StartWithXML Forum on January 13th at the McGraw-Hill Auditorium in New York. Early bird discounting for BISG members is ending soon!

A good taxonomy is the backbone of your business -- it's how you sort your content. It allows for effective merchandising, effective marketing -- you can aim your content with the precision of a pool cue. It allows for inventorying your content -- so you know what you have ... and what you need. With your content tagged and organized, you know where everything is and how to deploy it.

Taxonomies are contextually sensitive and rigorous -- and in establishing your own, it helps to look at what other industries are doing. Wiley has adopted accounting and cooking and psychology taxonomies from those industries to organize information in its professional development titles. Educational publishers are increasingly arranging their textbooks around "learning objects" -- taxonomized pedagogical goals developed by educators themselves. Even the BISAC codes -- which are part of the ONIX system of organizing book information and therefore an XML-based taxonomy -- are developed very carefully and consensually among book industry professionals in monthly meetings.

An important aspect of taxonomy development is scope notes. Terms need definition and clarity around how they're going to be used. Documenting your taxonomy -- what you mean when you say "porcelain" (collectible china, dental work, household fixtures?), parent-child relationships between categories, and why you choose certain terms over others -- is important for the long term. Future editors and authors will need to know why your taxonomy has developed as it has.
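A minimal sketch of what a documented taxonomy might look like in code, assuming each term carries a scope note and a pointer to its broader (parent) term. The term IDs, labels and scope notes are hypothetical; they echo the "porcelain" example above but come from no real code set.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Term:
    term_id: str
    label: str
    scope_note: str               # what the term means and when to use it
    parent: Optional[str] = None  # term_id of the broader term, if any

# Hypothetical taxonomy: three distinct "porcelain" concepts under different parents.
TAXONOMY = {t.term_id: t for t in [
    Term("collectibles", "Collectibles", "Objects collected for value or interest."),
    Term("dentistry", "Dentistry", "Clinical and restorative dental practice."),
    Term("home", "Home Improvement", "Building, fixtures and renovation."),
    Term("porcelain-china", "Porcelain (china)",
         "Collectible tableware and figurines, e.g. Limoges. Not dental work.",
         parent="collectibles"),
    Term("porcelain-dental", "Porcelain (dental)",
         "Crowns, veneers and other dental restorations.", parent="dentistry"),
    Term("porcelain-fixtures", "Porcelain (fixtures)",
         "Sinks, tubs and other household fixtures.", parent="home"),
]}

def path(term_id):
    """Return the labels from the top-level term down to the given term."""
    labels = []
    current = TAXONOMY.get(term_id)
    while current is not None:
        labels.append(current.label)
        current = TAXONOMY.get(current.parent) if current.parent else None
    return list(reversed(labels))

print(path("porcelain-dental"))  # ['Dentistry', 'Porcelain (dental)']
```

Keeping the scope note on the term itself means the explanation travels with the taxonomy, so a future editor looking at "Porcelain (dental)" sees why it exists and what it excludes.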

Consistency in application is also crucial. Drop-down menus (as opposed to free-text fields) enforce structure and ensure that users don't come up with their own terms that pollute your taxonomy with duplicates or irrelevancies (or misspellings).
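Drop-down menus work because they only offer approved terms. When tags arrive some other way (spreadsheets, author keyword lists), the same rule can be enforced with a check like the Python sketch below. The approved term list is hypothetical; `difflib.get_close_matches` from Python's standard library is used to suggest the term a tagger probably meant when a tag is misspelled.

```python
import difflib

# Hypothetical controlled vocabulary: the only terms a tagger may use.
APPROVED_TERMS = ["pasta", "baking", "grilling", "vegetarian", "desserts"]

def check_tags(tags):
    """Split tags into accepted terms and rejected ones (with suggestions)."""
    accepted, rejected = [], {}
    for tag in tags:
        normalized = tag.strip().lower()
        if normalized in APPROVED_TERMS:
            accepted.append(normalized)
        else:
            # Suggest the closest approved term, if one is similar enough.
            rejected[tag] = difflib.get_close_matches(normalized, APPROVED_TERMS, n=1)
    return accepted, rejected

accepted, rejected = check_tags(["Pasta", "deserts", "comfort food"])
print(accepted)  # ['pasta']
print(rejected)  # {'deserts': ['desserts'], 'comfort food': []}
```

A rejected tag with no suggestion ("comfort food") is exactly the kind of term a taxonomy owner should review: either it gets added deliberately, with a scope note, or it stays out.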

An advantage to using XML is that you don't have to accomplish everything at once, perfectly, from the outset. You will not be able to tag your documents thoroughly right off the bat -- who can know everything in advance? The act of tagging is recursive, and depends on market and company needs. XML allows for this flexibility. Depending on how you envision chunking and re-use, you'll tag your documents differently with each iteration. Unlike the "fire and forget" model, iterative tagging means that your books are living documents.
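Iterative tagging can be as simple as a later pass that adds markup without disturbing what an earlier pass produced. The Python sketch below uses the standard library's `xml.etree.ElementTree`; the schema (`chapter`, `para`, a `subject` attribute) is hypothetical, and the keyword test stands in for what would really be an editor's judgment.

```python
import xml.etree.ElementTree as ET

# First pass: structural tags only (hypothetical schema).
first_pass = """<chapter id="ch3"><title>Sauces</title>
<para>Classic tomato sauce starts with good olive oil.</para>
<para>A beurre blanc needs cold butter.</para></chapter>"""

chapter = ET.fromstring(first_pass)

# Second pass, perhaps months later: add a context tag where it applies.
# Existing elements and attributes are left exactly as they were.
for para in chapter.iter("para"):
    if "tomato" in para.text.lower():
        para.set("subject", "pasta-sauces")

print([p.get("subject") for p in chapter.iter("para")])  # ['pasta-sauces', None]
print(chapter.get("id"))                                  # ch3
```

Nothing in the first pass had to anticipate the `subject` attribute, which is the flexibility described above: each new use for the content can add its own layer of tags.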

To Chunk or Not To Chunk?

This is excerpted from a column I wrote for the most recent issue of The Big Picture, my free newsletter about technology and the book industry.

As we're proceeding with Start With XML, I'm thinking a lot about chunking.

Chunking, at least as we're talking about it, means carving up your content into chunks and distributing those discrete pieces of it. Travel content (distributed over GPS, the web, and in book form) and recipes (distributed via Epicurious and AllRecipes.com as well as in book form) are the most obvious examples of this. Textbook publishing does this as well - certain assets can be used in the main text, in supplementary workbooks and lab manuals, as individual activities to be downloaded to an iPod, or embedded in e-books.
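Once content is tagged, carving it into chunks can be largely mechanical. The Python sketch below splits a hypothetical cookbook file into one standalone XML fragment per recipe using only `xml.etree.ElementTree`. The element and attribute names (`book`, `chapter`, `recipe`, `source-book`, `source-chapter`) are assumptions for illustration, not a standard schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical book file; element names are illustrative, not a standard schema.
book_xml = """<book id="book-001">
  <chapter n="1">
    <recipe id="r1"><name>Pesto</name></recipe>
    <recipe id="r2"><name>Carbonara</name></recipe>
  </chapter>
  <chapter n="2">
    <recipe id="r3"><name>Tiramisu</name></recipe>
  </chapter>
</book>"""

def extract_chunks(xml_text, chunk_tag):
    """Return {chunk id: standalone XML string} for each chunk_tag element."""
    root = ET.fromstring(xml_text)
    chunks = {}
    for chapter in root.iter("chapter"):
        for el in chapter.iter(chunk_tag):
            # Stamp provenance so each chunk can be traced back to its book and chapter.
            el.set("source-book", root.get("id"))
            el.set("source-chapter", chapter.get("n"))
            # tostring() also emits the element's tail (trailing whitespace); strip it.
            chunks[el.get("id")] = ET.tostring(el, encoding="unicode").strip()
    return chunks

chunks = extract_chunks(book_xml, "recipe")
print(sorted(chunks))  # ['r1', 'r2', 'r3']
print(chunks["r3"])
```

The same function could pull out `activity` or `lab` elements from a textbook; what makes it possible is that the chunk boundaries were tagged in the first place.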

And as we talk about chunking, it's clear that there are certain types of content that don't immediately lend themselves to that kind of carved-up distribution. Novels, for example. Narrative nonfiction such as memoirs. Philosophical or political works, where tracing the author's thought from beginning to end is important.

The truth is, we may not quite know what will chunk readily and what will not. There are some blue-sky ideas right now - tagging content within narratives, to be pulled out later and stand on its own - but we just don't know yet if readers are interested in that kind of thing.

But publishers can't afford NOT to prepare for the unknown. There has never been uncertainty like this in publishing - uncertainty in stock prices and supply chain issues (paper prices, transportation/shipping costs, the costs of composition and conversion), uncertainty in revenue-generation, uncertainty as to who's going to buy what in which format - and it's not going to get any clearer for quite some time.

And you can't chunk at all if you haven't tagged - you can't even begin to think about it. Tagging is never a bad strategy - you will never regret doing it. And the risk of NOT doing it - the risk of not being ready for the next wave of consumer demand, whatever that demand may be - is a risk you can't afford to take.

Standardizing Tags in the Metadata Minefield

One issue we haven't discussed much is that of metadata. XML documents are by definition rife with metadata. At what point does metadata cross the line from useful to pollution?

When it's not standardized.

The kind of XML tagging we're primarily talking about can be sectioned into three buckets: rights data ("this picture is good for print products but not electronic ones," "we can use this graphic anywhere," "these animations are exclusively for the workbook"), formatting data ("this is a chapter," "this is a footnote"), and context data ("Paris," "1955," "General Robert E. Lee," "noodles").
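Here is one way the three buckets might sit side by side on the same assets, plus a query that uses each of them. The markup is a hypothetical example, not a published standard; the attribute names `rights-print` and `rights-electronic` and the `context` element are invented for illustration.

```python
import xml.etree.ElementTree as ET

chapter_xml = """<chapter>
  <figure id="fig1" rights-print="yes" rights-electronic="no">
    <context place="Paris" year="1955"/>
  </figure>
  <figure id="fig2" rights-print="yes" rights-electronic="yes">
    <context place="Rome" year="1960"/>
  </figure>
</chapter>"""

root = ET.fromstring(chapter_xml)

# Formatting bucket: the element name itself ("figure") says what this is.
# Rights bucket: attributes record where each asset may be used.
# Context bucket: the child <context> element records what the asset is about.
ebook_ok = [f.get("id") for f in root.iter("figure")
            if f.get("rights-electronic") == "yes"]
paris = [f.get("id") for f in root.iter("figure")
         if f.find("context").get("place") == "Paris"]

print(ebook_ok)  # ['fig2']
print(paris)     # ['fig1']
```

A query like `ebook_ok` only works if every project spells the rights attribute and its values the same way, which is the standardization problem discussed next.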

Without standards, this is a perfect recipe for complete chaos. Standards are crucial to the success of using XML in publishing. Even standards within a department -- using tags the same way from one project to the next, from one PERSON to the next -- are crucial.

There's been some talk about the role of the Book Industry Study Group in developing tagging standards, in the same way they've developed BISAC code standards. And this makes a great deal of sense. The rights and formatting tag standards should be relatively easy to establish -- publishing houses, no matter whether big or small, tend to use this data fairly consistently. It's the context tags that pose the more serious challenges.

The Library of Congress has done this sort of thing with its subject headings. But, like the BISAC codes, these describe the subject of an entire book. Many books cover more than one topic, and so do many chapters. That level of granularity has never been taxonomized before.

Still, it's important to do so in a standardized way, to avoid a cacophony that drowns out meaning. (Is it "pasta" or "noodles"? When you say "diamond," are you talking about baseball or gemstones or Neil? Why is a chapter published by Mosby about dentistry coming up in search results with the chapters on collecting Limoges china published by Antique Trader? Hint: "porcelain.")
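One common way to keep context tags consistent is a synonym table that maps every variant onto a single preferred term, plus a list of homographs (one word, several meanings) that a person has to resolve. The Python sketch below uses a small, hand-built, entirely hypothetical table; a real one would come from the kind of industry-wide standard described above.

```python
# Hypothetical synonym table: variant -> preferred term.
PREFERRED = {"noodles": "pasta", "spaghetti": "pasta", "pasta": "pasta"}

# Hypothetical homographs: one word, several senses that need a qualifier.
HOMOGRAPHS = {
    "porcelain": ["porcelain (china)", "porcelain (dental)"],
    "diamond": ["diamond (baseball)", "diamond (gemstone)"],
}

def normalize(tag):
    """Return (preferred term or None, senses that need a human decision)."""
    key = tag.strip().lower()
    if key in HOMOGRAPHS:
        return None, HOMOGRAPHS[key]   # ambiguous: someone must pick a sense
    return PREFERRED.get(key), []      # None here means an unknown term

print(normalize("Noodles"))    # ('pasta', [])
print(normalize("porcelain"))  # (None, ['porcelain (china)', 'porcelain (dental)'])
print(normalize("gnocchi"))    # (None, [])
```

With the homograph resolved at tagging time, the Mosby dentistry chapter carries "porcelain (dental)" and never shows up next to the Limoges china.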

If you've ever seen a tag cloud on a website, you'll know what I mean. You never know what you're going to get when you click on it. Standardizing context tags is probably the most thankless, boring job publishers will ever engage in. But it's also the one that's going to ensure that books are actually discoverable the way they're meant to be discovered.

What We Talk About When We Talk About XML (Apologies to Raymond Carver)

Acronyms and initialisms are mysterious and potent, and frequently hide meaning and become shorthand for larger concepts. Just as ONIX became shorthand for "metadata," XML (at least in book publishing land) is becoming shorthand for ... well, a lot of things. Repurposing content, creating templates for book design, tagging -- all of these are encompassed in the term "XML workflow."

So no wonder people get confused. Particularly people who are in the business of creating content, not formatting, categorizing, packaging and marketing it.

So what are we talking about when we're throwing around this term? It depends on what you do for a living.

If you're a writer, it might mean using Word a little differently, quite possibly according to specific author guidelines given to you by the publisher. It might also mean including a list of keywords with your manuscript, or even with each chapter.

If you're an acquisitions editor, an XML workflow may mean deciding whether you want a book to merely exist as a print product (as a single source of revenue), or whether it's also appropriate as an ebook, to sell by the chapter (as numerous textbook publishers are doing), to publish iteratively (as O'Reilly does with its Rough Cuts), to make excerpts available for free download, etc.

If you're a book production editor, an XML workflow will be very concrete -- you tag a manuscript according to its format ("chapter heading," "illustration," "copyright page"), and those tags are applied to a pre-defined style sheet.
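A rough sketch of "tags applied to a pre-defined style sheet," written in Python: each format tag is looked up in a house style, and any tag the style sheet doesn't cover is flagged for the production editor. The tag names and style settings are hypothetical, and a real workflow would use XSL or CSS rather than a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical house style sheet: format tag -> style settings.
HOUSE_STYLE = {
    "chapter-heading": {"font": "Garamond", "size_pt": 24, "align": "center"},
    "body": {"font": "Garamond", "size_pt": 11, "align": "justify"},
    "copyright-page": {"font": "Garamond", "size_pt": 8, "align": "left"},
}

manuscript = """<manuscript>
  <chapter-heading>One</chapter-heading>
  <body>The first paragraph of the chapter.</body>
  <sidebar>An aside the style sheet doesn't know about.</sidebar>
</manuscript>"""

def apply_style(xml_text, style):
    """Return (styled (text, settings) pairs, tags the style sheet doesn't cover)."""
    styled, missing = [], []
    for el in ET.fromstring(xml_text):  # direct children, in document order
        if el.tag in style:
            styled.append((el.text, style[el.tag]))
        else:
            missing.append(el.tag)
    return styled, missing

styled, missing = apply_style(manuscript, HOUSE_STYLE)
print(missing)  # ['sidebar']
```

Passing a different dictionary in place of `HOUSE_STYLE` changes the look without touching the manuscript, which is the point of keeping format tags and styling separate.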

If you're in marketing, an XML workflow allows you to work with the author's keywords, target specific audiences for the content, and package the content in appealing ways.

Could you do all of this without XML? Sure. You could use a relational database and shove your manuscript, chapter by chapter, into tables in SQL. You could assign keywords in a relational database. But you couldn't do formatting. You could use InDesign or Quark to do your formatting. But you couldn't break up your manuscript into "chunks" and repackage those "chunks" into new products with those programs. XML has the capacity to handle both, and handle them well.

Like any technology, XML is a tool. It's not a goal in itself, but a way to get to your goal.

Chunks and Verticals and Niches -- Oh, My!

Despite the bell tolling on the publishing industry lately, the publishers who are doing well these days are those who have focus. Publishers who have a consistent message, who create content about specific things, seem not to be paddling the lifeboat with broad, generalized trade publishers. Niches, areas of concentration -- call them what you will, but this is where the future of publishing lies. Just as cable TV brought about a revolution in video consumption -- movies on this channel, comedy on this other one, news on this third one -- digital distribution has brought about a revolution in publishing. It's just a question of understanding where the ground is moving under your feet.

Digital tools -- such as e-books, book trailers, widgets, what have you -- are just that: tools. They are no substitute for product -- nor will they sell a product that doesn't deserve to be sold. Funneling money into "digital initiatives" is wasting money -- unless those initiatives are clearly defined.

How to define them? How to read the tea leaves and figure out what initiatives actually make sense and which are a money pit?

The StartWithXML team has looked at XML itself for guidance. XML tools allow people -- editors, authors, production teams -- to "componentize," to break content down into irreducible parts, to re-use those parts, to publish content more than once. If your book content is sufficiently tagged, you can re-use it early and often.

I think about one of my favorite authors, Wayne Dyer. He writes his books. From those books are generated calendars, one-a-day cards, daily journals, audiobooks, supplementary materials (such as meditations). If Hay House felt like it, they could send an email containing an inspirational quote to my inbox every morning. Dyer writes once. But Hay House publishes his stuff many times over, in many different formats. If he feels like doing more, they provide him with a platform for podcasts, conferences, interviews, and opportunities to preface or foreword other Hay House authors' books.

Doing these sorts of tricks -- and creating loads of interesting and compelling products almost as byproducts of your original content -- is much easier and more cost-effective if you're already using XML. The "chunks" of content are pre-defined. You don't have to make iterative runs at the original manuscript and figure out what can be re-used; you know from the get-go what you WILL re-use.
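A sketch of the quote-a-day idea under stated assumptions: if quotable passages are tagged once in the manuscript (here with a hypothetical `<quote>` element), a daily email schedule can be generated straight from the source file instead of by re-reading the book. Nothing here reflects Hay House's actual workflow; the markup, the start date and the one-quote-per-day rule are invented.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Hypothetical manuscript with quotable passages tagged as <quote>.
manuscript = """<book>
  <chapter><para>Body text.</para>
    <quote>First tagged quote.</quote></chapter>
  <chapter><quote>Second tagged quote.</quote>
    <para>More body text.</para><quote>Third tagged quote.</quote></chapter>
</book>"""

def quote_schedule(xml_text, start):
    """Pair each tagged <quote>, in document order, with consecutive send dates."""
    quotes = [q.text.strip() for q in ET.fromstring(xml_text).iter("quote")]
    return [(start + timedelta(days=i), text) for i, text in enumerate(quotes)]

for day, text in quote_schedule(manuscript, date(2009, 1, 1)):
    print(day.isoformat(), text)
# 2009-01-01 First tagged quote.
# 2009-01-02 Second tagged quote.
# 2009-01-03 Third tagged quote.
```

The calendar, the card deck and the email list can all read from the same `<quote>` tags, so the author writes once and the tagging does the rest.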

This is not anti-literary. It's pro-keeping-your-publishing-house-in-business. And the sooner trade houses realize what their verticals actually are, and pursue them with the savage focus that the niche publishers do, the sooner everyone is happy: the consumer, who gets loads of content; the author, who gets loads of royalties; and the publisher, who is squeezing every last penny out of each word the author writes.

Further Thoughts on Amazon/BookSurge

I keep turning the Amazon/BookSurge story over in my mind, and decided today that it was worth a deep look at the stakeholders, and their stakes:

Print-on-Demand (POD) publishers: These include self-published authors as well as publishers with tens of thousands of titles. The POD model is cost-effective for many types of publishing. POD publishers choose their printers based on cost, quality of the product they turn out, and other differentiating terms.

Amazon: Not only do they sell books, but through their POD service BookSurge, they print them. They have just mandated to POD publishers that they will only sell POD titles printed by BookSurge. If POD publishers want to sell non-BookSurge books, they need to pre-print copies and send them to Amazon's warehouse. This takes the "OD" out of POD, and defeats the business model entirely.

What Amazon has done, essentially, is force the POD publishers to choose between BookSurge, its proprietary service with a proprietary format, and Lightning Source/Lulu/etc. Amazon says that it hasn't actually forced a choice -- that publishers are free to use whatever service they like and then pre-print five copies to be stored in Amazon's warehouses. For a small publisher, that's pretty demanding, and the fact is, most POD publishers use on-demand technology because it's cheaper. By pre-printing titles for sale, small publishers get into the same business that mainstream publishers do -- the business of pulping and remaindering books that don't sell.

Ingram: Not only do they distribute books, but their POD service Lightning Source prints them. Lightning Source is the largest POD service in the country. They also ship titles for Amazon that aren't carried by other sources. Ingram on Monday said that none of this actually had any impact on Lightning Source, but that's not the case. Amazon has said they will not send true POD orders to Lightning Source. The only way they will sell Lightning Source titles is if they already have those books in the warehouse.

Barnes & Noble: Like Amazon, of course, they sell books. They also have a very close relationship with Ingram. They will not sell any books printed by BookSurge. It may be that Ingram doesn't do anything directly to Amazon in retaliation for this loss of business, but they might suddenly get even closer to Barnes & Noble than they already are -- maybe team up on an exclusive digital initiative.

Baker & Taylor: Like Ingram, they distribute books. They are the preferred vendor for Amazon, though Amazon will go to Ingram if B&T doesn't carry the book in question. One issue is, will the non-BookSurge books reside in B&T warehouses? Could B&T find itself carrying Lightning Source books because Amazon will only deal with those as they deal with books from non-POD publishers?

Borders: In trouble. Relaunching a website, purportedly on May 3. This is where Borders could get smart and play Switzerland. They could serve as a POD portal, selling all POD titles regardless of affiliation. But at this point it's asking a lot for Borders to keep its head above water, much less strategize at this long-tail level. So unless they've suddenly acquired some major talent in several divisions of their Web site, that's rather a long shot.

The Customer: He just wants a book -- his mom's book, a how-to book, a book on an obscure topic. He wants a book that may only be available on-demand. How does he know where to find it? Why must he go to Amazon, and then B&N, to see if he can get it? Amazon's not going to sell Lightning Source titles; B&N's not going to sell BookSurge titles. But the customer doesn't know or care who the printer is. He just wants to be able to find and buy the book.

That all said ... it's worth looking at both Lightning Source and BookSurge with a microscope.

Ingram has done some interesting things with regard to book distribution in the last few years. First, they created Ingram Digital Group (IDG), which is charged with exploring content delivery initiatives to both retail outlets and libraries. Second, they acquired VitalSource, an ebook platform heavily in use by college textbook publishers. Third, they acquired Coutts, whose MyiLibrary product is also an ebook platform for the academic and institutional market.

So Ingram has these delivery systems for large amounts of content, ready to package as white-label platforms with subscriptions included. With the digital files that Lightning Source uses, Ingram could conceivably distribute the current POD offerings as ebooks as well, distributed across its Coutts or VitalSource platforms.

Meanwhile, Amazon is scanning books for its "Look Inside" program. Obviously, it has a platform capable of delivering ebooks -- it does so daily. If Amazon insists that its POD publishers stick with BookSurge, and treats other POD vendors as traditional book publishers, then it could conceivably build a feeder system like IDG has with Lightning Source: from BookSurge into the "Look Inside" platform and into ebooks.

And if Ingram's longtime alliance with B&N (10 years ago, B&N attempted to buy Ingram) is threatening Amazon -- if the mere potential of IDG allying with B&N.com to distribute Lightning Source/VitalSource/MyiLibrary content as ebooks is at all threatening to Amazon -- then surely Amazon's sudden focus on BookSurge is a natural one.

Amazon Pushes Print-On-Demand Exclusivity

The intertubes have been flapping today about Amazon's latest move to get its POD publishers and self-published authors to exclusively use BookSurge for printing their titles. The Wall Street Journal has the MSM story here (subscription required):

"It's a strategic decision," said Tammy Hovey, a spokeswoman for Amazon. "What we're looking to do is have a print-on-demand business that better serves our customers and authors. When we work with some other publishers, it's not truly a print-on-demand business."

Ms. Hovey, who said Amazon began to inform publishers of the new policy at the end of February, declined to provide specifics. She said she doesn't consider the move an ultimatum.

However, one POD publisher, Angela Hoy, disagrees. In a discussion with her Amazon rep, John Clifford, she says,

Contrary to what he stated at the very beginning of our conversation, Mr. Clifford finally admitted that books not converted to BookSurge would have the "buy" button turned off on Amazon.com, just as we'd heard from several other POD publishers who had similar conversations with Amazon/BookSurge representatives.

And if a POD publisher has a significant list (Hoy's is 1,500 titles) already with Lightning Source?

Since Amazon/BookSurge does not offer Ingram distribution (Ingram distribution is considered imperative in the industry for bookstore sales), any company that accepts the Amazon/BookSurge deal, who desires to keep offering Ingram distribution, may need to maintain two copies of the book files. Since the Amazon/BookSurge current specs don't match the Lightning Source specs, future book files, both interior and cover, may need to be formatted separately. So, they would have to pay double the setup fees and might have to do double the formatting work as well...or pay designers to do double the formatting work.

In addition to placing an undue burden on smaller authors and publishers, from an overall industry perspective what does this do to Amazon's relationship with Ingram? Ingram already drop-ships titles that Amazon cannot supply via Baker & Taylor. And Amazon's own book inventory certainly isn't that deep.
