Standardizing Tags in the Metadata Minefield

One issue we haven’t discussed much is metadata. XML documents are by definition rife with it. At what point does metadata cross the line from useful to polluting?

When it’s not standardized.

The kind of XML tagging we’re primarily talking about can be sectioned into three buckets: rights data (“this picture is good for print products but not electronic ones,” “we can use this graphic anywhere,” “these animations are exclusively for the workbook”), formatting data (“this is a chapter,” “this is a footnote”), and context data (“Paris,” “1955,” “General Robert E. Lee,” “noodles”).
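To make the three buckets concrete, here is a minimal sketch in Python that sorts the tags in a small XML fragment into rights, formatting, and context. The element names (`rights`, `place`, `topic`, and so on) and the sample chapter are invented for illustration; they are not drawn from any real publishing schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter fragment. All element and attribute names are assumptions.
SAMPLE = """
<chapter id="ch03">
  <rights print="yes" electronic="no"/>
  <place>Paris</place>
  <year>1955</year>
  <topic>noodles</topic>
  <para>Body text.<footnote>A note.</footnote></para>
</chapter>
"""

# Hypothetical assignment of tag names to the three buckets.
BUCKETS = {
    "rights": {"rights"},
    "formatting": {"chapter", "para", "footnote"},
    "context": {"place", "year", "person", "topic"},
}


def bucket_tags(xml_text):
    """Group the tag names found in xml_text by bucket; unrecognized tags go to 'unknown'."""
    found = {name: set() for name in BUCKETS}
    found["unknown"] = set()
    root = ET.fromstring(xml_text.strip())
    for element in root.iter():  # iter() visits the root and every descendant
        for name, tags in BUCKETS.items():
            if element.tag in tags:
                found[name].add(element.tag)
                break
        else:
            found["unknown"].add(element.tag)
    return found


buckets = bucket_tags(SAMPLE)
```

Anything that lands in `unknown` is a tag nobody agreed on, which is exactly where the trouble starts.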

Three kinds of tags, applied by many different hands, are a perfect recipe for chaos. Standards are crucial to the success of XML in publishing, and that includes standards within a department: using tags the same way from one project to the next, and from one person to the next.

There’s been some talk about the role of the Book Industry Study Group in developing tagging standards, in the same way it has developed BISAC code standards. And this makes a great deal of sense. The rights and formatting tag standards should be relatively easy to establish, because publishing houses, big or small, tend to use this data fairly consistently. It’s the context tags that pose the more serious challenges.

The Library of Congress has done this sort of thing with its subject headings. But, like the BISAC codes, these describe the subject of an entire book. Many books, however, cover more than one topic, and so do many chapters. That level of granularity has never been taxonomized.

Still, it’s important to do so in a standardized way, to avoid a cacophony that drowns out meaning. (Is it “pasta” or “noodles”? When you say “diamond,” are you talking about baseball or gemstones or Neil? Why is a chapter published by Mosby about dentistry coming up in search results with the chapters on collecting Limoges china published by Antique Trader? Hint: “porcelain.”)
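One way to picture a standardized vocabulary is as a lookup that collapses synonyms to a single preferred term and refuses to accept an ambiguous term until the tagger says which sense is meant. The Python sketch below is only that, a sketch: the term lists, the subject labels, and the `normalize_tag` function are all hypothetical, and a real standard would be far larger and maintained by a body such as BISG rather than hard-coded.

```python
# Hypothetical controlled vocabulary. Every term and mapping here is invented
# for illustration; none of it comes from BISG, BISAC, or the Library of Congress.

# Unambiguous synonyms collapse to one preferred term.
PREFERRED = {
    "pasta": "pasta",
    "noodles": "pasta",
}

# Ambiguous terms resolve only once the tagger supplies a subject area.
AMBIGUOUS = {
    "diamond": {
        "sports": "baseball diamond",
        "gemology": "diamond (gemstone)",
        "music": "Neil Diamond",
    },
    "porcelain": {
        "dentistry": "dental porcelain",
        "antiques": "porcelain (ceramics)",
    },
}


def normalize_tag(raw, subject=None):
    """Return the preferred term for a raw tag, or raise if it cannot be resolved."""
    key = raw.strip().lower()
    if key in AMBIGUOUS:
        senses = AMBIGUOUS[key]
        if subject not in senses:
            raise ValueError(
                f"'{key}' is ambiguous; choose a subject from {sorted(senses)}"
            )
        return senses[subject]
    if key in PREFERRED:
        return PREFERRED[key]
    raise KeyError(f"'{key}' is not in the controlled vocabulary")
```

With this in place, `normalize_tag("Noodles")` returns `"pasta"`, and the dentistry chapter and the Limoges chapter end up with different tags because `normalize_tag("porcelain", subject="dentistry")` and `normalize_tag("porcelain", subject="antiques")` resolve to different terms. A bare `normalize_tag("diamond")` raises an error instead of guessing.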

If you’ve ever seen a tag cloud on a website, you’ll know what I mean. You never know what you’re going to get when you click on a tag. Standardizing context tags is probably the most thankless, boring job publishers will ever engage in. But it’s also the one that’s going to ensure that books are actually discoverable the way they’re meant to be discovered.
