To Chunk or Not To Chunk?

This is excerpted from a column I wrote for the most recent issue of The Big Picture, my free newsletter about technology and the book industry.

As we’re proceeding with Start With XML, I’m thinking a lot about chunking.

Chunking, at least as we’re talking about it, means carving up your content into discrete pieces and distributing those pieces on their own. Travel content (distributed over GPS, the web, and in book form) and recipes (distributed via Epicurious and AllRecipes.com as well as in book form) are the most obvious examples of this. Textbook publishing does this as well – certain assets can be used in the main text, in supplementary workbooks and lab manuals, as individual activities to be downloaded to an iPod, or embedded in e-books.
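
To make that concrete, here is a minimal sketch in Python of what pulling chunks out of tagged content can look like. The cookbook markup, the tag names (`book`, `chapter`, `recipe`) and the `extract_chunks` helper are all invented for illustration; they are not a real publisher schema or the Start With XML approach, just one way the idea could play out.

```python
import xml.etree.ElementTree as ET

# Hypothetical source file: one cookbook, with each recipe tagged as its own unit.
BOOK_XML = """
<book title="Weeknight Suppers">
  <chapter title="Soups">
    <recipe id="r1"><title>Tomato Soup</title><yield>4</yield></recipe>
    <recipe id="r2"><title>Leek and Potato Soup</title><yield>6</yield></recipe>
  </chapter>
  <chapter title="Salads">
    <recipe id="r3"><title>Fennel Salad</title><yield>2</yield></recipe>
  </chapter>
</book>
"""


def extract_chunks(xml_text, tag):
    """Return every element named `tag` as a standalone XML string, keyed by its id."""
    root = ET.fromstring(xml_text)
    return {
        el.get("id"): ET.tostring(el, encoding="unicode").strip()
        for el in root.iter(tag)
    }


# Each recipe can now travel without the rest of the book (web page, app, e-book).
chunks = extract_chunks(BOOK_XML, "recipe")
for chunk_id, chunk_xml in chunks.items():
    print(chunk_id, chunk_xml)
```

The same tagged file still produces the whole book; the chunks are just a second way of reading it.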

And as we talk about chunking, it’s clear that there are certain types of content that don’t immediately lend themselves to that kind of carved-up distribution. Novels, for example. Narrative nonfiction such as memoirs. Philosophical or political works, where tracing the author’s thought from beginning to end is important.

The truth is, we may not quite know what will chunk readily and what will not. There are some blue-sky ideas right now – tagging content within narratives, so that it can be pulled out later and stand on its own – but we just don’t know yet if readers are interested in that kind of thing.
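
As a rough illustration of that blue-sky idea, the sketch below tags two phrases inside a sentence of narrative and then pulls them out, while leaving the running text readable as before. The `place` and `dish` tags, the sample sentence, and both helper functions are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical memoir paragraph with two inline tags, <place> and <dish>.
PARAGRAPH = (
    "<p>We spent that winter in <place>Lisbon</place>, where my aunt "
    "taught me to make <dish>caldo verde</dish> on a two-burner stove.</p>"
)


def tagged_spans(xml_text):
    """List (tag, text) pairs for every tagged span inside the paragraph."""
    root = ET.fromstring(xml_text)
    return [(el.tag, "".join(el.itertext())) for el in root.iter() if el is not root]


def plain_text(xml_text):
    """Return the narrative with the tags stripped out, as a reader would see it."""
    return "".join(ET.fromstring(xml_text).itertext())


print(tagged_spans(PARAGRAPH))
print(plain_text(PARAGRAPH))
```

Whether anyone wants a list of the places or dishes in a memoir is exactly the open question; the tagging only makes it possible to find out.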

But publishers can’t afford NOT to prepare for the unknown. There has never been uncertainty like this in publishing – uncertainty in stock prices and supply chain issues (paper prices, transportation/shipping costs, the costs of composition and conversion), uncertainty in revenue generation, uncertainty as to who’s going to buy what in which format – and it’s not going to get any clearer for quite some time.

And you can’t chunk at all if you haven’t tagged – you can’t even begin to think about it. Tagging is never a bad strategy – you will never regret doing it. And the risk of NOT doing it – the risk of not being ready for the next wave of consumer demand, whatever that demand may be – means that you can’t afford not to do it.

Comments: 4

  1. except if you don’t know what chunks you want,
    you probably won’t apply the tagging correctly…

    which means you’ll have to redo it later anyway.

    which means the cost of doing it the first time
    is money down the drain.

    and even if you do the tagging “correctly”,
    if you don’t actually _use_ those “chunks”,
    then the time and energy spent to tag them
    is also money down the drain.

    you need to think very carefully about the
    value proposition when someone tells you
    that you “can’t afford not to” do something.


  2. We have been wrestling with this issue for months now, as well as learning the XML ropes and how to implement and use an XML workflow in InDesign and online environments.

    Three observations come to mind:

    (1) Better to make our best guess on the right tags and the right chunks to tag now than dither, wondering if we will wish we’d done it differently sometime in the unspecified future.

    (2) If we later decide we should have used a different set of tags, we can always use a global search and replace to correct them.

    (3) With a bit of brainstorming and analysis (without overdoing either so as not to paralyze would-be progress), we can develop a concise set of tags and clear goals for our data chunking. We can add more of either later by searching the text for keywords.

    Better a good decision today than an excellent one later. Of course, we are a micro-publisher and, as such, we all know the entirety of each project.

  3. @Walt

    I fully agree with your comment.

    Years ago (in SGML times, before XML), when I was in charge of structuring dictionaries, we made best guesses about future uses and began tagging some entries without an immediate use, for instance to identify biographical data (birth and death years), heights, …

    The effort was in fact kept reasonable, and done incrementally during our yearly updates.

    This tagging helped first with quality control at the editorial level and eventually, in the third year, allowed us to complete a consistent structure and release new search functions on CD-ROMs, which later served as the basis for online uses.

    As you say, it’s easier to do that with a small team, where everybody understands the issues and the goals.

  4. well, doing your best is always a good direction.

    and i guess y’all won’t need this $600 conference
    because you’ve got a system down pat already. :+)

    but please, folks, track your costs and benefits,
    so you can see later if you took the right route.

    otherwise, once you pour a lot of money into it,
    you’ll be motivated to think it was “worthwhile”,
    even if the bottom-line doesn’t reflect it later.


    p.s. doing a “global search and replace” on tags
    doesn’t really change anything about those tags,
    other than the names being used to describe ’em.
    the type of mis-tagging that typically occurs is
    that you utterly failed to tag some dimension
    because you didn’t realize, at the time, that
    it would be an important dimension. kinda like
    the government didn’t know the sub-prime mess
    was an important dimension on which to regulate.
    the unanticipated is unknown _by_definition_…
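
The exchange above about “global search and replace” can be sketched in a few lines of Python. The dictionary entry, the `born` and `birth` tag names, and the `tag_names` helper are hypothetical, invented only to show the difference between renaming a tag and tagging something that was never tagged.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary entry: the birth year is tagged, the death year is not.
ENTRY = "<entry><name>Ada Lovelace</name>, mathematician, <born>1815</born>-1852.</entry>"

# A global search and replace: rename every <born>...</born> to <birth>...</birth>.
renamed = re.sub(r"<(/?)born>", r"<\g<1>birth>", ENTRY)


def tag_names(xml_text):
    """Return the sorted set of tag names used in the entry."""
    return sorted({el.tag for el in ET.fromstring(xml_text).iter()})


print(tag_names(ENTRY))
print(tag_names(renamed))

# The rename changed a label, but the death year is still plain, untagged text.
print(ET.fromstring(renamed).find("death"))
```

Renaming is cheap and mechanical. Adding the missing `death` tag means going back to the text, by hand or with a keyword search, which is the cost both sides of the thread are arguing about.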