Comments: 11

  1. Great article. The American Society for Indexing (and sister societies in other countries, such as the Australian/New Zealand Society of Indexers) agree with the principles you set forth. To wit, ASI and ANZSI joined the International Digital Publishing Forum, which manages the EPUB standard. An Indexes Working Group was established and a spec is nearing completion for member approval. The charter and spec (currently in draft) establish “tagging” that allows for interactivity and integration of indexes in ePubs. Of course, reading systems and publishers have to cooperate by implementing the spec and providing indexes tagged according to the spec, respectively.
    David K. Ream, Leverage Technologies, IDPF IWG Co-Chair representing ASI

  2. Hugh, I can’t tell you how excited I was to read this. You have hit the nail on the head, both as regards publishers providing APIs and as regards indexes being a fantastic place to start. Proper coding of both text and index can provide a stunning array of “hooks and handles” upon which a publisher, or third-party developers, can hang all sorts of new, useful, and fun interactions between the human reader and the corpus being read. As you say, “Add some semantic data, shake, and you’ve got yourself an API — or at least a map upon which to build your API.” I hope lots of folks out there share your enthusiasm for this!

  3. Sorry, in my haste to cheer on your ideas, I neglected to properly identify myself! I’m also on the IDPF working group Dave mentioned, as well as on the Board of Directors of the American Society for Indexing. It’s energy and interest like yours that we hope will take our efforts and run with them, once the spec is finalized and approved.

  4. Good stuff, Hugh!

    Your example was something a computer could do (picking out
    John Smith) but you recognise, as most ebook publishers don’t, that indexes isolate
    not word occurrences but concepts. In doing that they reflect the way people actually
    read, apprehending meaning from sentences, paragraphs, however long an argument
    takes. To recognise that a section is about security, say, readers don’t depend
    on its containing the word ‘security’: search does.  

    OK, the ebook market started with airport novels where all you
    need is an electronic bookmark, but if usability is to be retained with academic
    ebooks, we need to think beyond word search, which (unlike a human indexer) won’t
    even collocate ‘American’ and ‘US’! But the key message I take from your piece
    is that ebooks should provide more usability, not less, and at present they’re offering
    less. Bravo!


    Bill Johncocks (Society of Indexers, Publishing Technology Group)
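
    The contrast drawn in this comment, between literal word search and an index that gathers concepts, can be sketched in a few lines. This is only an illustration under stated assumptions: the sample sections, the `concept_index` entries and the `see` cross-references are all hypothetical, standing in for what a human indexer would compile.

```python
import re

# Hypothetical sample text: three short "sections" of a book.
sections = {
    1: "American exports grew sharply during the decade.",
    2: "The US economy then slowed.",
    3: "Locks, keys and passwords protect what we value.",
}


def word_search(term):
    """Return the section numbers whose text contains the literal word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(n for n, text in sections.items() if pattern.search(text))


# A hand-made concept index (assumed): headings point to sections
# by meaning, whether or not the heading word appears in the text.
concept_index = {
    "United States": [1, 2],
    "security": [3],
}

# "See" cross-references (assumed) that collocate variant terms.
see = {"American": "United States", "US": "United States"}


def index_lookup(term):
    """Follow a cross-reference if one exists, then return the locators."""
    heading = see.get(term, term)
    return concept_index.get(heading, [])
```

    Here `word_search("security")` finds nothing, because section 3 never uses the word, while `index_lookup("security")` returns that section; likewise `index_lookup("US")` and `index_lookup("American")` both lead to the same two sections.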


  5. Maureen MacGlashan

    Let me add to the cheers! Sadly we will only see more if publishers and authors appreciate the added value a true index offers over search and can be reassured that any apparent extra cost will be far outweighed by the benefits. The challenge, as Hugh says, “is making good index files of course, with good metadata, and good schemas”.

    Maureen MacGlashan (Society of Indexers Publishing Technology Group and editor of The Indexer)

  6. My take on this.

    I often want to use the materials I read to build my own model of the world, essentially using a document as a living thing: comparing it to what I already know and sometimes revising, or extending, my own thinking. That’s not something to be left up to publishers but should be under my control. To me that suggests a common set of APIs, designed to suit people who use books. If publishers provided those I could “use” their books in a uniform way in pursuing my own work (nearly always not a 1 to 1 mapping against what’s in a book!). They would, of course, be free to provide value-added APIs.

    Some of this can be done today, fairly easily.  If the text is available it’s easy to build a basic index.  The big issue, for me, is that using words is rather crude.  Adding meaning through human annotation is a huge benefit, as you so ably point out in the article.  (In that regard it’ll usually be necessary to map “your terms” to “my terms”, if I have a pre-existing system, which can be a pain.)  Point is, that some of this is already feasible, can be done and maybe is being done (with some unwanted extra effort).
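
    As a rough sketch of the two steps mentioned above, building a basic word index from available text and mapping “your terms” to “my terms”, something like the following would do. The function names, the sample paragraphs, the publisher index and the term map are all hypothetical, and a word-level index like this is exactly the crude kind described, not a substitute for human indexing.

```python
import re
from collections import defaultdict


def build_word_index(paragraphs):
    """Map each lowercased word to the paragraph numbers that contain it."""
    index = defaultdict(set)
    for number, text in enumerate(paragraphs, start=1):
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def remap(index, term_map):
    """Re-file index entries under the reader's own headings.

    Terms missing from term_map keep their original heading.
    """
    merged = defaultdict(set)
    for term, locations in index.items():
        merged[term_map.get(term, term)].update(locations)
    return {term: sorted(locations) for term, locations in merged.items()}


# Hypothetical sample data.
paragraphs = [
    "Indexes isolate concepts.",
    "Search finds words, not concepts.",
]
word_index = build_word_index(paragraphs)

publisher_index = {"indexing": [1], "full-text search": [2]}
my_terms = {"indexing": "indexes", "full-text search": "search"}
my_index = remap(publisher_index, my_terms)
```

    The term map is the painful part in practice: it has to be written by hand for each pairing of vocabularies.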

    An aside. As in so many things, the formats that become popular are not the best there have been. There have been better formats than ePub. They seem to have gone dead. One that immediately springs to mind had a built-in full-text index and was pretty compact. It was easy to create. That was CHM, the “compiled HTML help” system.

    One issue is who does it, how, and how they get rewarded. For example, how do we encourage the guy who, off his own bat, builds the extra “just because that’s what he does”, to share? How might he contribute his work to other readers and the publisher?

    • Your comment here reminded me remotely of how people are annotating maps with points of interest that only they might know about, as they are the local experts on a specific location. Once maps are annotated with this additional information, it becomes shareable with others, which adds value.

      How about sharing book annotations?

      • Several years ago I had a document in one language that I wanted translated into another.  I’m not an expert in the second language though I can read it and get the gist.

        My solution was pretty simple.  I scanned the original (was allowed!) and put it up as a web page.  Two columns.  Left side original language broken down by paragraph.  Right side translation.  Provided link to potential translators.

        The whole thing didn’t take a lot of time and would be easier today, I’m sure, than over a decade ago!

        So yes.  I think that’s a great idea.

        My main reservation is general open access.  The Endless September / flood of noobs factor has gotten a lot worse since those terms were first used.  Just look at comments to posts on wildly popular sites.  Hundreds of them and often after reading the first five you give up and decide not to bother next time.

        In other words popularity can be the enemy of intellect and usability.

        A mechanism to filter such inputs down to those that you want is a problem I haven’t seen well solved yet!

  7. Really enjoyed this article. Seems like the “smart index” idea would be especially useful for etextbooks and educational fare. Do you know of any great examples?

  8. Great article, and I am also very interested in the possibilities, which seem endless. Here’s a question though, Hugh: in your view, are APIs something that will help sell “the book”, and/or are they a potential source of revenue in themselves? In your article, I don’t see any discussion about revenue, sales, money, which are important to the survivability of a publisher… So what’s the ROI?

  9. FYI, the EPUB Indexes specification (aka, “API for ebooks”?) is now up for public comment. Please visit
    for more information, a link to the draft specification, and information on how
    to submit comments. The comment period goes through April 15, 2013.
