
TOC Recommended Reading

In Defense of Piracy (Lawrence Lessig, Wall Street Journal)

The return of this “remix” culture could drive extraordinary economic growth, if encouraged, and properly balanced. It could return our culture to a practice that has marked every culture in human history — save a few in the developed world for much of the 20th century — where many create as well as consume. And it could inspire a deeper, much more meaningful practice of learning for a generation that has no time to read a book, but spends scores of hours each week listening, or watching or creating, “media.”

Where is everybody? (Joe Wikert, TeleRead)

“If you build it, they will come” only works in the movies. If they really want to succeed Borders needs to do something beyond just making all this technology available in the store. Where are the in-store events (e.g., come let us help you research your family name, come see the latest e-book technologies, etc.)? How about signage in other areas of the store that promotes the tech kiosk area?

Mass book digitization: The deeper story of Google Books and the Open Content Alliance (Kalev Leetaru, First Monday)

Both projects offer the ability to search within a particular work, but only Google offers the ability to search across its entire collection. A search across the OCA archive only searches titles and description fields, not the full text of works. The OCA system thus offers a document-centric model, while Google offers both document and collection-based models, allowing broad exploratory searches of its entire holdings: the equivalent of being able to “full text search” a library. The importance of this difference cannot be understated in the limitations it places on the ability of patrons to interact with the OCA collections.


Comments: 3

  1. the “first monday” article shows once again that
    academics still have their heads up their butts
    in regard to large-scale digitization projects.

    they still whine they don’t get “preservation”
    — hey, if google can’t afford it, nobody can —
    and still seem to believe “e-books = scan-sets”,
    which is quite simply ridiculous to the extreme.

    o.c.r. is mentioned in the piece only in passing,
    and is never discussed at all, while the author
    goes on and on about scan-resolution, the nature
    of the .pdfs created by both projects, and so on.

    ages ago, michael hart had the right take on it:
    “a picture of a book is not a book.” amen…

    in the time when “remixability” is being seen
    (correctly or not) as an important capability,
    the academics still don’t seem to realize that
    the combination of frozen page-images and a
    frozen file-format (.pdf) is absolutely deadly.

    digital text is far superior in _every_ way —
    more flexible and informative, less bandwidth
    — so the _real_ important question is this:
    “how do we correct o.c.r. on millions of books?”


  2. Although the First Monday article appeared in a peer-reviewed journal, it does not appear to have been adequately fact-checked. It contains many errors that may give readers an inaccurate impression of the Internet Archive and the Open Content Alliance contributors. I am a Berkeley anthropology PhD student doing field study at the Internet Archive. I study the Internet Archive’s book project closely and am familiar with much of the public information about Google’s project. While a piece correcting the errors is probably called for at some future date, I wanted to raise just a few here to illustrate that the article has real problems.

    First, Leetaru doesn’t distinguish between the Open Content Alliance and the Internet Archive (they are not synonymous). The author notes that 100,000 books have been scanned by OCA; in fact, OCA has not scanned any books. The Internet Archive has scanned 400,000 (not 100,000, as Leetaru states) and hosts over 500,000, with the additional 100,000 drawn from the Million Books Project, Project Gutenberg, and other donations by end users. This number continues to grow every day.

    Second, he misstates a number of facts about policies and processes. He writes that the Internet Archive does not include any form of metadata in its downloadable files, but in fact it has been embedding precisely that metadata in all PDFs made from the books it scans since April 2008. He claims that there is fundamental confusion about users’ rights, but now that the Archive’s agreement with Microsoft has expired, all restrictions have been lifted on MSN-sponsored books, including the very book he uses as an example (see his footnote 9). I cannot address here all of the outdated facts and other misimpressions in the article, but there are many.

    Third, he faults the OCA for not making more of its backend processes publicly available, yet he doesn’t show due diligence in seeking that information out (say, with a phone call to Archive director Brewster Kahle, who was not consulted for the article). The Archive has built a variety of open-source resources and tools, both hardware and software: open-source storage computers known as petaboxes; a publicly inspectable book scanner (the Scribe); and its website OpenLibrary.org, whose entire source code is available to the public. That Leetaru’s article doesn’t even mention OpenLibrary suggests that he didn’t do the legwork required to accurately represent the herculean efforts of the Internet Archive and the Open Content Alliance contributors to build an open, searchable digital public library.

  3. mary, if you’ll notice above, i said that leetaru
    has “his head up his butt”, so i think it’s clear
    that my job is not to defend him.

    but if you look at the “editorial history” of his
    article, at the bottom of it, you will find this:
    > Paper received 31 January 2008; revised 3
    > October 2008; accepted 6 October 2008.

    yes, one of the problems with peer-reviewed stuff
    is that the process is glacial… but given that
    the paper was first submitted in january of 2008,
    it takes the sting out of many of your charges…

    and the difference ‘tween o.c.a. and archive.org?
    well, i’m not sure that’s clear to anyone at all.
    and it only gets more murky with openlibrary.org.
    many of their lines are crossed, some very badly.

    and as for “due diligence” in “having sought out”
    information on “back-end processes” over there?
    well, i got a great big laugh out of that… :+)

    it’s nearly impossible to get _anyone_ to respond
    to questions about what they’re doing, and why…

    an open-access project must be better about that.

    i support _all_ of these big scanning projects,
    despite their flaws. but they are full of flaws.

    it’s just too bad that more people don’t seem to
    see which of those flaws are extremely troubling,
    and which ones are more or less merely cosmetic.