Hugh McGuire’s startup BookOven has opened up an alpha version of a project they’re calling the Gutenberg Rally, an attempt to harness collective intelligence, Mechanical Turk-style, to proofread Project Gutenberg texts for typos and OCR (Optical Character Recognition) errors. In “divide and conquer” fashion, the system presents just one small snippet of text at a time (with some surrounding context), effectively breaking down a mountain of a task into easily managed molehills.
I had a nice chat with Hugh on Wednesday morning, and what he told me about BookOven’s plans was quite exciting (though apparently it is all still very much in development).
This isn’t the first attempt to harness eyeballs for finding and fixing OCR errors (see reCAPTCHA), but reviewing the text in context is a much more satisfying experience, and it left me wanting to read more of several of the books I was seeing only in snippet form.
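For the curious, here is a rough sketch of how the snippet-plus-context idea described above might work. This is not BookOven’s code; the `Snippet` type, the `snippets` function, and the `size` and `context` parameters are all my own hypothetical names, and I am assuming the text is simply split by lines.

```python
from typing import Iterator, NamedTuple


class Snippet(NamedTuple):
    before: str  # context shown ahead of the snippet (read-only)
    target: str  # the lines the proofreader is asked to check
    after: str   # context shown after the snippet (read-only)


def snippets(text: str, size: int = 3, context: int = 2) -> Iterator[Snippet]:
    """Yield consecutive chunks of `size` lines, each with up to
    `context` lines of surrounding text on either side.

    Hypothetical illustration only; the real system may chunk by
    sentence, paragraph, or page image instead.
    """
    lines = text.splitlines()
    for start in range(0, len(lines), size):
        end = start + size
        yield Snippet(
            before="\n".join(lines[max(0, start - context):start]),
            target="\n".join(lines[start:end]),
            after="\n".join(lines[end:end + context]),
        )
```

Each `Snippet` could then be handed to a different volunteer, which is what turns one large proofreading job into many small ones.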