Tools

Qwitter: Accessible Twitter client (uses TTS) (via @doctorow)

Just make sure not to follow anyone who's a member of the Author's Guild ...

"The Qwitter client enables blind individuals to interface with the Twitter service globally, regardless of application focus. Based off of revolutionary concepts pioneered in The Jawter Jaws Scripts, Qwitter, with full support for the three major comercial screen readers and sapi speech, provides you instant access to all aspects of the twitter microblogging service, giving you the ability to post a tweet from anywhere, read tweets, perform searches, and far, far more."

http://www.qwitter-client.net/

Posted via email from TOC Posterous

"Web-based ePub validator adds Preflight and API" (via @liza)

From @liza at Threepress:

"EpubCheck’s lesser-known companion checks for additional issues like content documents that exceed 300K, which can’t be loaded on the Sony Reader."

http://blog.threepress.org/2009/11/04/epub-validator-updates/
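
To make the 300K check concrete, here is a minimal sketch of the idea in Python. It is not how the Threepress validator or EpubCheck actually works; the function names, the list of file extensions, and the reading of "300K" as 300 * 1024 bytes are all assumptions made for illustration.

```python
import io
import zipfile

# Limit taken from the quote above; reading "300K" as 300 * 1024 bytes is an assumption.
MAX_CONTENT_BYTES = 300 * 1024
# Assumed extensions for content documents inside the EPUB container.
CONTENT_SUFFIXES = (".html", ".xhtml", ".htm")


def oversized_content_documents(epub_file, limit=MAX_CONTENT_BYTES):
    """Return (name, uncompressed size) for each content document over the limit."""
    with zipfile.ZipFile(epub_file) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith(CONTENT_SUFFIXES)
            and info.file_size > limit
        ]


def build_sample_epub():
    """Build a tiny in-memory zip that stands in for an EPUB (it is not a valid one)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("OEBPS/small.xhtml", "<html/>")
        archive.writestr("OEBPS/huge.xhtml", "x" * (MAX_CONTENT_BYTES + 1))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, size in oversized_content_documents(build_sample_epub()):
        print(f"{name}: {size} bytes exceeds the limit")
```

An EPUB is a zip container, so the sketch only needs the standard library; pass it a path or an open file instead of the sample buffer to check a real book.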

(ps -- thanks to @liza for making my day with the pointer to http://twitter.com/big_ben_clock)

BookServer: A Web of Books

I'm thrilled to be at the Internet Archive's "Making Books Apparent" event today in San Francisco, where they're debuting the new BookServer architecture.

As the audience for digital books grows, we can evolve from an environment of single devices connected to single sources into a distributed system where readers can find books from sources across the Web to read on whatever device they have. Publishers are creating digital versions of their popular books, and the library community is creating digital archives of their printed collections. BookServer is an open system to find, buy, or borrow these books, just like we use an open system to find Web sites.

We were early participants in the conversations that led to today's launch, and look forward to seeing this standard gain further adoption to support a rich digital book discovery, lending, and commerce marketplace on the Web (especially the mobile web).

Follow the action on Twitter.

Second "Open Feedback" Title Now Online

Over on the O'Reilly Labs blog, Keith Fahlgren talks about the latest title to go live in our Open Feedback Publishing System, which gives authors and readers a way to discuss a book while it's being written. The latest book, Building iPhone Apps with HTML, CSS, and JavaScript, also features a very nice upgrade to the system's CSS (its look-and-feel).

iPhone book in OFPS

Keith also offers up a nice post-mortem on the first book to go through the system, Programming Scala, where "over the months, nearly 100 people left a total of 543 comments. Ten contributors stood out in particular, giving more than a third of the total comments."

Mobile as New Medium

While prepping for my talk tomorrow on mobile publishing at the Digital Publishing Group in New York, I was also popping in and out of a related ongoing email conversation about textbooks and iPhones, and couldn't help but weigh in on the question of how to handle some of the issues, like cross-referencing and annotations, on the iPhone compared with a printed textbook. Several people suggested the comments were worth sharing with a larger audience:

These are relatively minor technical problems that generally already have solutions. The bigger issue I see is that thinking of the problem as "how do we get a textbook onto an iPhone" is framing it wrong. The challenge is "how do we use a medium that already shares 3 of our 5 senses -- eyes, ears, and a mouth -- along with geolocation, color video, and a nearly-always-on Web connection to accomplish the 'job' of educating a student." That's a much more interesting problem to me than "how do we port 2-page book layouts to a small screen."

Mobile is big on the agenda at TOC Frankfurt and TOC New York, and I'm sure it will come up during the upcoming TOC online event.


New on O'Reilly Labs: Open Feedback Publishing System

O'Reilly engineer Keith Fahlgren has formally launched our new Open Feedback Publishing System over on O'Reilly Labs:

Over the last few years, traditional publishing has been moving closer to the web and learning a lot of lessons from blogs and wikis, in particular. Today we're happy to announce another small step in that direction: our first manuscript (Programming Scala) is now available for public reading and feedback as part of our Open Feedback Publishing System. The idea is simple: improve in-progress books by engaging the community in a collaborative dialog with the authors out in the open. To do this, we followed the model of the Django Book, Real World Haskell, and Mercurial: The Definitive Guide (among others) and built a system to regularly publish the whole manuscript online as HTML with a comment box under every paragraph, sidebar, figure, and table.

You can see the system in action at the site for our upcoming book Programming Scala.

Authoring Tools from Alpha Geeks

Cory Doctorow (@doctorow) has posted a nice article covering some of the tools he's built or borrowed to make his writing life more manageable. I'm especially intrigued by the Flashbake project, which augments simple use of version control (something many of our authors have been using for years, and which we use extensively in our production toolchain) to automatically capture contemporaneous data about the writing process:

Now, this may be of use to some notional scholar who wants to study my work in a hundred years, but I'm more interested in the immediate uses I'll be able to put it to — for example, summarizing all the typos I've caught and corrected between printings of my books. Flashbake also means that I'm extremely backed up (Git is designed to replicate its database to other servers, in order to allow multiple programmers to work on the same file). And more importantly, I'm keen to see what insights this brings to light for me about my own process. I know that there are days when the prose really flows, and there are days when I have to squeeze out each word. What I don't know is what external factors may bear on this.

Thinking about content like code opens up a wealth of tools and techniques for managing that content. After all, programmers spend more time than just about anyone doing what can very easily be called "creative writing" with text, so it's no surprise they've built tools to make their lives easier and more productive. We're getting ready to announce a new project over at O'Reilly Labs, one also built on top of version control (Subversion in our case) and another example of using software tools to improve the writing (and in this case reading) experience.

Open Publication Distribution System -- an Open-Standards Catalog Format

It's no secret we're big fans of the iPhone/iPod reading app Stanza. While the Kindle App has overtaken Stanza for the top spot among free book apps in iTunes, Stanza offers a much better reading experience than the Kindle App (for example, by supporting standard formatting like tables and whitespace preservation). (Update: You can use the latest version of mobigen.exe to get better whitespace preservation, from <pre> and friends, on the Kindle.) And I'm not the only one who feels that way: "Stanza is hands-down the best e-book reader for the iPhone and iPod Touch, and its free. Go. Get it now." (Wired.com).

But more than the quality of the software, the major reason I'm so bullish on Stanza is their willingness to experiment. When our own Keith Fahlgren suggested they use the standard Atom format for their catalog system, they responded:

We wound up taking your advice and implemented support for Atom for Stanza's catalog format. Thanks for the suggestion! Using the Atom standard is much better than using our own custom format (although we may need to eventually extend the custom format with our own tags).

And when we proposed using Stanza to create a standalone book app (for iPhone: The Missing Manual), they were eager to dive in head first, and we both learned a lot in the process.

That Atom-based Online Catalog feature turned out to be an interesting prototype for a distributed digital discovery and ecommerce system, and it's awesome to see them willing to embrace the potential for such a system well beyond the boundaries of their own product, and to join with Peter Brantley and the Internet Archive in laying the groundwork for what's being called the Open Publication Distribution System:

Users of compatible Reading Systems, in addition to being able to access content they have previously acquired or acquire via other means, are also able to access a catalog (list of online sources of content). Typically, the catalog offers a number of free titles, which may be hosted by the Reading System vendor and/or other sites, as well as the opportunity to purchase or borrow paid content from stores and libraries. Additional stores and libraries may be added by the user to their personal catalog. The mechanism through which compatible Reading Systems access the distributed catalog has three components: eBook content, XML catalog metadata, and an HTTP transport for the catalog. The remainder of this document will discuss each of those components in turn.
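
As a rough illustration of the "XML catalog metadata" component, here is a minimal Python sketch that reads an Atom feed and lists the books it offers. The sample feed, its titles and URLs, and the function name are all hypothetical; the draft specification's actual element and link conventions are not reproduced here. A real Reading System would fetch the feed over HTTP, which is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical catalog: every title, name, and URL below is made up for illustration.
SAMPLE_CATALOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Catalog</title>
  <entry>
    <title>A Free Book</title>
    <author><name>Jane Author</name></author>
    <link type="application/epub+zip" href="http://example.com/books/free.epub"/>
  </entry>
  <entry>
    <title>Another Book</title>
    <author><name>John Writer</name></author>
    <link type="text/html" href="http://example.com/books/another.html"/>
    <link type="application/epub+zip" href="http://example.com/books/another.epub"/>
  </entry>
</feed>
"""


def list_books(catalog_xml):
    """Return one dict per Atom entry with its title, author, and EPUB links."""
    root = ET.fromstring(catalog_xml)
    books = []
    for entry in root.findall(ATOM + "entry"):
        epub_links = [
            link.get("href")
            for link in entry.findall(ATOM + "link")
            if link.get("type") == "application/epub+zip"
        ]
        books.append({
            "title": entry.findtext(ATOM + "title"),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
            "epub_links": epub_links,
        })
    return books


if __name__ == "__main__":
    for book in list_books(SAMPLE_CATALOG):
        print(book["title"], "by", book["author"], book["epub_links"])
```

Because the catalog is plain Atom, any feed parser can read it, which is much of the appeal of building on an existing standard rather than a custom format.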

One of the reasons we've thrown our support behind the Bookworm online ebook reading system as part of O'Reilly Labs is to help support the development and testing of new standards like this one, and we're excited to contribute to this new initiative. It's also great to see Adobe supporting it, a nice follow-on to our work with them on EPUB output for the open-source DocBook XSL stylesheets.

"Bite-Size Edits" from BookOven

Hugh McGuire's startup BookOven has opened up an alpha version of a project they're calling the Gutenberg Rally, an attempt to harness collective intelligence Mechanical-Turk style to proofread Project Gutenberg texts for typos and OCR (Optical Character Recognition) errors. In "divide and conquer" style, the system presents just one small snippet of text at a time (with some surrounding context), effectively breaking down a mountain of a task into easily managed molehills:


BookOven Gutenberg

I had a nice chat with Hugh on Wednesday morning, and what he told me about what's to come from BookOven was quite exciting (though apparently still very much in development).

This isn't the first attempt to harness eyeballs for finding and fixing OCR errors (see ReCaptcha), but reviewing the text in context is a much more satisfying experience, and left me wanting to read more of several of the books I was seeing only in snippet form.

Software Development as Collaborative Writing

Following a lively backchannel email discussion, I'd planned to blog about what writers, editors, and publishers can learn from software developers (specifically their tools and techniques) but Tim beat me to it over on the Radar blog.

As I said in my email, the more I think about it the more obvious it's becoming to me that the next generation of authoring/production tools will have much more in common with today's software development tools than with today's word processors.

Software developers spend enormous amounts of time creatively writing with text, editing, revising, refining multiple interconnected textual works -- and often doing so in a highly distributed way with many collaborators. Few writers or editors spend as much time as developers with text, and it only makes sense to apply the lessons developers have learned about managing collaborative writing and editing projects at scale.

Programmers faced with annoying problems like "how do I make sure that changes I make to this text don't conflict with someone else's changes" or "how do I tell who among several writers made a particular change to some text" solved those problems long ago (Wikis are a great example of applying some of those tools and techniques to the writing process; API-based offline blogging editors are another).
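
For a feel of how the first of those problems gets solved, here is a minimal Python sketch that checks whether two writers' edits to the same base text touch nearby lines. It is a simplified stand-in for what version control systems do when merging, not a description of any particular tool; the function names and sample text are invented for illustration.

```python
import difflib


def changed_ranges(base_lines, edited_lines):
    """Return (start, end) line ranges of the base text that an edit changed."""
    matcher = difflib.SequenceMatcher(a=base_lines, b=edited_lines, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def edits_conflict(base_lines, mine, theirs):
    """Report a conflict when two edits change overlapping or adjacent base lines.

    Treating adjacent changes as conflicts is a deliberately conservative choice.
    """
    for a_start, a_end in changed_ranges(base_lines, mine):
        for b_start, b_end in changed_ranges(base_lines, theirs):
            if a_start <= b_end and b_start <= a_end:
                return True
    return False


# Hypothetical manuscript and edits, one list item per line of text.
BASE = ["Chapter 1", "It was a dark night.", "Rain fell.", "The end."]
MINE = ["Chapter 1", "It was a dark and stormy night.", "Rain fell.", "The end."]
THEIRS_FAR = ["Chapter 1", "It was a dark night.", "Rain fell.", "Fin."]
THEIRS_SAME = ["Chapter 1", "It was a bright morning.", "Rain fell.", "The end."]

if __name__ == "__main__":
    print(edits_conflict(BASE, MINE, THEIRS_FAR))
    print(edits_conflict(BASE, MINE, THEIRS_SAME))
```

Edits to different parts of the text can be combined automatically; only the second pair, where both writers rewrote the same sentence, needs a human to decide.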

And while using those tools as-is probably won't make sense for a lot of non-technical writers, those willing to at least try them out will learn a lot about what the next generation of collaborative, distributed, digital publishing tools will look like.

Jakob Nielsen: Kindle Content Must be Kindle-Specific

Jakob Nielsen offers an in-depth look at Kindle formatting best practices:

For Kindle, it's certainly unacceptable to simply repurpose print content. But you can't repurpose website content, either. For good Kindle usability, you have to design for the Kindle. Write Kindle-specific headlines and create Kindle-specific article structures. [Link included in original post.]

(Via Joe Wikert's Twitter stream)

Taxonomies and Starting With XML

This is an excerpt from a blog post I wrote last week on taxonomies and chunking.

Last October, the StartWithXML team wrote a post called "To Chunk or Not To Chunk," where we discussed tagging and infrastructure issues, and a discussion ensued about what happens when you don't know what you'll be using chunks for. How do you tag those?

Later, in our StartWithXML One-Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.

We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes, and the Dewey Decimal System, among others. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. There are proprietary taxonomies at Amazon and B&N.com that enable effective browsing.

But nothing like this exists for sub-book-level content. It's never been traded before. We've never really needed a taxonomy for it before.

Other industries that traditionally distribute "chunks" have their own taxonomies that might prove useful in building a book-chunk schema. These include the IPTC news codes, which identify the content of a particular news story -- that's the closest analogy I can find for small gobbets of content that require organization.

Industries have proprietary taxonomies to identify certain concepts -- culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these do not necessarily identify concepts within a book.

Some might argue that we don't necessarily need taxonomies -- why can't we use natural-language search and the semantic Web to "bubble up" the "right" concepts? I'd argue that words don't always mean what we think they mean. A classic example from my library days is the term "mercury." That could mean the planet, the car or the element. Proponents of semantic search would say that the context in which "mercury" is mentioned should take care of defining that term. I'd say that's true in about 50 percent of all cases, but not reliably true in the 75 to 100 percent of cases we would need.

My original post gets into more detail about why taxonomies are important search tools, and how the digitization of books requires a good taxonomy ... and who should do it.

Virginia Open Sourcing Physics Textbook ("Flexbook")

I was part of a brief Twitter exchange recently with Cengage's Ken Brooks about the cost of textbooks:

kenbrooks: @doctorow #toc That depends entirely on the type of book. A K-12 reading program costs $millions.

andrewsavikas: @kenbrooks not necessarily. See ck12.org

kenbrooks: @andrewsavikas Talk to McGraw Hill or Pearson about basal reading programs. The intricacies are staggering. #toc

I like Ken a lot personally (and respect him a ton professionally), and I have no reason to doubt that it does take millions to develop many educational programs. But my reference to ck12.org (whose founder, Neeru Khosla, keynoted at TOC 2008) was because if it does cost that much, then something's wrong with the system, and that's not likely to change without the work of groups like ck12.

In fact, Virginia is already in the process of developing an open-source "flexbook" for physics using the ck12 platform:

Secretary of Technology Aneesh Chopra and Secretary of Education Tom Morris today announced the selection of thirteen individuals to form a core team to pilot the development and release of an open–source physics "flexbook" for Virginia. This electronic material will focus on high school physics and contain contemporary and emerging 21st century physics and modern laboratory experiments.

The Virginia Physics "Flexbook" project is a collaborative effort of the Secretaries of Education and Technology and the Department of Education that seeks to elevate the quality of physics instruction across the Commonwealth by allowing educators to create and compile supplemental materials relating to 21st century physics in an open–source format that can be used to strengthen physics content. The Commonwealth is partnering with the Palo Alto, California–based non–profit, CK–12 on this initiative as they will provide the free, open–source technology platform to facilitate the publication of the newly developed content as a "flexbook" — defined simply as an adaptive, web–based set of instructional materials.

"We need transformational ideas to ensure all Virginians are educated to compete in an increasingly competitive global economy," said Secretary Chopra. "This pilot initiative is a step in the right direction to introduce our students to contemporary physics topics and lab materials at no additional cost to the taxpayers or students," added Secretary Morris.

There is certainly a place for the investment-intensive educational publishing programs that only a firm with the resources of Cengage or Pearson or McGraw-Hill can provide. But there's also enormous opportunity to try new models that take advantage of the kind of collaboration that underpins all of academia to develop and distribute quality learning material for students at lower costs. (BTW, ck12 is hiring.)

Video: Android meets Eink

Keeping with the "labs" theme for recent posts, via a tweet from George Walkley:

Lots of talk about devices at TOC - now just saw this, Android + e-ink http://vimeo.com/3162590 #toc

The guys at MOTO labs have hacked together a prototype showing Google's Android operating system running on an e-ink display:


Android Meets E Ink from MOTO Development Group on Vimeo.

The "O'Reilly Bump" and Bookworm

During his TOC Keynote, Tim O'Reilly talked about how the status he confers through "retweets" on Twitter is really just another form of publishing, not much different from the status we confer on authors by publishing them, or speakers by featuring them (especially at multiple conferences), or hackers by inviting them to Foo Camp.

On the Web, the effects are easily measured, and Liza Daly has a post over at O'Reilly Labs talking about the bump Bookworm got from the association with O'Reilly. Her graph tells the main story, but digging deeper reveals some notable nuggets (emphasis in the original):

Because of this integration [with Stanza], iPhone and iPod Touch users account for 10-20% of all visitors to Bookworm on any given day

Photos from New York Times R&D Lab

Nick Bilton was a hit yesterday at the TOC Conference, and during his keynote he talked about what they're working on with content at the NYT R&D Lab. Nick was kind enough to give a few of us a private tour earlier this week, and here are some photos from the trip:

[Five photos from the tour: IMG_0277.JPG, IMG_0278.JPG, IMG_0280.JPG, IMG_0282.JPG, IMG_0283.JPG]

Open XML API for O'Reilly Metadata

In addition to Bookworm, O'Reilly Labs now includes an RDF-based API into all of O'Reilly's books:

Most publishers are familiar with the ONIX standard for exchanging metadata about books among trading partners. Anyone who's actually spent time working with ONIX knows that its syntax is abstruse at best. While ONIX does use XML, there are more modern, more general, and more immediately comprehensible standards out there, particularly for the basic details like "author," "title," and "edition." One of those standards is RDF, or "Resource Description Framework." This experimental O'Reilly Product Metadata Interface (OPMI) exposes RDF for all of O'Reilly's titles, organized by ISBN.
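
To show why RDF is easier to work with for the basics, here is a minimal Python sketch that reads titles and authors from an RDF/XML document keyed by ISBN. The sample record is hypothetical: the placeholder ISBN, the use of Dublin Core elements, and the `urn:isbn:` identifiers are assumptions for illustration, not the actual OPMI output.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record: placeholder ISBN, title, and author, described with Dublin Core.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="urn:isbn:0000000000000">
    <dc:title>Example Title</dc:title>
    <dc:creator>Example Author</dc:creator>
    <dc:creator>Second Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def books_by_isbn(rdf_xml):
    """Map each ISBN to the title and list of creators found in its description."""
    root = ET.fromstring(rdf_xml)
    books = {}
    for description in root.findall(RDF + "Description"):
        about = description.get(RDF + "about", "")
        isbn = about.rsplit(":", 1)[-1]  # keep the part after "urn:isbn:"
        books[isbn] = {
            "title": description.findtext(DC + "title"),
            "creators": [c.text for c in description.findall(DC + "creator")],
        }
    return books


if __name__ == "__main__":
    for isbn, book in books_by_isbn(SAMPLE_RDF).items():
        print(isbn, book["title"], book["creators"])
```

A dedicated RDF library would handle the full data model; plain XML parsing is enough here because the sample sticks to one simple serialization.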

If anyone onsite (or otherwise) puts anything interesting together with the data, we'll be happy to feature it here on the TOC Blog; just let us know in the comments.

At TOC: Cory Doctorow to Publishers: Demand Option To *Not* Use DRM

I knew Cory Doctorow would be a great wrap-up to the first day's morning keynotes at TOC, and he more than delivered.

He ended the keynote with a challenge to publishers: withhold digital content from any device or service that doesn't give you the option to exclude DRM. (For example, right now publishers cannot sell books on the Kindle or audio books on Audible without DRM.) He's proposing "Doctorow's Law," which I'm paraphrasing here from memory:

If someone takes something that belongs to you, and puts a lock on it that you don't have a key for, that lock isn't in your best interests.

We couldn't agree more, and it's a big reason we sell all of our ebooks (now more than 400) without DRM (and with a Kindle-compatible format that can be added manually to a Kindle), and why we don't enable DRM in our iPhone Apps either. I agree with Cory, and strongly encourage publishers to not use DRM at all for their digital content, but at a minimum, it should at least be a choice for a publisher to make.

Good Company Culture Comes in Small Packages

Common wisdom says that small companies are more nimble, responsive and adaptable than their larger cousins.

My personal experience reflects this. I've worked in large organisations -- FMCG corporates, international aid organisations and government -- and I've worked in small ones -- private consulting firms and small non-profits. In each case I've found that small enterprises outperform large ones when it comes to transformation. Smaller companies are faster to identify industry trends and respond to new business opportunities. They also punch above their weight on some forms of R&D, particularly business process innovation. Put simply, small companies are more fleet of foot.

But why?

We're seeing a lot of reports come through about how small publishers are responding to trends and opportunities. MediaBistro and The Christian Science Monitor have both reported small publishers are leading the charge when it comes to digitization. In his article, "E-book revolution favors the agile", Matthew Shaer said:

But it's not the bigger houses, such as Macmillan or HarperCollins, that are moving the fastest. Instead, some of the most extensive restructuring efforts are being undertaken in the independent publishing world, traditionally a hotbed for innovation and experimentation.

Soft Skull Press, Canongate, and Akashic are all good examples. Shaer also points out that publishing is emulating the music industry in this pattern and, I'd wager, other industries as well.

Again, I ask why?

The obvious reasons are the ones people usually point to. Smaller companies are like the canary in the coal mine. They are first to feel the effects of major shifts within an industry and may need to move faster to find solutions. On the other hand, small publishers also have an incentive to exploit technological efficiencies that might even up the playing field against big competitors.

Small size also helps with changing direction. This week Wheatland Press announced it is taking a publishing hiatus in 2009:

What this means is that I will publish no new books during 2009 (including Polyphony 7). I will continue to fill orders on existing titles and will keep those titles available through Amazon and Barnes & Noble.com ... I will explore ways to put Wheatland Press on a firmer financial footing including, but not limited to, seeking external funding via arts councils, seeking partnerships with other presses, etc. I hope the break will allow me to return to a regular publishing schedule in 2010.

On one level this could be regarded as just another volley of bad news from a publisher affected by global economic conditions. But it's worth noting that only a small publisher could make this kind of decision. HarperCollins and Random House can't make the choice to stop publishing books for a year to sort out their business model and make necessary changes. They can cut costs through staff layoffs and tightening budgets, but their operational overheads are way too large to ever get off the treadmill of publishing hundreds of titles a year.

Underneath it all, though, the one thing that has the biggest impact on a company's ability to transform is the one thing that almost never gets talked about in the publishing industry: organizational culture. Paul Biba of TeleRead, quoted in the Shaer article, hints at this but doesn't quite nail it down:

"In general, I'd say the big publishers tend to be really dinosaurs, intrigued by e-books but afraid of them ... [Younger readers] have grown up with a whole different way of looking at the world, and I don't think many publishers understand this. They think people are just sitting down in leather chairs and reading hardcopy books."

I'm not sure this is a fair characterization of publisher attitudes today, but I do think it alludes to a bigger problem that is stopping large publishers from embracing new opportunities.

Big trade publishers are fighting a losing battle against their own organizational cultures. The history of business is littered with examples of companies that couldn't transition from one paradigm to the next, not because they couldn't see the necessity, but because they couldn't undertake the necessary internal change.

The larger a company is, the harder organisational change is to effect. The big trade publishers are now subsidiaries of the largest media companies in the world with thousands of employees, hundreds of offices and decades of crusted-on beliefs, traditions and systems. Small teams, by virtue of scale, can change their organisational culture quickly, sometimes through shifts in personnel, other times by the sheer force of personality from a charismatic leader. In any case, smaller teams tend to adopt a tenacious, can-do, try-anything culture because they have to.

Organisational culture is the bedrock of performance. This, more than any problem of physical infrastructure or technical or financial systems, makes big publishers slow to adapt. Too slow, I fear, to survive the speed of change within the cultural and economic ecology of which they are a part.

New experiments are popping up, such as HarperStudio, which could be the exception that proves the rule. Only by hiving itself off as a separate, entrepreneurial unit within HarperCollins, with its own small-team culture, has HarperStudio been able to achieve the clear-eyed perspective and momentum to try really different and new ways of publishing.

Paul Biba may have called it right by using the word "dinosaur." After all, it was the small dinosaurs, with modern-day descendants still thriving, who made the successful adaptation that evolution requires. The big guys fell hard and fast and it's increasingly rare to find any evidence of their impact on us at all.

StartWithXML Research Report Now Available for Sale

If you weren't able to attend the StartWithXML Forum last month in New York, the accompanying research report is available for sale. The report covers topics like:

  1. Where am I and where do I want to end up?
  2. How much benefit do I want to obtain from content reuse and repurposing?
  3. How much work do I want to do myself?
  4. How much time and money will this take?
StartWithXML: Making the Case for Applying XML to a Publishing Workflow

When you purchase the report, you get it as our full eBook Bundle, including PDF, EPUB, and Kindle-compatible Mobipocket formats.

If you're ready for a deeper dive into XML, there are two very complementary tutorials lined up during next week's TOC Conference:

And if that's still not enough angle brackets for you, check out the Introduction to XML course from the O'Reilly School of Technology, which earns you four CEUs (Continuing Education Units) and a CEU letter from the University of Illinois Office of Continuing Education. Save $50 with discount code SWXML09.
