
How Hackers Show It's Not All Bad News at the New York Times

News of a looming downgrade of NYT stock to "junk" status by Standard & Poor’s sadly isn’t all that shocking. I’m certainly glad I’m not an investor holding any NYT.

But there’s something going on at the Times that probably won’t make it to Silicon Alley Insider, much less the mainstream business press, and it’s something that’s starting to make me think the Times just might succeed in adapting to the changing rules of the media and publishing game (though there will almost certainly be many more casualties before it’s over).

So what’s the Times doing that’s so important? They’re hacking.

Not hacking in the nefarious sense, but in the original sense of experimentation, curiosity, and solving interesting problems. As Paul Graham put it, "Great hackers think of it as something they do for fun, and which they’re delighted to find people will pay them for." How many other publishers are running blogs about their work with open source software? Even fewer are developing and releasing their own high-quality open source software:

Quite frankly, we wanted to scale the front-end webservers and backend database servers separately without having to coordinate them. We also needed a way to flexibly reconfigure where our backend databases were located and which applications used them without resorting to tricks of DNS or other such "load-balancing" hacks. Plus, it just seemed really cool to have a JSON-speaking DB layer that all our scriptable content could talk to. Thus, the DBSlayer was born.

That is not typical newsroom conversation.
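For readers who haven't run into the idea, here is a rough sketch of what a "JSON-speaking DB layer" means in practice: the application sends a small JSON message containing a query, and gets the result rows back as JSON, so the front end never needs a database driver of its own. This is only an illustration, not DBSlayer's actual protocol or code. The `handle_request` function, the `sql`, `columns`, and `rows` field names, and the `articles` table are all made up for the example, and an in-memory SQLite database stands in for the real backend.

```python
import json
import sqlite3


def handle_request(conn, request_body):
    """Run the SQL carried in a JSON request and return a JSON response.

    Hypothetical message format (an assumption for this sketch):
      request:  {"sql": "..."}
      response: {"columns": [...], "rows": [[...], ...]} or {"error": "..."}
    """
    request = json.loads(request_body)
    try:
        cursor = conn.execute(request["sql"])
        # cursor.description is None for statements that return no rows.
        columns = [col[0] for col in cursor.description or []]
        rows = [list(row) for row in cursor.fetchall()]
        response = {"columns": columns, "rows": rows}
    except (sqlite3.Error, KeyError) as exc:
        # Report failures as JSON too, so callers only ever parse one format.
        response = {"error": str(exc)}
    return json.dumps(response)


def make_demo_database():
    """Build a tiny in-memory database to stand in for a real backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, headline TEXT)")
    conn.executemany(
        "INSERT INTO articles (headline) VALUES (?)",
        [("First story",), ("Second story",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_database()
    query = {"sql": "SELECT id, headline FROM articles ORDER BY id"}
    print(handle_request(conn, json.dumps(query)))
```

In a real deployment the JSON would travel over HTTP between separate machines, which is what lets the web servers and the database servers scale independently, as the quote describes.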

But this isn’t just about open source software, or even about some developers building cool software to run backend systems. The Times has put developers right in the middle of the newsroom. At a MediaBistro event in May, Aron Pilhofer from the "Interactive News Technology" group at the Times (sharing the stage with their Editor of Digital News, Jim Roberts) talked about how the Minnesota bridge collapse was when they realized they needed to develop their own tools to cover the news with the web, and not just on the web. Less than a year later, when Hillary Clinton’s infamous public schedule was released, they had the people and the skills in place to crunch 12,000 PDF documents (containing images of scanned documents) through a text-recognition program, on to Amazon’s "Elastic Compute Cloud" and finally into a Ruby on Rails Web application providing full-text search across all eight years of calendars.

Just this week, the Times’ Derek Gottfrid gave a talk at O’Reilly’s Open Source Convention (OSCON) titled "Processing Large Data with Hadoop and EC2" based on work he’d done on the Times’ archives. Again, this is the kind of talk you’re not likely to hear at most newspapers (or magazines, or book publishers) these days:

I was able to create a Hadoop cluster on my local machine and wrap my code with the proper Hadoop semantics. After a bit more tweaking and bug fixing, I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. I logged in, started Hadoop and submitted a test job to generate a couple thousand articles, and to my surprise it just worked.

Earlier this month at FOO Camp I had the pleasure of meeting another hacker from the Times, Nick Bilton, part of the Times R&D lab — the folks who built the impressive NYT iPhone App.

UPDATE: Nick Bilton points out via email that:

There were people from nytimes.com that were instrumental in building the NYT iPhone app also … Is there any way you can add a couple of words that the R&D Group ‘worked with nytimes.com’ to help build the iPhone app?

If you’re worried about EBITDA and EPS, then you’re rightly worried about the Times right now. But if you’re worried about the future of journalism, and about the ability of established media companies to adapt to a digital world, there’s also reason to be excited about the Times right now.

  • David

    “Earlier this month at FOO Camp I had the pleasure of meeting another hacker from the Times, Nick Bilton, part of the Times R&D lab — the folks who built the impressive NYT iPhone App.”

    I would like to state that R&D did not develop the iPhone App. They may have had some input (which I doubt, but it’s possible), but they certainly did not build or design it.

  • http://toc.oreilly.com Andrew Savikas

    @David — that was a poor assumption on my part. Nick dropped me a line to clarify, and I’ve updated the post. My apologies to Nick and to the team at nytimes.com.

  • David

    I love Derek Gottfrid, he’s so hot… and tall too! His hadoop work is off the hook!