
Text and XML of All #TOC 2009 Tweets

I was planning to do some crunching last night and early today, but between an unexpected flight delay coming back from New York and the pleasant surprise of getting Slashdotted about Bookworm, the day is quickly slipping away. I'll give it a go over the weekend, but if anyone else is eager to play, here's a super-raw text dump (the best I could do to get around the API limit). Update: to be explicit, this covers roughly mid-afternoon Sunday 2/8 through late morning Thursday 2/12, so it includes the entire event, but not every #toc tweet.

Update #2: Using the raw text as a starting point, I’ve generated an XML file listing all of the people who tweeted with hashtag #toc during the conference, and listed each of their tweets. I’ll leave it as an exercise to the reader 🙂 to sort by time, or otherwise slice/dice (best visualization among those submitted in the comments by 2/24 at midnight EST gets a free pass to TOC 2010 — winner chosen by the TOC program committee, and announced 2/26).
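For anyone taking up the "sort by time" exercise, here is a minimal Python sketch that flattens a per-user XML file into one time-ordered list. The element and attribute names (`tweeter`, `screen_name`, `tweet`, `created_at`) and the ISO timestamp format are assumptions for illustration only; the posted file's actual schema may differ, so adjust them to match.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical schema: element/attribute names are assumptions, not the real file's.
SAMPLE_XML = """
<tweeters>
  <tweeter screen_name="alice">
    <tweet created_at="2009-02-10T09:15:00">Keynote starting #toc</tweet>
    <tweet created_at="2009-02-09T16:40:00">Arrived for #toc</tweet>
  </tweeter>
  <tweeter screen_name="bob">
    <tweet created_at="2009-02-10T08:05:00">Coffee before #toc</tweet>
  </tweeter>
</tweeters>
"""

def tweets_by_time(xml_text):
    """Flatten per-user tweets into (timestamp, user, text) rows sorted by time."""
    root = ET.fromstring(xml_text)
    rows = []
    for tweeter in root.iter("tweeter"):
        name = tweeter.get("screen_name")
        for tweet in tweeter.iter("tweet"):
            when = datetime.fromisoformat(tweet.get("created_at"))
            rows.append((when, name, tweet.text))
    # Oldest tweet first, regardless of which user posted it.
    rows.sort(key=lambda row: row[0])
    return rows

for when, name, text in tweets_by_time(SAMPLE_XML):
    print(when.isoformat(), name, text)
```

Once the tweets are in a flat, time-ordered list, bucketing them by hour or by session for a visualization is a simple grouping step.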

Update #3: Unfortunately, the Twitter Search API appears to have returned only the first 15 or so of each user's #toc tweets (nowhere near enough to include all of the 200+ tweets from the top tweeter, @thewritermama), so that XML doesn't contain all of the tweets in the plain text. I've posted the intermediate XML I used, which contains less data about each tweet and tweeter but does contain all of the tweets.
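A quick way to spot this kind of truncation is to count tweets per user in each source and flag users whose Search API count falls short of the raw-text count. The sketch below assumes both sources have already been reduced to `(screen_name, tweet_text)` pairs; the handles and counts in it are made-up toy data, not figures from the real files.

```python
from collections import Counter

# Toy, made-up sample data: (screen_name, tweet_text) pairs from two sources.
# Handles and counts are illustrative only, not taken from the real files.
search_api_rows = [("prolific_user", f"tweet {i}") for i in range(15)] + [("light_user", "hi #toc")]
raw_text_rows = [("prolific_user", f"tweet {i}") for i in range(200)] + [("light_user", "hi #toc")]

def undercounted_users(api_rows, raw_rows):
    """Return {user: (api_count, raw_count)} for users the API result undercounts."""
    api_counts = Counter(user for user, _ in api_rows)
    raw_counts = Counter(user for user, _ in raw_rows)
    # Counter returns 0 for users missing from the API result entirely.
    return {
        user: (api_counts[user], raw_count)
        for user, raw_count in raw_counts.items()
        if api_counts[user] < raw_count
    }

print(undercounted_users(search_api_rows, raw_text_rows))
```

A cluster of users sitting at the same ceiling (here, 15) is a strong hint that the limit comes from the API rather than from the users.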

Update #4: If you're interested in the gory details of where the XML came from, I've posted some background over at O'Reilly Labs.


Comments: 4

  1. I can’t resist data, of course.

    (For a prettier display I removed all profile images that were broken — I’m guessing they were new accounts that have since been processed — and also any default images.)

  2. I counted 39 tweets in my personal history. Your intermediate XML file only lists 7.

    Could you share your list of Twitterers?


    P.S. I’ll happily send you my tweets if that helps.
    P.P.S. Nice visualization, Liza.

  3. @Geoff — Yeah, after poking around a bit I realized that the Search API didn’t return all of a user’s #toc tweets. I’m not familiar enough with the API to know what’s going on, but I did post an xml-ization of the raw text that should contain all of your tweets.

  4. Here’s a simple site (http://www.toctweets.com) that takes all the tweets during the conference, breaks them down by time period for easier browsing, and displays lots of Wordles and some stats.

    @Andrew, the (apparently) undocumented Search API parameter max_id lets you go back about four months, getting around the built-in API limits. This method turned up 7,000 tweets before the Tour of California (#toc) started adding a lot of noise to the data. (Raw SQL.)

    It was a great conference, BTW.
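The max_id approach from the last comment is a form of backward cursoring: each request asks only for tweets with IDs at or below `max_id`, and the next request sets `max_id` to one less than the oldest ID just received. The sketch below shows that loop against a toy in-memory stand-in; `fetch_page`, `ALL_IDS`, and `PAGE_SIZE` are hypothetical and do not call the real (long-retired) Search API, whose exact behavior and limits are not documented here.

```python
# Toy stand-in for a search endpoint; tweet IDs increase over time.
# ALL_IDS, PAGE_SIZE, and fetch_page are hypothetical, for illustration only.
ALL_IDS = list(range(1, 251))
PAGE_SIZE = 100

def fetch_page(max_id=None):
    """Hypothetical fetch: newest PAGE_SIZE tweet IDs with id <= max_id."""
    eligible = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return sorted(eligible, reverse=True)[:PAGE_SIZE]

def fetch_all():
    """Page backward through results using max_id until a page comes back empty."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen so far.
        max_id = min(page) - 1
    return collected

ids = fetch_all()
print(len(ids), ids[0], ids[-1])
```

Subtracting one from the oldest ID avoids fetching the boundary tweet twice. A real crawler would also want a lower bound (for example, a date or ID cutoff) to stop before unrelated #toc traffic like the Tour of California floods the results.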