
A Nate Silver book recommendation engine

The algorithm's role in discovery and serendipity

It’s NCAA tournament time here in the U.S. and plenty of bracketologists are turning to Nate Silver for his statistical expertise. Silver, of course, is known for his book, The Signal and the Noise, as well as for predicting presidential elections and Major League Baseball player performance. I’m not aware of any statistical analysis he’s done in the book recommendation space, but I know someone who has applied Silver’s thinking to help us figure out what book we should read next.

I’m talking about Stephanie Sun and a terrific article she wrote called Nate Silverizing Book Recommendations. I encourage you to read the entire piece, even if it’s been a while since your last statistics class.

As you read Stephanie’s article, think about how book recommendation engines are likely to get better and better down the road. As she also points out, it’s not just about helping consumers discover their next great read. This same analysis can also be used to help editors prioritize their time when faced with a stack of manuscripts to review.

Many will cringe when told that this sort of curation and serendipity can be reduced to an algorithm. That’s not what anyone is suggesting, though; the algorithm can simply be one of many tools to help improve discovery. And although it will never be perfect, look at how search engines have evolved since the early days of the web. Today we’re often limited to the very simplistic “people who bought X also bought Y” type of recommendation. We’re still in the early days of solving the discovery and recommendation problem in our industry, and we need smart people like Stephanie Sun to drive improvement in our search and recommendation results.

  • http://www.toc.oreilly.com/ Kat Meyer

    Hey Joe,

    I may have pointed you to this previously, but there’s an interesting article in American Way magazine (http://bit.ly/WNuwO3) about Ryan Kavanaugh of the film studio Relativity, and how they use a similar “moneyball”/algorithmic (Monte Carlo) approach in deciding what film projects to greenlight, or more precisely, which films NOT to greenlight. In his words: “I call it our movie-rejection system, not our movie-picking system. Just because the program says a movie is likely to make money doesn’t mean we’ll make that movie. But if it says that 100 percent of the time, this movie is going to lose money, we are not going to make that movie.”
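    The “movie-rejection” idea Kavanaugh describes can be sketched as a simple Monte Carlo simulation. Everything below is invented for illustration — the budget, the revenue distribution, and the function name are assumptions, not Relativity’s actual model:

```python
import random

def estimated_loss_probability(budget, revenue_mean, revenue_sd, trials=100_000):
    """Monte Carlo sketch: sample hypothetical revenues for a film and
    count how often the film fails to earn back its budget.

    The normal revenue distribution and all inputs are illustrative
    assumptions; a real model would be far richer.
    """
    losses = 0
    for _ in range(trials):
        revenue = random.gauss(revenue_mean, revenue_sd)
        if revenue < budget:
            losses += 1
    return losses / trials  # estimated probability the film loses money

# A project that loses money in (nearly) every simulated outcome
# is exactly the kind the "rejection system" screens out.
p_loss = estimated_loss_probability(budget=80e6, revenue_mean=40e6, revenue_sd=10e6)
print(p_loss)  # close to 1.0, so this hypothetical project would be rejected
```

    Note that this mirrors Kavanaugh’s framing: the simulation isn’t used to pick winners, only to flag projects whose simulated outcomes are losses almost every time.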

    On the flip side, the results can be quite ugly when humans are completely removed from, or aren’t overseeing, algorithms in the creative process, as the Solid Gold Bomb t-shirt company experienced recently: http://bit.ly/13uCJtl

    On the algorithms for discovery/curatorial side, Marco Ghezzi sent me a link to this Salon article, “The Curse of ‘You May Also Like’ – Algorithms and ‘big data’ are good at figuring out what we like—and that may kill creativity.” (http://slate.me/Z6wUMU)

    To me, it’s encouraging that the humans are not dead, at least according to this NYT article, “Algorithms Get a Human Hand in Steering Web” (http://nyti.ms/11kFHMY). I’m glad people still have a place in the world of creation, curation and discovery – even at companies like Google and Twitter.

    • Stephanie Sun

      Hi Kat,

      Yes, for me it is all about control – not just any human control but my control as a reader to set the terms for filtering through books. I should be able to override an algorithm (or an in-house curator) at any time when I am browsing a book website.

      At the same time, there is the Daniel Kahneman school of thought regarding logical and intuitive thinking (which Silver also cites in his book). Human intuition is full of biases and imperfections in fields much more black and white than book publishing, and Bayes can really help parse out five, ten, one hundred book ratings with more finesse than a gut reaction or an average plus bar chart.
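      The “more finesse than a gut reaction or an average plus bar chart” point can be sketched as a Bayesian (shrinkage) average. The prior mean and prior weight below are made-up illustrative values, not anything from Stephanie’s article:

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5):
    """Shrink a book's average rating toward a prior belief.

    The prior acts like `prior_weight` phantom ratings of `prior_mean`,
    so a couple of five-star raves can't dominate the way they do in a
    raw average. Both parameters are hypothetical, chosen for the demo.
    """
    total = prior_mean * prior_weight + sum(ratings)
    count = prior_weight + len(ratings)
    return total / count

two_raves = [5, 5]                            # raw average: 5.0
many_mixed = [5, 4, 5, 3, 5, 4, 5, 4, 5, 5]   # raw average: 4.5

print(bayesian_average(two_raves))   # ~3.57, pulled hard toward the prior
print(bayesian_average(many_mixed))  # 4.0, more data means less shrinkage
```

      With only two ratings, the estimate stays close to the prior; as ratings accumulate, the data takes over — which is the Bayesian finesse a bare average lacks.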

  • Stephanie Sun

    Also, for those who haven’t read The Signal and the Noise yet, Silver spends much of the book distinguishing between Bayesian (probabilistic) forecasting and what he calls “frequentist” forecasting, which is what many of us learned in our college statistics courses.

    Frequentist methods were popularized by the English statistician Ronald Aylmer Fisher and involve drawing samples of real-world phenomena and then measuring your data against them. Silver makes a pretty compelling case that in many, many cases today frequentist methods are little more than intellectual con artistry.

    If you think back on the times you have seen statistics or predictions that looked meaningless or suspect to you, it is likely that poorly applied frequentist methods were to blame. The kind of algorithm-enabled focus group you see around falls (I think) under the frequentist family, and is kind of the opposite of what I tried to do.

    Not all statistics, data, and analyses are created equal.