Tuesday, April 14, 2015

Baseball is More Than Data

As baseball starts its second week, let's reflect a bit on how data analytics has changed the game. It's not just the Moneyball phenomenon of ranking players, but also the extensive use of defensive shifts (repositioning the infielders and outfielders for each batter) and other maneuvers. We're not quite at the point where technology can replace managers and umpires, but give it another decade or two.

We've seen a huge increase in data analysis in sports. ESPN ranked teams by their use of analytics, and the ranking correlates well with how those teams are faring. Eventually everyone will use the same learning algorithms, and games will just be a random coin toss with the coins weighted by how much each team can spend.

Steve Kettmann wrote an NYT op-ed piece, "Don't Let Statistics Ruin Baseball." At first I thought this was just another Luddite who will be left behind, but he makes a salient point. We don’t go to baseball to watch the stats. We go to see people play. We enjoy the suspense of every pitch, the one-on-one battle between pitcher and batter, and the great defensive plays. Maybe statistics can tell a team which players to acquire and where the fielders should stand, but it is still people who play the game.

Kettmann worries about baseball writers' obsession with statistics. Those who write based on stats can be replaced by machines. Baseball is a great game to listen to on the radio, because the best broadcasters don't talk about the numbers; they talk about the people. Otherwise you might as well listen to competitive tic-tac-toe.


  1. Gunnar Andersson, 1:46 AM, April 15, 2015

    XKCD got it right on sports and randomness: https://xkcd.com/904/

  2. Nate Silver responds to the op-ed and includes a
    Bonus Podcast: Nate Silver Talks with Steve Kettmann


  3. Is science more than (grant income) data?


  4. Most of the concern in the op-ed seems to be about statistics ruining sportswriting rather than ruining baseball itself. Much of sportswriting is very straightforward, and, independent of statistics, it isn't hard to do as good a job as a basic sportswriter. Maybe that is why sportswriting is where many reporters begin.

    This aspect can be largely automated: see

    The much less emphasized concern in the op-ed is fantasy leagues dominating interest in the real games themselves, and here I think there is a point. I am surprised that you did not mention them; the op-ed piece mentions them only in passing. Fantasy leagues are major money-makers for the leagues, so they aren't going away. They may be somewhat representative of the real game in baseball, but in the NFL they are so far from what actually matters in the games that they really are a distant sideshow.

    Maybe the real sportswriting concern is the amount of media attention devoted to reporting expressly for those following fantasy leagues. That clearly could be a completely automated sideshow.

  5. While the column makes some fine points, I think some of the appeal of baseball (vs. other sports) is that many of these analytical elements are at the forefront: the game theory of pitching and batting, the physics of a curveball... and the statistics that can drive everything from player recruitment to managerial decisions. This is why many people love the sport.

    As in any field, wherever there is data to be gathered, there are statistics. Many of these features and statistics are meaningless. The art is in interpretation, and this is where sportswriters can add value: not by ignoring the numbers, but by using them to tell a story, just as we do in research.

    Many CS researchers do data analysis and are unlikely to be replaced by machines; likewise, sportswriters can't be replaced by machines. The value they can provide is meaningful interpretation of the statistics, as opposed to "mere data reporting." Writers shouldn't shy away from statistics; rather, they should figure out how to interpret them and separate signal from noise, just as we do in our research papers.