What Can Baseball Tell Us About Big Data?

By now, most people are aware of the growing impact of data. An article last year in the New York Times referred to the emergence of big data as “a revolution,” while an article titled “Is Big Data the New Black Gold?” appeared in Wired in February. While large companies such as IBM claim they’re using data to build a smarter planet, most individuals are not so much using data as producing it through Google searches and activity on social media sites. However, data has enormous potential to help companies and organizations analyze how productive and effective their employees and strategies are.

Perhaps the most publicized use of big data comes from sports. In 2011, the film Moneyball, based on the book by Michael Lewis, was released. Moneyball examined how the Oakland A’s used statistical analysis to compete successfully against teams with much larger payrolls. Essentially, the A’s relied on data rather than traditional scouting to evaluate baseball players. Moneyball also helped publicize sabermetrics, the name given to data-driven statistical analysis of baseball. While sabermetrics used to be relatively obscure, sabermetric statistics are increasingly cited on popular sports shows such as SportsCenter. Though sabermetrics is confined to the sports realm, big data’s recent rise in popularity could be a great opportunity to apply some of its concepts to other industries.

The Mendoza Line

The Mendoza line refers to the threshold between mediocrity and incompetence with regard to a player’s batting average. It was named after 1970s shortstop Mario Mendoza, who hovered around a .200 batting average throughout his career. A batting average of .200 marks the Mendoza line: players who hit above .200 are considered mediocre, while players who hit below .200 usually don’t last in the Major Leagues. More broadly, the concept of the Mendoza line can be applied to almost anything as the divider between mediocre and bad. Economists have used the term to describe worse-than-expected economic growth, as well as the fall of the U.S. 10-year Treasury note yield below 2%, a level seen as the Mendoza line for Treasury notes. The Mendoza line was also used in the TV show “How I Met Your Mother” as the dividing line on Barney Stinson’s Crazy/Hot scale.
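As a small illustration of the threshold idea, the Python sketch below computes a batting average (hits divided by at-bats) and checks it against the conventional .200 line. The function names and sample season lines are hypothetical, and counting an average of exactly .200 as “at or above” the line is an assumption of this sketch rather than an official rule.

```python
# Hypothetical sketch: a batting average checked against the Mendoza line.
# Counting exactly .200 as "at or above" the line is an assumption.

MENDOZA_LINE = 0.200


def batting_average(hits: int, at_bats: int) -> float:
    """Return hits divided by at-bats, the standard batting average."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def mendoza_label(hits: int, at_bats: int) -> str:
    """Say which side of the .200 Mendoza line a batting average falls on."""
    if batting_average(hits, at_bats) >= MENDOZA_LINE:
        return "at or above the Mendoza line"
    return "below the Mendoza line"


def format_average(avg: float) -> str:
    """Format an average the baseball way, e.g. 0.24 -> '.240'."""
    return f"{avg:.3f}".lstrip("0")


if __name__ == "__main__":
    # Hypothetical season lines as (hits, at_bats)
    for hits, at_bats in [(120, 500), (100, 500), (95, 500)]:
        avg = batting_average(hits, at_bats)
        print(format_average(avg), mendoza_label(hits, at_bats))
```

The same comparison works for any metric with a cutoff: swap in a different value and threshold and the logic is unchanged.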

Wins Above Replacement

Image: Mike Trout, who had the highest WAR in baseball last year.

While the Mendoza line isn’t a sabermetric creation, Wins Above Replacement (WAR) is. As the name suggests, WAR calculates how valuable a player is compared to a replacement player (either a bench player or a minor-league player). For simplicity, a replacement-level player can be thought of as hitting at or close to the Mendoza line. WAR is then expressed as the number of wins a player adds to their team. Like the Mendoza line, the concept behind WAR applies beyond baseball: if a statistic can measure how many wins a player adds to a team, a similar one could measure what an employee adds to a company. With the explosion of data, it may be possible to calculate how much value a great employee adds compared to a baseline or average replacement. A business version of WAR would be useful for deciding whom to promote or whom to let go. Likewise, employees with higher WARs could use the statistic to demonstrate their value to the company when asking for a promotion or a raise.
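The Python sketch below shows the core arithmetic behind WAR: value above a replacement baseline, converted into wins. It is deliberately simplified. Published versions of WAR (such as those from FanGraphs and Baseball-Reference) combine batting, baserunning, fielding and positional adjustments, and the 10-runs-per-win conversion used here is only a common rule of thumb. The second function is a hypothetical business analogue of the employee idea; its names, the “output” scores and the baseline are invented for illustration.

```python
from __future__ import annotations

# Simplified, WAR-style arithmetic. RUNS_PER_WIN = 10 is an assumed rule
# of thumb; real WAR implementations derive it from the run environment.
RUNS_PER_WIN = 10.0


def wins_above_replacement(player_runs: float, replacement_runs: float,
                           runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert a player's run value above replacement level into wins."""
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    return (player_runs - replacement_runs) / runs_per_win


def rank_above_replacement(outputs: dict[str, float],
                           baseline: float) -> list[tuple[str, float]]:
    """Hypothetical business analogue: rank employees by how far their
    output sits above a replacement-level baseline, highest first."""
    margins = [(name, output - baseline) for name, output in outputs.items()]
    return sorted(margins, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # A hypothetical player worth 95 runs versus 25 for a replacement player
    print(wins_above_replacement(95.0, 25.0))
    # Hypothetical output scores, with 90 as the replacement baseline
    team = {"Ana": 130.0, "Ben": 100.0, "Cai": 85.0}
    for name, margin in rank_above_replacement(team, baseline=90.0):
        print(f"{name}: {margin:+.1f} vs. replacement")
```

The hard part in a business setting is not this arithmetic but agreeing on what counts as “output” and where the replacement baseline sits.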

Fielding Independent Pitching

Fielding Independent Pitching (FIP) is a statistic that tries to evaluate pitchers independently of the defenses behind them. FIP was created after research found that pitchers have little control over balls hit into play, so FIP factors in only the outcomes a pitcher can control: walks, strikeouts, hit batters and home runs. As a result, FIP has been shown to predict a pitcher’s future performance better than Earned Run Average (ERA), the most common pitching statistic. Again, while FIP itself might not be useful to businesses, the idea behind it is. Imagine your company runs many projects, and you want to figure out who is the most productive member of a team. Using collected data and a statistic like FIP, it could be possible to assess individuals independently of the team they’re working with. Such a statistic could reveal that some people in more capable groups appear better at their jobs than they actually are, while competent people in weak groups appear less effective than they are because of their group.
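For readers who want to see the statistic itself, FIP is commonly computed as (13 × HR + 3 × (BB + HBP) - 2 × K) / IP, plus a league constant that puts it on the same scale as ERA. The Python sketch below follows that commonly published form. The default constant of 3.10 is an assumed typical value (the real constant is recalculated each season so that league-wide FIP matches league-wide ERA), and innings are taken as a true decimal, so a box-score line of 180.1 innings should be passed as 180 + 1/3.

```python
# Sketch of the commonly published FIP formula. DEFAULT_FIP_CONSTANT is an
# assumed typical value; the real constant is recalculated every season.
DEFAULT_FIP_CONSTANT = 3.10


def raw_fip(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """FIP before the league constant: only HR, walks, HBP and strikeouts count."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = DEFAULT_FIP_CONSTANT) -> float:
    """Fielding Independent Pitching on an ERA-like scale."""
    return raw_fip(hr, bb, hbp, k, ip) + constant


def fip_constant(league_era: float, hr: int, bb: int, hbp: int, k: int,
                 ip: float) -> float:
    """League constant chosen so that league-wide FIP equals league ERA.
    The counting stats passed in here are league totals."""
    return league_era - raw_fip(hr, bb, hbp, k, ip)


if __name__ == "__main__":
    # Hypothetical pitcher season: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings
    print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))
```

Balls in play never enter the formula, which is exactly the “independent of the defense” idea; a business version would likewise keep only the outcomes an individual controls.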

There is no doubt that big data has tremendous potential. As baseball shows, data can lead to new insights and new ways of doing things. However, most businesses are unlikely to have as much data about their employees as baseball teams have about their players. As a result, companies must first find ways to compile that data before sabermetric concepts can be applied to it.
