Collette Calls: The Importance of Correlation

This article is part of our Collette Calls series.

"Facts are stubborn, but statistics are more pliable." – Mark Twain

"There are three kinds of lies: lies, damned lies, and statistics." - Benjamin Disraeli

"I can prove anything by statistics except the truth." - George Canning

Statistics are both fun and dangerous at the same time. They are fun because they can help you prove or disprove what your mind believes or what your eyes see. They are dangerous because pretty much any statistic can be twisted to fit a particular point of view.

Earlier this week, John Dewan had a brief post on his blog at ACTASports about throwing strikes. The general line of thought is that pitchers who throw more strikes will have a better ERA than peers who throw a lower percentage of pitches in the strike zone. The post explained that 81 pitchers qualified for the ERA title in 2013 (minimum 162 IP), and he sorted them into four tiers. Three groups had 20 pitchers each, while the final group had 21 because it included the 81st pitcher. These were the percentage of pitches each group threw in the strike zone (Zone%) and the ERA of each group:

Group     Zone%    ERA
Group A   47.60%   3.27
Group B   44.80%   3.52
Group C   43.40%   3.70
Group D   40.80%   3.91

Based on that data, it would seem easy to identify pitchers with strong ERAs simply by targeting those who throw a high percentage of pitches in the strike zone. If only it were that easy.
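To see why the tiered table looks so convincing, here is a minimal sketch that computes a Pearson correlation on the four group averages alone. The `pearson_r` helper and the variable names are illustrative assumptions, not anything from Dewan's post, and the only inputs are the four rows of the table above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Group averages from the table above (Groups A through D).
zone_pct = [47.6, 44.8, 43.4, 40.8]
era = [3.27, 3.52, 3.70, 3.91]

group_r = pearson_r(zone_pct, era)
print(f"r across the four group averages: {group_r:.3f}")
```

The result lands very close to -1, but only because each row averages about 20 pitchers and hides the spread within each group. It says nothing about how well Zone% tracks ERA for an individual pitcher.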

Using the pitching correlation tool recently published at Fangraphs, we find that the overall correlation between ERA and Zone% is very low. The tool reports a base correlation of -0.048 for the two stats over 709 pitcher seasons of at least 100 innings from 2009 through 2013. A correlation that close to zero is very weak.

If we break the results down by season for all qualified starting pitchers, as well as for all relievers who threw at least 50 innings in a given season, the r² results are not any better.

Season   Starters (IP >= 162)   Relievers (IP >= 50)
2009     0.074                  0.004
2010     0.016                  0.005
2011     0.038                  0.002
2012     0.000                  0.001
2013     0.157                  0.000

Note that the correlations are not consistent from year to year. That said, the correlation for starting pitchers in 2013 was the strongest of the past five seasons and roughly double what it was in 2009.
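For readers who want to put the table on the same scale as a plain correlation coefficient, the sketch below takes the square root of each r² value to get the size of r, then compares the 2013 and 2009 figures for starters. The dictionary layout and variable names are assumptions made for illustration. Note that r² drops the sign, so only the magnitude of r can be recovered.

```python
from math import sqrt

# r-squared values from the table above: season -> (starters, relievers).
r_squared = {
    2009: (0.074, 0.004),
    2010: (0.016, 0.005),
    2011: (0.038, 0.002),
    2012: (0.000, 0.001),
    2013: (0.157, 0.000),
}

# Print the magnitude of r for each season and role.
for season, (starters, relievers) in r_squared.items():
    print(f"{season}: starters |r| = {sqrt(starters):.3f}, "
          f"relievers |r| = {sqrt(relievers):.3f}")

# How much stronger was the starters' figure in 2013 than in 2009?
starters_ratio = r_squared[2013][0] / r_squared[2009][0]

# Season with the largest r-squared among starters.
strongest_season = max(r_squared, key=lambda s: r_squared[s][0])

print(f"2013 vs. 2009 (starters): {starters_ratio:.2f}x")
print(f"Strongest season for starters: {strongest_season}")
```

Even in the strongest season, an r² of 0.157 leaves most of the differences in ERA among starters unaccounted for by Zone%.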

Clayton Kershaw and CC Sabathia are a good example of why the correlation is so weak. Kershaw's Zone% in 2013 was 52.3% and he had the lowest ERA in all of baseball at 1.83. Sabathia's Zone% was slightly higher at 52.7%, but his ERA was much higher at 4.78. Likewise, Zack Greinke had a 46.4% Zone% yet posted a 2.63 ERA, while Jeremy Hellickson paired a 48.1% Zone% with a 5.24 ERA.
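One rough way to check those four pitchers is to compare them two at a time and count how often the pitcher with the higher Zone% also had the lower ERA. The sketch below does that pairwise count, which is the idea behind rank statistics such as Kendall's tau. The numbers come from the paragraph above, the variable names are illustrative, and four hand-picked pitchers are an anecdote rather than a sample.

```python
from itertools import combinations

# (pitcher, Zone%, ERA) for 2013, as quoted in the paragraph above.
pitchers = [
    ("Clayton Kershaw", 52.3, 1.83),
    ("CC Sabathia", 52.7, 4.78),
    ("Zack Greinke", 46.4, 2.63),
    ("Jeremy Hellickson", 48.1, 5.24),
]

supports = 0
contradicts = 0
for (_, zone_a, era_a), (_, zone_b, era_b) in combinations(pitchers, 2):
    # A pair supports "more strikes, lower ERA" when the pitcher with the
    # higher Zone% also has the lower ERA. Ties would fall into the other
    # bucket, but there are none in this data.
    if (zone_a - zone_b) * (era_a - era_b) < 0:
        supports += 1
    else:
        contradicts += 1

print(f"Pairs that support the idea: {supports}")
print(f"Pairs that contradict it:    {contradicts}")
```

Of the six possible pairs, three support the idea and three contradict it, so among these four pitchers Zone% predicts the better ERA no more reliably than a coin flip.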

While throwing strikes is important, getting strikeouts is more important if you are looking for pitchers with lower ERAs. Last season, the correlation between ERA and strikeout rate (K%) was 0.253. That is still modest, but it is stronger than the correlation between Zone% and ERA. The same Fangraphs tool referenced earlier gives a base correlation of -0.479 across the same sample of 709 pitcher seasons of at least 100 innings over the past five seasons, meaning the higher the K%, the lower the ERA.
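To compare the two five-year base correlations directly, it helps to square them. In a simple linear relationship, r² is commonly read as the share of variation in one stat that moves with the other. The sketch below applies that reading to the -0.048 (Zone%) and -0.479 (K%) figures quoted from the Fangraphs tool. The variable names are illustrative.

```python
# Base correlations with ERA over the 2009-2013 sample, as quoted above.
r_zone = -0.048  # Zone% vs. ERA
r_k = -0.479     # K% vs. ERA

# Squaring gives the share of variation associated with each stat.
r2_zone = r_zone ** 2
r2_k = r_k ** 2

# How many times larger is the K% share than the Zone% share?
ratio = r2_k / r2_zone

print(f"Zone% r-squared: {r2_zone:.4f}")
print(f"K% r-squared:    {r2_k:.4f}")
print(f"K% share is about {ratio:.0f} times the Zone% share")
```

On that reading, K% lines up with a little under a quarter of the variation in ERA, while Zone% lines up with well under one percent.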

After all, strikeouts are one of the strongest stats in terms of year-to-year correlation. If you want to pick a stat to evaluate pitchers as you plan your 2014 draft lists, this is how the different metrics correlated from year to year in 2002-2012 as calculated by Matt Klaassen of Fangraphs.

Keep that in mind as you read or listen to people who are optimistic or pessimistic about a pitcher based on his 2013 ERA, BABIP, or LOB%. Stay away from analysis built on volatile statistics, and evaluate pitchers with statistics that show strong year-to-year correlation. No statistic is perfect, and any statistic can be twisted, but some metrics are much more flawed than others.

ABOUT THE AUTHOR
Jason Collette
Jason has been helping fantasy owners since 1999, and here at RotoWire since 2011. You can hear Jason weekly on many of the Sirius/XM Fantasy channel offerings throughout the season as well as on the Sleeper and the Bust podcast every Sunday. A ten-time FSWA finalist, Jason won the FSWA's Fantasy Baseball Writer of the Year award in 2013 and the Baseball Series of the Year award in 2018 for Collette Calls, and was the 2023 AL LABR champion. Jason manages his social media presence at https://linktr.ee/jasoncollette