Note: All values were collected from games played on or before 4/14/14.
It is early in the season, and we all should know not to put too much stock in a couple of weeks' worth of stats. The time is coming, though, when a few key stats can give us an idea of what to expect for the rest of the season. One stat people may not know they can project, to an extent, is batting average. Chase Utley will probably not be able to continue hitting .489, but how much should we expect his average to drop? By limiting batting average on balls in play (BABIP) to historic values and regressing strikeout percentage, we can get a good idea of where a player's batting average may end up.
The two main components of a player's batting average are strikeouts and BABIP. By looking at each stat and combining the information, we can get an idea of how to project a batting average.
Strikeout rate (K%) is one of the first stats to begin to stabilize for hitters each season. A couple of well-known studies (the first from Pizza Cutter and the other from Derek Carty) have been done on when a hitter's stats stabilize.
While the two studies don't agree on when strikeout rate begins to stabilize (60 PA for Pizza Cutter's and 100 PA for Derek Carty's), they both agree it is the first major hitting stat to stabilize. For my work I will use the 100 PA value because it is the more conservative of the two and is based on the more recent data. While players are only about halfway to the 100 PA value (51 PA for Chase Utley), strikeout rate differences can already be seen. Also, it is better to have the tools and understanding in place before the stabilization point is actually reached.
Just because a stat has reached its stabilization point doesn't mean the player is at his true talent level. The 100 PA value means that at 100 PA, a person can weight the player's stats and the league's stats equally to estimate his talent level. For example, Chase Utley has 51 PA and a 7.8% K%, while the league rate is 21.2%.
The formula for determining K% talent level is (using Utley as an example):
K% = (Utley's K% * Utley's PA + League K% * PA to Stabilize)/(Utley's PA + PA to Stabilize)
K% = (7.8% * 51 PA + 21.2% * 100 PA) / (51 PA + 100 PA)
K% = 16.7%
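For anyone who would rather script the calculation than do it by hand, here is a minimal Python sketch of the regression formula above. The function name `regress_rate` and its parameter names are my own for illustration; the inputs are Utley's numbers from the example.

```python
def regress_rate(observed_rate, pa, league_rate, stabilization_pa):
    """Blend a player's observed rate with the league rate.

    The league rate is weighted by the stabilization point, so a player
    with exactly `stabilization_pa` plate appearances ends up halfway
    between his own rate and the league rate.
    """
    weighted_sum = observed_rate * pa + league_rate * stabilization_pa
    return weighted_sum / (pa + stabilization_pa)


# Utley: 7.8% K% in 51 PA, league K% of 21.2%, 100 PA to stabilize
utley_k = regress_rate(0.078, 51, 0.212, 100)
print(f"{utley_k:.1%}")  # 16.7%
```

The same function works for any rate stat as long as you supply that stat's own stabilization point.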
It may not seem right for his strikeout rate to more than double, but the regressed K% is closer to his career 15% K%. If Utley is able to maintain the extremely low rate longer into the season, the regressed K% will drop more and more. For now, we have to assume his rate will move toward the league average value.
The next key stat to look at when it comes to projecting early-season batting average is BABIP. The stabilization point for BABIP is 1,126 at-bats, which I am going to guess is not going to happen this season.
While we can't get a good idea of where a player's BABIP will end up, we do have a good idea of where it won't be. Right now, Utley has a .500 BABIP. His previous high BABIP was .362, and that was back in 2007 when he still had some speed left. The .500 value is a far cry from his career .307 value. Among the 10,706 400 PA seasons between 1950 and 2013 (link), only seven players had a BABIP over .400, and the max value was .408 by Reggie Jefferson in 1996.
Here is a chart of the number of qualified hitters who had over (or under) a certain extreme BABIP value.
Everyday hitters can expect to have their BABIP fall between .230 and .380, while only ~10% of all qualified hitters are outside the .250 to .360 range. So far this season, over 50% of the hitters are outside the .250 to .360 range.
Since players do have some influence over their BABIP, mainly through speed and how hard they hit the ball, not all players should be regressed to the league average. Instead, when doing early-season projections, just set a maximum and minimum BABIP value. I like to use the .250 and .360 values, but I will include other values for reference.
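Setting a maximum and minimum is just a clamp. Here is a small Python sketch, with the function name `clamp_babip` being my own and the default bounds being the .250 and .360 values I prefer; swap in other bounds if you like.

```python
def clamp_babip(babip, low=0.250, high=0.360):
    """Limit an early-season BABIP to a plausible full-season range."""
    return max(low, min(high, babip))


print(clamp_babip(0.500))  # 0.36  (Utley's current BABIP gets capped)
print(clamp_babip(0.097))  # 0.25  (Alvarez's current BABIP gets raised)
print(clamp_babip(0.307))  # 0.307 (values inside the range are untouched)
```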
If you have actually read everything and not jumped ahead to the table, you will need to wait for a couple more housekeeping bits. For the batting average formula, I used the two methods listed above to deal with K% and BABIP. The problem is that batting average takes a few other components into account.
1. BABIP only concerns itself with balls in play, not home runs. I am just going to count the home runs as hits and not regress them.
2. I made a small adjustment to account for sacrifice hits.
3. For walks, which will not become significant until later in the season (168 PA), I used the same regression method as strikeouts.
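To show how the pieces fit together, here is a rough Python sketch of one way to combine them. This is not the exact formula in my spreadsheet; it is a simplified version that works in per-PA rates, ignores hit-by-pitches and sacrifice flies, and assumes the K% and BB% passed in have already been regressed and the BABIP has already been limited. The function name, parameter names, and sample inputs are hypothetical.

```python
def project_avg(k_rate, bb_rate, hr_rate, babip, sh_rate=0.0):
    """Estimate batting average from per-PA rates (simplified sketch).

    Assumptions: all rates are per plate appearance, HBP and sacrifice
    flies are ignored, and home runs are counted as hits without any
    regression.
    """
    at_bats = 1.0 - bb_rate - sh_rate            # walks and sac hits are not AB
    balls_in_play = at_bats - k_rate - hr_rate   # AB that end with a ball in play
    hits = babip * balls_in_play + hr_rate       # in-play hits plus home runs
    return hits / at_bats


# Hypothetical hitter: 20% K%, 10% BB%, 3% HR rate, .300 BABIP
print(f"{project_avg(0.20, 0.10, 0.03, 0.300):.3f}")  # 0.257
```

More strikeouts shrink the pool of balls in play, and a lower BABIP turns fewer of those into hits, which is why the two stats together drive most of the projection.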
So finally, here are the projected batting averages which take into account early season strikeout rates and maximum and minimum BABIP values (ordered by current batting average). In an appendix at the end of the article, I go through a simple procedure for re-running the data anytime you wish on your own.
Here are some notable players who will likely see their batting averages move to new levels, and why.
Everth Cabrera (.340 AVG, .500 BABIP, 31% K%) - The mixture is right for a Cabrera batting average implosion. His K% (21% career average) is still at 25% even after regression. The .500 BABIP is completely unsustainable, especially since his previous season-high BABIP was .337. The numbers point to a batting average around .247, which is one point higher than his 2012 average. It might be tough to move him now, but as his AVG drops, try to sell, maybe when it is around .300. That is close enough to his 2013 AVG that there may be an owner who thinks he is buying Cabrera at value.
Khris Davis (.279 AVG, .414 BABIP, 33% K%) - The high BABIP is propping up his .279 AVG right now while the high strikeout rate continues to drive it down. The main issue with Davis' 2014 season compared to last season is his approach to pitches out of the strike zone. In 2013, he swung at pitches out of the zone 30% of the time. In 2014, the value is at 41%. When he does swing at those pitches, he makes little contact. His contact rate on pitches outside of the strike zone has gone from 57% to 40%. Until he quits swinging and missing at pitches out of the strike zone, he will continue to struggle.
Pedro Alvarez (.167 AVG, .097 BABIP, 21% K%) - Alvarez is taking a new, more patient but home-run-happy approach at the plate. His walk rate has nearly doubled from 8% to 14%. His strikeouts are way down (30% to 21%). With the change, his on-base percentage (OBP) is nearly the same (.286) as last year's (.296). The issue is that he seems to be looking only for home runs; with five, he is one behind the league leader (Trumbo). His hit total, only three besides the home runs, should eventually come up. If he is able to get the BABIP into the .250 range, he could be looking at a near .300 AVG with the drop in strikeouts.
Jhonny Peralta (.150 AVG, .103 BABIP, 18% K%) - To put it simply, the BABIP Fairy doesn't like Mr. Peralta. He is basically doing everything right: walking more, striking out less, and hitting for more power. His line drive rate is down, but not non-existent. Just getting his BABIP into the .250 range raises his AVG to .271. He could see even more of a jump since he has a career .313 BABIP. His approach seems fine for now, so buy low.
While it takes a while for batting average to stabilize, a couple of its components, strikeout rate and BABIP, can be used to get an idea of a maximum or minimum batting average. With this projected value, good buys and sells can easily be identified. As the season continues, other information will become more significant and we will take that into account. For now, take advantage of some inflated or deflated batting averages.
Appendix – How to re-run the analysis when more data is available as the season goes on.
1. Download this spreadsheet.
2. Go to this FanGraphs page and copy and paste the data from the page into the spreadsheet. Alternatively, download the player spreadsheet (the link is above the players on the right), open it, and copy the player information into the large yellow highlighted block of data. Once the data is added, the projected values will update automatically.
3. The league-wide K% (J6) and BB% (K6) can be found on this page.
4. The output in columns J to Q may need to be expanded depending on the number of players copied into the sheet.