Date: 2024-01-13

This article is part of our The Z Files series.

Thinking about it from a practical sense, what should affect BABIP? That is, what factors can a pitcher influence that would generate a low BABIP? The two that immediately come to mind are limiting hard contact and line drives.

Over the years, it's become apparent that pitchers exert some influence on their BABIP; it isn't totally random. That said, the level of influence is often overblown, which can produce flawed analysis. However, expecting everyone's BABIP to move towards league average is just as flawed.

The original DIPS (defense-independent pitching statistics) theory noted how a large percentage of pitchers' BABIP marks clustered around .300. If an individual's mark was lower, he was deemed lucky, and his BABIP was expected to regress up towards .300, either in-season or the following year. If it was above .300, the hurler was judged to be unlucky, with impending regression down to .300.

Thankfully, we've come a long way since the advent of DIPS theory. To be fair, Voros McCracken's findings were revolutionary, and served as a foundation for a new generation of analysis.

I'm so old, I remember when we assumed every pitcher's BABIP (batting average on balls in play) would regress to .300. It didn't matter if it was Pedro Martinez or Pedro Astacio; their BABIP should be .300.

The formula for BABIP is:

BABIP = (H – HR) / (AB – HR – K + SF)
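As a minimal sketch, the formula translates directly into a short Python function. The function name and the stat line in the example are hypothetical, invented only to show the arithmetic.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - HR - K + SF)."""
    balls_in_play = at_bats - home_runs - strikeouts + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical pitcher line, not a real player's numbers.
example = babip(hits=150, home_runs=20, at_bats=620, strikeouts=180, sac_flies=5)
print(f"{example:.3f}")  # 130 non-homer hits on 425 balls in play
```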

Last time, I posed the question, "How Much Influence Does a Pitcher Exert?" Statcast's average exit velocity and HardHit% were reviewed, along with Baseball Info Solutions Hard%, Medium% and Soft%. The answer to that question is, "Not as much as you think." It is more than random, but there are other elements of pitching for which a pitcher exerts more control.

The chief area a pitcher can affect is the split between groundballs and flyballs. The influence over line drives is the weakest of everything cited in this discussion.

Putting everything together, pitchers have limited influence over hard and soft contact, as well as the number of line drives they surrender. In other words, pitchers don't exert much control over two of the factors helping to maintain a low BABIP. Or at least not as much as many intuit.

On the other hand, pitchers have a great deal of influence over groundballs and flyballs. Furthermore, it is known the BABIP of grounders is higher than that of flies. As such, groundball pitchers should organically carry a higher BABIP than their flyball counterparts. Maybe we shouldn't be so quick to assume regression to the league mean.

Before going on, I am not taking credit for what follows. It was derived from independent thinking, but I am sure others do the same, or something similar. The notion is a pitcher's xBABIP (expected BABIP) can be computed solely based on their batted ball distribution.

xBABIP = (GB% × GB BABIP) + (LD% × LD BABIP) + (FB% × FB BABIP)

Using 2023 league averages, the component BABIP values are:

Line Drive: 0.628

Groundball: 0.248

Flyball: 0.095

These numbers may be different from those cited elsewhere because these don't include homers. The overall batting average on flyballs and line drives is higher once homers are counted. For simplicity's sake, bunts are included with grounders while popups are lumped with flyballs. If this were a study to be submitted to a SABR conference, I might have broken the components down further.

Let's plug in some numbers to get a feel for the range of BABIP, based solely on batted ball distribution. The LD% will be kept constant. Last season, it was 25.2 percent. Again, this is just for balls in play. It is higher than what will be shown elsewhere for the league average, because that denominator includes homers (nearly all flyballs, with some line drives, depending on the data source). The first row is the league average, encompassing all pitchers who threw at least 50 innings.

GB%    FB%    LD%    BABIP
44.3   30.4   25.2   0.297   (league average)
65     9.8    25.2   0.329
60     14.8   25.2   0.321
55     19.8   25.2   0.314
50     24.8   25.2   0.306
40     34.8   25.2   0.291
35     39.8   25.2   0.283
30     44.8   25.2   0.275
25     49.8   25.2   0.268

After the league average, the top two and bottom two are extremes. Most pitchers induce between 35 and 55 percent groundballs, generating an xBABIP range between .283 and .314. I can't count the number of times I saw a .283 BABIP and reflexively assumed it was lucky, or targeted a .314 BABIP, convinced it would drop.
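The xBABIP range can be approximated with a few lines of Python. This is only a sketch: the function and variable names are my own, and it uses the rounded component BABIP values quoted in the text, so a result can differ from the published figure by .001 where the underlying unrounded values were presumably used.

```python
# 2023 component BABIP values quoted in the article (homers excluded).
COMPONENT_BABIP = {"gb": 0.248, "ld": 0.628, "fb": 0.095}


def xbabip(gb_pct: float, ld_pct: float, fb_pct: float) -> float:
    """Weight each component BABIP by its share of balls in play (0-100 scale)."""
    return (
        gb_pct * COMPONENT_BABIP["gb"]
        + ld_pct * COMPONENT_BABIP["ld"]
        + fb_pct * COMPONENT_BABIP["fb"]
    ) / 100.0


LD_PCT = 25.2  # held constant, as in the article

for gb_pct in (65, 60, 55, 50, 40, 35, 30, 25):
    # Flyballs are whatever remains after grounders and line drives.
    fb_pct = round(100.0 - LD_PCT - gb_pct, 1)
    value = xbabip(gb_pct, LD_PCT, fb_pct)
    print(f"GB% {gb_pct:>4}  FB% {fb_pct:>4}  xBABIP {value:.3f}")
```

Swapping in a pitcher's actual GB%, LD% and FB% on balls in play gives a quick read on whether a .283 or .314 BABIP is out of line with his batted ball mix.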

Here is a sortable table displaying the BABIP and xBABIP of all pitchers who compiled at least 50 frames last season.

This is a lot of data to digest, but keep in mind it's backward-looking, or as the kids say, it's descriptive, not predictive. Not to mention, the calculation assumes the individual has no control over the authority of contact, or the number of line drives. That is a simplification: it was never stated the pitcher has zero control, only that it's limited, at least compared to how able they are to induce grounders and flies.

The next step is incorporating xBABIP into formulaic projections. This obviously entails projecting batted ball distribution. Data presented last time (and linked above) illustrates that groundball and flyball rates correlate well from year to year. One approach could be to use a weighted average for groundball and flyball rate, similar to that utilized for other statistics. The line drive rate can be what's remaining after the others are computed.
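One way to picture the weighted-average approach is the sketch below. The 3-2-1 weighting of the last three seasons and the sample rates are assumptions chosen for illustration; they are not the weights used in any particular projection system.

```python
def weighted_rate(rates_recent_first, weights=(3, 2, 1)):
    """Weighted average of seasonal rates, most recent season first."""
    pairs = list(zip(rates_recent_first, weights))
    if not pairs:
        raise ValueError("need at least one season of data")
    total_weight = sum(weight for _, weight in pairs)
    return sum(rate * weight for rate, weight in pairs) / total_weight


def project_distribution(gb_history, fb_history, weights=(3, 2, 1)):
    """Project GB% and FB%, then assign the remainder to line drives."""
    gb_pct = weighted_rate(gb_history, weights)
    fb_pct = weighted_rate(fb_history, weights)
    ld_pct = 100.0 - gb_pct - fb_pct
    return gb_pct, fb_pct, ld_pct


# Hypothetical three-year history (percent of balls in play), most recent first.
gb, fb, ld = project_distribution([48.0, 45.0, 42.0], [28.0, 30.0, 32.0])
print(f"GB% {gb:.1f}  FB% {fb:.1f}  LD% {ld:.1f}")
```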

This also requires projecting the component BABIP. Using the previous season's levels could suffice, as could a weighted average of a few previous seasons. For the purpose of projections, the precision isn't critical since everyone gets the same treatment. If everyone is a little too high, or a little too low, the relative rankings for drafting will be the same.

Some may find the above flawed, since pitchers exhibit little control over line drives. Perhaps the line drive rate should be the projected league average for everyone, with the groundball and flyball rates then derived from a projected GB/FB ratio.
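The alternative can be sketched as follows. The function name is hypothetical, and last season's 25.2 percent line drive rate stands in for a projected league average, which is an assumption on my part.

```python
def distribution_from_ratio(gb_fb_ratio: float, league_ld_pct: float = 25.2):
    """Fix LD% at the league level, then split the rest by a GB/FB ratio."""
    if gb_fb_ratio <= 0:
        raise ValueError("GB/FB ratio must be positive")
    remaining = 100.0 - league_ld_pct
    fb_pct = remaining / (1.0 + gb_fb_ratio)  # FB share of non-line-drives
    gb_pct = remaining - fb_pct
    return gb_pct, fb_pct, league_ld_pct


# Hypothetical pitcher projected for 1.5 grounders per flyball.
gb_pct, fb_pct, ld_pct = distribution_from_ratio(1.5)
print(f"GB% {gb_pct:.1f}  FB% {fb_pct:.1f}  LD% {ld_pct:.1f}")
```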

An argument can be tendered for either method, especially when considered in light of the following: pitchers do exert some measure of influence. Determining how much is going to be subjective, just as it is with other statistics. A good projection system should allow for overriding, especially when it comes to the level of regression. As a simple example, if a pitcher yields 20 homers, but his xHR is 24, plugging anything between 20 and 24 into the little black box can be justified. If the prognosticator has a reason the pitcher should have surrendered 24 homers, plugging 24 in is defensible. Personally, I set regressions of this nature to 50 percent, then season to taste.
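The homer example reduces to a one-line blend. In this sketch, the function name and the `weight` parameter are my own labels; a weight of 0.5 corresponds to the 50 percent starting point, and the same idea would apply to blending an actual BABIP with an xBABIP.

```python
def regress(actual: float, expected: float, weight: float = 0.5) -> float:
    """Move `actual` toward `expected`; weight 0 keeps actual, 1 uses expected."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return actual + weight * (expected - actual)


# The article's example: 20 homers allowed, xHR of 24.
print(regress(20, 24))        # halfway: 22.0
print(regress(20, 24, 1.0))   # full trust in xHR: 24.0
```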

A more complex version of xBABIP can regress everything, including the pitcher's component BABIP, to the league norm. This helps account for the pitcher's ability to influence their own numbers. In the case of xBABIP, my starting point will probably be a stronger regression to the league mean for all of the involved factors.

The objective of this presentation is not to spur everyone to fire up Excel and crunch xBABIP for the 750 pitchers expected to appear in MLB this season. By all means, if you so desire, go for it. The primary goal is to illustrate that BABIP has a wider acceptable range than many assume. You don't need to set up formulas. After checking out a pitcher's BABIP, look at their batted ball distribution. Maybe you shouldn't hang onto a perceived unlucky guy or fade an assumed lucky one. Just don't give the pitcher too much credit (or blame) for the delta from expected.