The RotoWire Blog has been retired.

Sample Size Isn't Everything

Few dispute the importance of sample size in informing our conclusions about a set of data. If an average player homers in three straight games, it tells us something about him, but the sample would typically be too small for us to conclude he had developed a new power skill. But if the player didn't just homer in three straight games but hit seven homers over that span, and those homers averaged 450 feet, it would be foolish not to think something had probably changed about him.

So it's not only the size of the sample that's relevant but also the magnitude of the results. To illustrate this, imagine you flipped a quarter 20 times and got 16 heads. It's a small sample of flips, and even though that's an unlikely result (a fair coin produces 16 or more heads about 0.6 percent of the time), you probably wouldn't bet a ton of money the coin was weighted. After all, very few quarters are weighted, and a roughly 1-in-170 long shot isn't that crazy. But if instead of 16, you got 20 heads, the odds of a fair coin doing that by luck alone would go down to less than 1 in 1,000,000. Notice the sample size (20 flips) was exactly the same, but the magnitude of the results (all 20 heads) is what changed.
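The coin-flip numbers can be checked with a short calculation. This is a minimal sketch, assuming a fair coin and treating "16 heads" as "16 or more heads"; the function name `prob_at_least` is made up for illustration.

```python
from math import comb


def prob_at_least(k, n=20, p=0.5):
    """Chance of k or more heads in n flips of a coin that lands heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: the coin is fair (p = 0.5) and flips are independent.
p16 = prob_at_least(16)
p20 = prob_at_least(20)

print(f"16 or more heads: {p16:.2%}, about 1 in {1 / p16:.0f}")
print(f"All 20 heads: about 1 in {1 / p20:,.0f}")
```

Under those assumptions, 16 or more heads comes out to about 0.59 percent (roughly 1 in 169), while all 20 heads is 1 in 1,048,576. Same 20 flips, very different evidence.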

While most grasp this concept easily enough, caveats about sample size have been drilled into the fantasy baseball community for so long that many miss it in that context. For example, last week I floated two Twitter polls on this issue:


[Two embedded Twitter polls, not preserved in this archive: one about pitchers, one about hitters.]

Let's take the pitching question first. Nearly three-quarters of the respondents preferred the 220-K pitcher to the 18-K one (220 strikeouts in a season versus 18 in a single game) because of the much larger sample. But take a look at the list of pitchers since 1900 who have struck out 18 or more in a game:

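| Pitcher | Date | Strikeouts | IP | Career Ks |
| --- | --- | --- | --- | --- |
| Tom Cheney | 12-Sep-62 | 21 | 16 | 345 |
| Kerry Wood | 6-May-98 | 20 | 9 | 1,582 |
| Roger Clemens | 18-Sep-96 | 20 | 9 | 4,672 |
| Roger Clemens | 29-Apr-86 | 20 | 9 | 4,672 |
| Randy Johnson | 8-May-01 | 20 | 9 | 4,875 |
| Randy Johnson | 8-Aug-97 | 19 | 9 | 4,875 |
| Randy Johnson | 24-Jun-97 | 19 | 9 | 4,875 |
| David Cone | 6-Oct-91 | 19 | 9 | 2,668 |
| Nolan Ryan | 12-Aug-74 | 19 | 9 | 5,714 |
| Tom Seaver | 22-Apr-70 | 19 | 9 | 3,640 |
| Steve Carlton | 15-Sep-69 | 19 | 9 | 4,136 |
| Luis Tiant | 3-Jul-68 | 19 | 10 | 2,416 |
| Nolan Ryan | 14-Jun-74 | 19 | 13 | 5,714 |
| Nolan Ryan | 20-Aug-74 | 19 | 11 | 5,714 |
| Nolan Ryan | 8-Jun-77 | 19 | 10 | 5,714 |
| Corey Kluber* | 13-May-15 | 18 | 9 | 630 |
| Ben Sheets | 16-May-04 | 18 | 9 | 1,325 |
| Roger Clemens | 25-Aug-98 | 18 | 9 | 4,672 |
| Randy Johnson | 27-Sep-92 | 18 | 9 | 4,875 |
| Ramón Martínez | 4-Jun-90 | 18 | 9 | 1,427 |
| Bill Gullickson | 10-Sep-80 | 18 | 9 | 1,279 |
| Ron Guidry | 17-Jun-78 | 18 | 9 | 1,778 |
| Nolan Ryan | 10-Sep-76 | 18 | 9 | 5,714 |
| Don Wilson | 14-Jul-68 | 18 | 9 | 1,283 |
| Sandy Koufax | 24-Apr-62 | 18 | 9 | 2,396 |
| Sandy Koufax | 31-Aug-59 | 18 | 9 | 2,396 |
| Bob Feller | 2-Oct-38 | 18 | 9 | 2,581 |
| Warren Spahn | 14-Jun-52 | 18 | 15 | 2,583 |
| Chris Short | 2-Oct-65 | 18 | 15 | 1,629 |
| Jim Maloney | 14-Jun-65 | 18 | 11 | 1,605 |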

If we remove pitchers from the 1950s and '60s who got to 18 strikeouts in extra innings (one needed 16 innings), 17 of the 25 times it happened, Randy Johnson, Roger Clemens, Sandy Koufax, Steve Carlton, Tom Seaver, Nolan Ryan or Bob Feller did it. All but Clemens, a seven-time Cy Young winner, are in the Hall of Fame. Among the other pitchers to do it were Ron Guidry during his 1.74 ERA Cy Young season, Corey Kluber (245 Ks) last year, Ben Sheets during his 264:32 K:BB season, Luis Tiant during his 264-K, 1.60 ERA year, David Cone during a 241-K season, Ramón Martínez in a 223-K, 20-win season in which he finished second in the Cy Young voting, Bill Gullickson and Don Wilson. Bottom line: knowing nothing else, you should much prefer the 18-K pitcher over the 220-K one.
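The "17 of the 25" count can be reproduced from the table. This is a rough sketch, with each game transcribed by hand as (pitcher, year, strikeouts, innings). It assumes two-digit years such as "62" mean 1962 and "15" means 2015, and that "extra innings" means more than nine innings pitched. The names `GAMES`, `ELITE` and `is_early_extra_inning_game` are made up for illustration.

```python
# (pitcher, year, strikeouts, innings pitched), transcribed from the table above.
GAMES = [
    ("Tom Cheney", 1962, 21, 16), ("Kerry Wood", 1998, 20, 9),
    ("Roger Clemens", 1996, 20, 9), ("Roger Clemens", 1986, 20, 9),
    ("Randy Johnson", 2001, 20, 9), ("Randy Johnson", 1997, 19, 9),
    ("Randy Johnson", 1997, 19, 9), ("David Cone", 1991, 19, 9),
    ("Nolan Ryan", 1974, 19, 9), ("Tom Seaver", 1970, 19, 9),
    ("Steve Carlton", 1969, 19, 9), ("Luis Tiant", 1968, 19, 10),
    ("Nolan Ryan", 1974, 19, 13), ("Nolan Ryan", 1974, 19, 11),
    ("Nolan Ryan", 1977, 19, 10), ("Corey Kluber", 2015, 18, 9),
    ("Ben Sheets", 2004, 18, 9), ("Roger Clemens", 1998, 18, 9),
    ("Randy Johnson", 1992, 18, 9), ("Ramón Martínez", 1990, 18, 9),
    ("Bill Gullickson", 1980, 18, 9), ("Ron Guidry", 1978, 18, 9),
    ("Nolan Ryan", 1976, 18, 9), ("Don Wilson", 1968, 18, 9),
    ("Sandy Koufax", 1962, 18, 9), ("Sandy Koufax", 1959, 18, 9),
    ("Bob Feller", 1938, 18, 9), ("Warren Spahn", 1952, 18, 15),
    ("Chris Short", 1965, 18, 15), ("Jim Maloney", 1965, 18, 11),
]

# The seven pitchers singled out in the paragraph above.
ELITE = {
    "Randy Johnson", "Roger Clemens", "Sandy Koufax", "Steve Carlton",
    "Tom Seaver", "Nolan Ryan", "Bob Feller",
}


def is_early_extra_inning_game(year, innings):
    """True for 1950s/'60s games that needed more than nine innings (assumed filter)."""
    return innings > 9 and 1950 <= year <= 1969


kept = [g for g in GAMES if not is_early_extra_inning_game(g[1], g[3])]
elite_games = [g for g in kept if g[0] in ELITE]

print(f"{len(GAMES)} games listed, {len(kept)} kept, {len(elite_games)} by the seven named pitchers")
```

With the table as transcribed, the filter drops five games (Cheney, Tiant, Spahn, Short and Maloney), leaving 25, of which 17 belong to the seven pitchers named.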

For the hitting question, it's a little different because the sample (4-6 AB) is even smaller, so I put it up against a more modest 25 HR. Again, the respondents voted overwhelmingly in favor of the larger sample. Let's take a look at players since 1900 who have hit four HR in one game:

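| Player | Date | Score | Career HR |
| --- | --- | --- | --- |
| Lou Gehrig | 3-Jun-32 | 20–13 | 493 |
| Chuck Klein | 10-Jul-36 | 9–6 | 300 |
| Pat Seerey | 18-Jul-48 | 12–11 | 86 |
| Gil Hodges | 31-Aug-50 | 19–3 | 370 |
| Joe Adcock | 31-Jul-54 | 15–7 | 336 |
| Rocky Colavito | 10-Jun-59 | 11–8 | 374 |
| Willie Mays | 30-Apr-61 | 14–4 | 660 |
| Mike Schmidt | 17-Apr-76 | 18–16 | 548 |
| Bob Horner | 6-Jul-86 | 8–11 | 218 |
| Mark Whiten | 7-Sep-93 | 15–2 | 105 |
| Mike Cameron | 2-May-02 | 15–4 | 278 |
| Shawn Green | 23-May-02 | 16–3 | 328 |
| Carlos Delgado | 25-Sep-03 | 10–8 | 473 |
| Josh Hamilton | 8-May-12 | 10–3 | 195 |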

While this list isn't as impressive as that of the 18-K pitchers, you can see it's still better than your average 25-home run player. Lou Gehrig, Willie Mays and Mike Schmidt are inner-circle Hall of Famers, Chuck Klein is also in the Hall, and peak Carlos Delgado and Josh Hamilton were MVP-level players. Bottom line, it's better to take your chances with a random guy who hit four homers in a game rather than one who hit 25 homers in a year.

So be wary when someone tries to end the discussion with "small sample" without looking at its magnitude. (Be even more wary when someone says "small sample size." Sample size is an issue, but only the sample itself can be small or large. Saying the "sample size" is small is like saying a person's height is tall. It's a misuse of language.)