I'm the type of person who, when I do something, wants to do it right. The best possible way. I'm not opposed to shortcuts, as long as the integrity of the thing isn't compromised. In sports and as a scientist, I ask a lot of questions and I don't take common knowledge for granted. Probing the status quo is what science is all about. One of the best lessons I learned in graduate school was that if I had a question about something in a class or seminar, other people probably had the same question, or at least would be interested in knowing the answer. This attitude drives a lot of what I write about fantasy sports.
Now, I am admittedly an MLB DFS novice. This is my first year playing daily MLB every day. I've tried a lot of different lineup building techniques already, many of which I've written about here. When I started baseball, the first article I wrote here at RotoWire was about the use of past statistics to predict current performance. You can re-read it here. Nobody really liked the sub-title "Buster Posey mashes lefties and other lies in 2014" but all I was seeing on DFS sites was his really high price tag and no lefty mashing performances to validate it.
The big question is of course: What information best predicts which players are going to score the most fantasy points in any given night? When it comes to player talent, whether you like simple average, wOBA, or OPS (my choice), you have to choose the time frame you think is most relevant. Do you use this year's data? Last year's? The last three years'? Completely independent of park shifts, opponent, Vegas line, bobbleheads, or any other factor you consider, where do you start with talent?
I said I don't mind shortcuts, but the truth is I like them. Who doesn't? DFS is a lot of work. When I wrote that article, I really wanted to know whether past stats predicted current standings. My rationale was this: if a prior year's worth of data didn't predict a player's current season performance, how could it predict a single night? Yes, there is a ton of variability night to night, and the #1 player vs LHP over the past three years is not going to be the #1 DFS scorer every night he faces a LHP. That's not what I'm asking or expecting. I'm simply asking whether a correlation exists between a guy's 2013 OPS and 2014 OPS.
The results were not pretty. Using just one month of batting data from 2014 (all data from FanGraphs), there was absolutely no correlation with 2013 batting data for the top hitters facing either LHP or RHP. Many of you commented that it was a ridiculously small sample size and asked to see it repeated later in the season. I thought that since we are trying to predict a single day, asking past stats to predict a month was being generous, but definitely agreed that repeating the analysis would be a good idea. So here we are, approaching the midway point and the All-Star Break. What better time to revisit this question?
[Figure: OPS vs LHP, top 150 of 2014 vs their 2013 OPS]
[Figure: OPS vs RHP, top 150 of 2014 vs their 2013 OPS]
What you see in these graphs is the OPS rank in 2014 plotted against the same player's rank in 2013 for the top 150 players in 2014 vs either LHP or RHP. A perfect correlation would start at zero and extend up with a slope of 1. With more data, I do see a much better correlation for hitters vs LHP now than I did in May. There is still no correlation for batters facing RHP between last year and this year. This is interesting, because most hitters have had significantly more plate appearances vs RHP than LHP. One would expect the data set with the larger sample to show a better correlation.
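For readers who want to put a number on a rank-versus-rank comparison like the one described above, one option is a Spearman rank correlation, which is just an ordinary correlation computed on ranks. The sketch below is a minimal illustration and not the analysis behind the graphs: the function names and the sample OPS values are hypothetical, and it assumes players are only compared when they have data in both seasons.

```python
def ranks(values):
    """Rank values from 1 (highest) downward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side has no variation")
    return cov / (var_x * var_y) ** 0.5


def rank_correlation(ops_prev, ops_curr):
    """Spearman correlation over players present in both seasons (an assumption)."""
    common = sorted(set(ops_prev) & set(ops_curr))
    prev_ranks = ranks([ops_prev[p] for p in common])
    curr_ranks = ranks([ops_curr[p] for p in common])
    return pearson(prev_ranks, curr_ranks)


# Hypothetical OPS values, made up purely for illustration.
ops_2013 = {"Player A": 0.900, "Player B": 0.850, "Player C": 0.800, "Player D": 0.750}
ops_2014 = {"Player A": 0.780, "Player B": 0.910, "Player C": 0.820, "Player E": 0.870}
print(rank_correlation(ops_2013, ops_2014))
```

A value near 1 means the two seasons order the players almost identically, a value near 0 means last year's order says little about this year's, and a negative value means the order has largely flipped. Note that Player D and Player E drop out of the comparison because each appears in only one season.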
What you don't see in the graphs above is that about 50% of the players ranked in the top 150 OPS in 2014 were not ranked in the top 150 in 2013 (72 of 150 not ranked vs LHP and 79 of 150 not ranked vs RHP). That's a lot of missing sample. It means that a lot of players are doing well this year who have not done as well in the past. Likewise, there must be an equal number of guys who did well last year and are struggling more this year. That is the nature of the game. Situations change with team, park, and/or lineup. Even talent changes, as players mature into their full potential or decline at the end of their careers.
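The turnover figure is easy to check for any pair of top-150 lists. Here is a small sketch, assuming each list is simply a sequence of player names; `not_ranked_share` is a hypothetical helper name, and the last two lines just convert the counts quoted above into percentages.

```python
def not_ranked_share(top_current, top_previous):
    """Return (count, share) of current top players absent from the previous list."""
    previous = set(top_previous)
    missing = [player for player in top_current if player not in previous]
    return len(missing), len(missing) / len(top_current)


# Counts quoted in the article: 72 of 150 (vs LHP) and 79 of 150 (vs RHP).
print(f"vs LHP: {72 / 150:.1%}")  # 48.0%
print(f"vs RHP: {79 / 150:.1%}")  # 52.7%
```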
The experts and professionals in the DFS industry largely use multiple years of data, and weight it with somewhat of a recency bias so that recent performance counts more than distant past performance. They acknowledge that the raw wOBA or OPS stats need to be manipulated to achieve their best predictability, and don't disclose the nature of the manipulations for obvious reasons.
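To make the idea of recency weighting concrete, here is a minimal sketch of a weighted multi-year average. It is not anyone's actual model: the 3/2/1 weights and the OPS values are hypothetical, chosen only to show the mechanics, and `recency_weighted` is an illustrative name.

```python
def recency_weighted(values_by_year, weights):
    """Weighted average of a stat, most recent season first.

    Seasons with no data (None) are skipped and the remaining
    weights are renormalized so they still sum to 1.
    """
    pairs = [(v, w) for v, w in zip(values_by_year, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no usable seasons to average")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical OPS for the three most recent seasons, newest first.
ops_history = [0.900, 0.800, 0.700]
# Hypothetical weights: the newest season counts three times the oldest.
weights = [3, 2, 1]
print(round(recency_weighted(ops_history, weights), 3))
```

With equal weights this reduces to a plain multi-year average; the steeper the weights, the more the estimate leans on the most recent season. A fuller version would probably also weight each season by plate appearances, which this sketch ignores.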
My friend Drew Dinkmeyer (@dinkpiece) made a great point to me yesterday. He reminded me that the metric you use to predict performance for DFS doesn't have to meet some predictive threshold to be useful. It only has to be better than the opponent's metric. As a scientist who is always striving to optimize my process, I have found it easy to lose sight of the fact that DFS is a game we play against other people. Mike Leone (@leonem4444) added that people forget how small the margins are to be a successful DFS player. Using the metric that provides even the slightest advantage matters in the long term. When players of their caliber give advice, I tend to listen.
Yet hitting performance is not very consistent across years, as my very simple analysis shows. So what to do? I guess I give slightly less weight to reams of past hitting performance data than others might when building lineups. As noted above, I don't know how the experts weight the different variables in their algorithms, so it might actually not be less. I rely a lot more on some of the other factors I mentioned above, like park, opposing pitcher, spot in the lineup, and Vegas lines. I let OPS or wOBA help me zero in on specific players that I already know have one or more other factors in their favor.
As always with me, the bottom line is to think about why you're doing what you're doing with your lineups. Make sure you can justify using a player for multiple reasons, if possible. And when you hear that one or another statistic is the way you HAVE to analyze players, remember that no one metric can tell the whole story.