When Major League Baseball Advanced Media (hereafter MLBAM) presented at this year’s Sloan Sports Analytics Conference (hereafter SSAC), they demonstrated a new tracking system for baseball action. The presentation included a brief video, which I have included here.
The video is 57 seconds long, and only the middle third of it is any different from the usual MLBAM highlight clip. The difference in those 20 seconds is all the difference in the world. The data represented on-screen by all those flashy graphics is new: earth-shatteringly, game-changingly new. It represents an information channel that has never been available before. Front offices, television broadcasters, scouts, sabermetricians, fans: none of these groups has ever had access to so complete a view of a baseball play in action.
Why is measuring what’s going on once the ball is hit so important? It’s because fielding is the last area of baseball analysis to be brought into greater focus. For many years, the role of the pitcher has been broken down into pitch types and velocities. The results of a particular pitcher were found, in the early 1980s, to be heavily dependent on the ability to rack up strikeouts, avoid walks, and limit home runs. Since 2007, the PITCHf/x system has brought the accuracy of pitch analysis down to a tenth of a mile per hour in velocity and a fraction of an inch in location. In a similar fashion, hitters have been looked at in the isolated format of the at-bat. Their success in making contact, drawing walks, and racking up singles, doubles and home runs has been obvious since the first box score was drawn up. The PITCHf/x and hit tracker systems have allowed us to calculate a batter’s proficiency at both taking the borderline pitch and taking advantage of the one grooved down the middle.
Once the ball leaves the bat, all that accuracy goes out the window. Among the currently available systems, nothing comparable to PITCHf/x exists. Every field in baseball is different, and every fielder starts in a different position on every pitch. Fielding metrics are like viewing the world through frosted glass. Systems like DRS, TotalZone, and Inside Edge all take estimates of what is supposed to happen, try to match them to what likely happened, and then decide whether or not what actually happened was better than expected. Things get foggy, especially since we don’t know, with any certainty, the field situation at the start of the play, or the duration of the play. We all know how it turned out, but we don’t really know exactly what made things come out that way. Frosted glass.
The system revealed at SSAC shatters the glass and lets us look at the whole field with the same kind of granular precision with which we currently look at the strike zone. People have noticed. Rob Neyer was at the SSAC presentation, and he wrote in his notebook:
“NOTHING WILL BE THE SAME”
And I tend to agree with him.
There are a whole lot of angles one can take with the data. At the most straightforward level, we will be able to work backwards to determine whether a play was ‘makeable’ in any situation by applying real numbers to the following factors.
- How long did it take for the runner to reach the base from time of contact?
- What was the reaction time of the fielder to the ball?
- How long was it until the fielder’s range and the ball’s location overlap?
- How efficiently did the fielder reach that point?
- How long did the fielder take to release the ball from his point of contact?
- How much time did it take the ball to arrive at the end of the throw?
From these numbers, we can see whether there was ever an opportunity for a fielder to cover the distance in the time required to complete the play. We can also see the potential for an ‘apples to apples’ comparison of each fielder. By swapping Brett Lawrie’s reaction time, Adrian Beltre’s route efficiency, and Manny Machado’s throw velocity in and out, we could pinpoint where a play breaks down, or how it comes together.
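To make the ‘makeable’ idea concrete, here is a rough sketch of how those timing factors could be added up and compared against the runner. Everything in it is an assumption for illustration: the `Fielder` fields, the units, and every number are made up, and the ball is simply assumed to be waiting at the spot the fielder runs to. It is not how MLBAM’s system works, just the arithmetic the list above implies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fielder:
    """Hypothetical per-fielder measurements (field names and units are assumptions)."""
    reaction_s: float        # seconds from contact to the fielder's first move
    speed_ft_s: float        # running speed in feet per second
    route_efficiency: float  # straight-line distance / distance actually run, in (0, 1]
    release_s: float         # seconds from fielding the ball to releasing the throw
    throw_ft_s: float        # average throw speed in feet per second


def fielding_time(f: Fielder, to_ball_ft: float, throw_ft: float) -> float:
    """Seconds from contact until the throw arrives at the base."""
    # An inefficient route means the fielder runs farther than the straight line.
    run_s = (to_ball_ft / f.route_efficiency) / f.speed_ft_s
    throw_s = throw_ft / f.throw_ft_s
    return f.reaction_s + run_s + f.release_s + throw_s


def is_makeable(f: Fielder, to_ball_ft: float, throw_ft: float, runner_s: float) -> bool:
    """True if the throw would beat a runner who needs runner_s seconds."""
    return fielding_time(f, to_ball_ft, throw_ft) < runner_s


if __name__ == "__main__":
    # Invented numbers, not measurements of any real player.
    fielder_a = Fielder(0.4, 25.0, 0.9, 0.8, 120.0)
    fielder_b = Fielder(0.6, 25.0, 0.75, 0.8, 110.0)
    # Swap fielder A's reaction time and route efficiency into fielder B.
    hybrid = replace(fielder_b,
                     reaction_s=fielder_a.reaction_s,
                     route_efficiency=fielder_a.route_efficiency)
    for name, f in [("A", fielder_a), ("B", fielder_b), ("hybrid", hybrid)]:
        total = fielding_time(f, to_ball_ft=45.0, throw_ft=120.0)
        print(name, round(total, 2), is_makeable(f, 45.0, 120.0, runner_s=4.3))
```

With these invented inputs, fielder B does not beat a 4.3 second runner, but the hybrid does, which is the kind of ‘where does the play break down’ question the swap is meant to answer.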
Ben Lindbergh noticed the SSAC presentation, too. He talks at the end of his full breakdown about other potential applications. Most of these relate to scouting and fan analytics, but I can see the potential for players to find more benefits than they currently imagine. If a player is the most efficient baserunner in the league, would his teammates ask him how he lines up for his turn around second base? If a defender is slow, but has the shortest average distance to field the ball, does his team look at how he anticipates the play better than the other players around him?
The potential here is so broad that it is impossible to guess what direction this new look at the game will take us. It’s so new, and so raw, it doesn’t even have a name yet. Along with all this untapped potential, there are, of course, obstacles. The first obstacle is understanding exactly how much is a smoke-and-mirrors mock-up and how much is ready to put into place. Good news! Ben Lindbergh gives us the money quote on this one.
“The numbers in the Heyward video, though, are the real thing—and since that play took place in July, we know that this presentation was in the works for a while. “Those are actual calculated data points from the plays…not mock-up values,” says Cory Schwartz, VP of Stats at MLBAM. “That thing’s operational,” added Lando Calrissian.”
The second obstacle is another crazy one to contemplate. It relates to how much data the system has to gather in order to capture the game environment. There is a system of cameras and radar which co-ordinate the ball movement and fielder actions. They record constantly from first pitch onward, including warm ups and pitching changes. When the creators of the system tally up the data points, the number they’ve given out so far is seven.
Seven terabytes. Every single game.
All of which needs to be stored, processed, and distilled into something that looks like a baseball field. This is not a project that you can take home and scribble in the corners of, like a box score. This is mainframe computer stuff. Except, I don’t think they have mainframe computers anymore. Even with today’s online access to databases and servers, terabytes are not the kind of thing that can be examined quickly or lightly.
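For a sense of scale, here is a back-of-the-envelope sketch of what seven terabytes per game adds up to. It assumes the quoted figure holds for every regular-season game (30 teams playing 162 games each), uses decimal units (1 PB = 1000 TB), and ignores spring training and the postseason. The function name is my own.

```python
def season_storage_tb(tb_per_game: int = 7,
                      teams: int = 30,
                      games_per_team: int = 162) -> int:
    """Total terabytes for one regular season, assuming a constant per-game figure."""
    # Each game involves two teams, so halve the team-game count.
    games = teams * games_per_team // 2
    return tb_per_game * games


if __name__ == "__main__":
    total_tb = season_storage_tb()
    print(f"{total_tb} TB, or about {total_tb / 1000:.2f} PB per season")
```

Under those assumptions that is 2,430 games and roughly 17 petabytes a year, before anyone has processed a single play.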
I will grab another block of text from the Lindbergh article regarding what kind of data might be available.
“As BP Director of Technology Harry Pavlidis described it to me, there are four levels of data processing involved in a project of this scope:
1) Raw Data
2) Calculation-Friendly Data
In this case, level one corresponds to raw camera/radar information, or pixel data. Level two, which Trackman would presumably provide to MLBAM, is a usable stream of information: timestamps, events, player IDs. Level three translates the information from step two into something more easily understood: “Player X took Y seconds to get to the ball.” Finally, level four adds insight to the level-three info: “Player X took Y seconds to get to the ball, and his route was inefficient.” The numbers in the Heyward video are a mix of level three and level four.
Ideally, we’d get a data dump of level-two info, as we have for PITCHf/x. With that information, any avenue of analysis would be open. But it’s possible that we’ll have to settle for seeing some pre-processed level-three and -four output in GameDay or on TV broadcasts instead of doing any number-crunching as a community. While more graphics like those in the Heyward video would undoubtedly make broadcasts better, we might see some stagnation in online analysis if that’s where the access ends, especially if PITCHf/x is phased out in favor of the new system after 2014, in which case we might have less access to data than we do now, and an even greater knowledge gap between front offices and fans than already exists. This could go either way.”
So there is hope for all the database wizards out there, that somewhere between unmanageable and over-processed, lies a happy medium for the information that gets released. I will keep my fingers crossed for them.
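To show what the jump from level two to levels three and four might look like, here is a small sketch. The record layout (`Sample`), the 0.9 efficiency threshold, and the sample data are all my own assumptions; nothing here reflects the actual format MLBAM or Trackman uses.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Sample:
    """One hypothetical level-two record: a timestamped fielder position in feet."""
    t: float
    player_id: str
    x: float
    y: float


def summarize(samples: list[Sample]) -> dict:
    """Level three: how long the fielder took and how far he ran."""
    first, last = samples[0], samples[-1]
    path_ft = sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(samples, samples[1:]))
    direct_ft = hypot(last.x - first.x, last.y - first.y)
    return {"player_id": first.player_id, "seconds": last.t - first.t,
            "path_ft": path_ft, "direct_ft": direct_ft}


def add_insight(summary: dict, threshold: float = 0.9) -> str:
    """Level four: attach a judgment. The threshold is an arbitrary assumption."""
    path = summary["path_ft"]
    efficiency = summary["direct_ft"] / path if path else 1.0
    verdict = "efficient" if efficiency >= threshold else "inefficient"
    return (f"{summary['player_id']} took {summary['seconds']:.1f} seconds "
            f"to get to the ball, and his route was {verdict}.")


if __name__ == "__main__":
    # Invented track: the fielder drifts the wrong way before correcting.
    track = [Sample(0.0, "Player X", 0.0, 0.0),
             Sample(2.0, "Player X", 30.0, 40.0),
             Sample(4.0, "Player X", 60.0, 0.0)]
    print(add_insight(summarize(track)))
```

The point of the sketch is that the level-four sentence is only as good as the choices baked into it. With level-two data, the community gets to argue about the threshold; with level-four output, somebody else has already picked it.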
What if you aren’t a mathemagician, though? I still think there’s all kinds of cool and interesting stuff that could come out of the new data. Wouldn’t we all like to see when a fielder is making a dive for the sake of the cameras? This system could overlay the typical path to the ball on one that we suspect of being a ‘TV dive’ made for the sake of a cool-looking play. Wouldn’t we all like to know how a play would turn out if a fielder didn’t make that one first step in the wrong direction?
At the end of the year, we will have a dozen different categories to compare players across. Imagine a leaderboard of ‘Best First Move’, ‘Most Efficient Route’, and ‘Quickest Release’ for fielders. Or ‘Quickest Jump’, ‘Biggest Leadoff’ and ‘Fastest from Second to Home’ for baserunners.
To the teams who will undoubtedly invest heavily in making this system work, the focus will not be on such amusing comparisons. This will be about spending money more efficiently. A player’s reputation for having a great arm, or good footwork, or great base-stealing skills will not be relevant. If the data shows these things to be true, players will get paid. If it does not, sorry, you can’t argue with facts. It will also potentially reward players for processes, not results. Gabe Kapler talks about the veil being lifted on these concepts. Organizations will be able to look for skill sets, not sets of results, and be more patient with players who demonstrate good skills, but maybe not the best luck in their first stint in the league.
That’s great for the statheads, and for the people who spend the money to put product on the field. I’m not really in either of those groups. To me, most importantly, as a fan, it will give credit for plays where credit is due. If a player has good positioning, it will show in his data. If a player gives up on plays because he knows that he can’t get to the ball in time, we will be able to acknowledge his baseball smarts. If a player improves his route efficiency year over year, we can appreciate a real improvement. If a player has lost a step, we will know, for real, and he will too.
This system, whatever it will be called, has been a dream for those who want to make the game completely transparent. It has been only a dream for several years now. We are on the doorstep of that dream becoming a reality. It will be glorious, and it will have growing pains, but it is only going to become more integral to the game.