Pitchers who can induce swinging strikes are more valuable than those who cannot. (via M&R Glasgow)

Although the minutiae of baseball are frequently analyzed in painstaking detail, at some level it remains a simple game. When teaching the game to kids (or in my case, some European and Asian friends who have never seen the game before), we likely begin with an explanation like this: “The pitcher throws the ball to the batter, and he wants the batter to swing and miss at the pitch. The batter, meanwhile, wants to swing and make contact with the pitch.” That’s really the essence of the game at its most basic level. The batter wants to hit; the pitcher wants him not to hit.

But even beginning with such a simple premise, it’s already possible to ask questions that require considerable rigor to answer accurately. For example, a question like: On any given offering from a pitcher to a batter, how likely is the pitcher in question to record a swinging strike? It’s that simple question, one that characterizes the most basic element of the game, that I’d like to answer in what follows.

To do so, I’ll use PITCHf/x data from the past few seasons, and employ two different models: logistic regression and random forests. Note that in all the PITCHf/x data we are analyzing, we will consider only pitches on which the batter offered at the pitch, and we are not considering the count in which the pitch is thrown. If you’re reading this article, you’re likely familiar with the individual PITCHf/x statistics, so I won’t explain them here.

Regarding the logistic regression and random forests models, don’t worry if you’re not particularly familiar with them. All you need to know for now is that both of these models take continuous inputs (such as pitch velocity) and categorical inputs (such as pitch type), and then output a probability estimate between 0.0 and 1.0 that a particular binary event occurred (in this case, whether the batter missed or not). For more detailed information, I invite you to read the Wikipedia articles that are referenced.

Why is this exercise useful? For one thing, strikeouts are very important for the pitcher, and 75 percent of strikeouts occur on swings and misses (as opposed to a called third strike). Strikeouts are one of the statistics that the pitcher controls (almost) entirely, and represent an important component of Fielding Independent Pitching (FIP) metrics. Conversely, avoiding swings and misses is very important for the batter. Batting Average on Balls in Play (or BABIP) demonstrates that once a batter puts a ball in play, whether he gets a hit is highly dependent on luck. Once the batter puts the ball in play, anything can happen.

We will use the convention of “success” and “failure” from the pitcher’s point of view. This means that a “success” is the batter missing, while a “failure” is the batter making contact. Note that we could have just as easily defined “success” and “failure” from the batter’s point of view, and we would have minimal changes in the problem and code.

Let’s start with the logistic regression model. We’ll randomly partition our data (about 1.2 million pitches) into an 80 percent training set and a 20 percent verification (test) set. The simplest possible operation is to create a single model out of all of the pitches in the training set, and then use that single model to predict the outcome of all the pitches in the test set. Due to the effects of this random partitioning, it’s wise to repeat the calculations several times using different random partitions, and average the results together (although with a large data set of 1.2 million pitches, the final result is unlikely to change very much). Since the model returns an output probability between 0 and 1, we will declare that the batter missed if the output probability is greater than or equal to 0.5, and that the batter made contact if the probability is less than 0.5. By using these steps, we get an accuracy of 81.0 percent.

Not bad, perhaps? Surely, 81.0 percent is better than if we just flipped a coin to determine the result. But we have to look at the context of the situation. How often does a batter swing and miss? If you watch baseball regularly, then you know that most of the time the batter makes contact. In our data set, the batter made contact 79.6 percent of the time. So if we did no analysis at all, and just always guessed that the batter made contact (which is a terrible prediction model), we would be right 79.6 percent of the time. The logistic regression model therefore improves on that naive guess by only 1.4 percentage points.
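The logistic regression procedure described in this article (a random 80/20 partition, a 0.5 probability cutoff, averaging over several partitions, and a comparison against always guessing contact) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code behind the numbers quoted in the article: the pitches are synthetic, the single input (velocity) and the roughly 20 percent miss rate are made up for the example, and the model is a small hand-written gradient-descent fit so that the sketch runs with the Python standard library alone. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_pitches(n, rng):
    """Hypothetical stand-in for PITCHf/x rows: (velocity_mph, missed).

    missed = 1 is a "success" (swing and miss), 0 is contact.
    The coefficients are assumptions chosen to give ~20% misses.
    """
    rows = []
    for _ in range(n):
        velo = rng.gauss(90.0, 5.0)
        p_miss = sigmoid(-1.4 + 0.08 * (velo - 90.0))
        rows.append((velo, 1 if rng.random() < p_miss else 0))
    return rows


def fit_logistic(train, epochs=100, lr=0.5):
    """Fit a one-feature logistic regression by batch gradient descent."""
    xs = [x for x, _ in train]
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in train:
            z = (x - mean) / std              # standardized feature
            err = sigmoid(w * z + b) - y      # gradient of the log loss
            grad_w += err * z
            grad_b += err
        w -= lr * grad_w / len(train)
        b -= lr * grad_b / len(train)
    return lambda x: sigmoid(w * (x - mean) / std + b)


def accuracy(predict, rows, threshold=0.5):
    """Declare a miss when the predicted probability is >= threshold."""
    correct = 0
    for x, y in rows:
        predicted = 1 if predict(x) >= threshold else 0
        if predicted == y:
            correct += 1
    return correct / len(rows)


def baseline_accuracy(rows):
    """Accuracy of always guessing that the batter made contact."""
    return sum(1 for _, y in rows if y == 0) / len(rows)


def run_once(rows, rng, train_frac=0.8):
    """One random 80/20 partition: fit on train, score on test."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    model = fit_logistic(train)
    return accuracy(model, test), baseline_accuracy(test)


rng = random.Random(0)
pitches = make_synthetic_pitches(2000, rng)

# Repeat over several random partitions and average the results.
results = [run_once(pitches, rng) for _ in range(3)]
mean_model = sum(m for m, _ in results) / len(results)
mean_base = sum(b for _, b in results) / len(results)
print(f"model accuracy:    {mean_model:.3f}")
print(f"baseline accuracy: {mean_base:.3f}")
```

On data like this, where misses are the minority outcome, the fitted probabilities rarely reach 0.5, so the model's accuracy ends up very close to the always-contact baseline. That is the same reason an accuracy figure has to be judged against the contact rate rather than against a coin flip.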