About ballmart

Introduction

The main objective of the ballmart project is to predict the outcomes of college basketball games, using a methodology similar to that of other popular college basketball analysis sites such as KenPom. It does this by building “context-independent” ratings for each team, which are then used to generate the game predictions.

This project has gone through a number of iterations over the years; more about that on my personal website.

More about specific analyses, parts of the model, modeling processes, and behind-the-scenes things specific to ballmart can be found on the blog page.

Ratings

The core of the ballmart process is to use data from past games to generate ratings. The objective of the ratings is to make a context-independent set of parameters for each team for use in predictions.

Per-possession

One observation that can be made from watching basketball games is that some teams “push the pace” more than others. This can take several forms, but most notably it consists of shooting the ball earlier in each possession. Other teams, in contrast, will hold onto the ball longer and shoot closer to the expiration of the 30-second shot clock.

This has a significant impact on the number of “raw” points scored by a team in a game. For the sake of example, let’s say that both teams in a game make 50% of their shots, all of the shots are 2-point shots, and they are taking all 30 seconds of the shot clock before taking a shot. This would result in each team having 40 possessions per game (20 per half) and scoring 40 points each throughout the game (40 possessions × 50% chance of making each shot × 2 points per shot). In another example, let’s say that each team shoots the ball 10 seconds into each possession (still with a 50% chance of making each 2-point shot). This would result in each team scoring 120 points per game, since each team would have three times as many possessions as in the first example.

Thus, focusing on “raw” points in this manner would be deceptive when calculating the expected number of points scored by a team in a game. What if a team from the first example above played a team from the second example? Aren’t the two teams actually equally efficient? Both make 50% of their 2-point shots, which works out to the same 1.0 points per possession. This is the justification for building the ratings on a per-possession efficiency metric rather than a “raw” points metric.
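To make the arithmetic above concrete, here is a minimal sketch of the two pace scenarios. It assumes a 40-minute game in which both teams use the same amount of clock on every possession and possessions simply alternate; the function names are illustrative and are not taken from the ballmart code.

```python
GAME_SECONDS = 40 * 60  # two 20-minute halves


def possessions_per_team(seconds_per_possession: float) -> float:
    """Possessions each team gets if both teams use the same amount of clock."""
    total_possessions = GAME_SECONDS / seconds_per_possession
    return total_possessions / 2  # possessions alternate between the two teams


def expected_points(possessions: float, make_rate: float, points_per_shot: int) -> float:
    """Expected raw points: possessions x chance of making the shot x shot value."""
    return possessions * make_rate * points_per_shot


for seconds in (30, 10):
    poss = possessions_per_team(seconds)
    pts = expected_points(poss, make_rate=0.5, points_per_shot=2)
    print(
        f"{seconds}s per possession: {poss:.0f} possessions, "
        f"{pts:.0f} points, {pts / poss:.2f} points per possession"
    )
```

The raw point totals differ by a factor of three (40 versus 120), while points per possession come out to 1.00 in both cases, which is the quantity a per-possession rating is meant to capture.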

Garbage Time

Another observation from watching games is that not all possessions are created equal; at some points in a game (especially when it is getting out of hand, with one team holding a significant lead), teams often put in their “reserve” players or begin employing a suboptimal and/or “reckless” strategy.

Since we want to evaluate the teams’ performance in a predictive manner, and the use of reserve players is situation-dependent (not all games get “out of hand” to the point that reserve players are used; in the rest, the “main” roster plays throughout), we want to remove possessions in which teams may not be playing to their full potential.

These “not full potential” portions of the game are often referred to within sports as “garbage time”. I make a conscious effort to remove possessions where a team has a statistically “safe lead”. This (ideally) prevents possessions where teams may be employing a strategy other than their standard one(s) from influencing the calculations.
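As a rough illustration of what such a filter could look like, the sketch below drops possessions that begin with a “safe lead”. The exact rule ballmart uses is not stated here, so this swaps in Bill James’s well-known safe-lead heuristic as a stand-in: take the lead, subtract 3, add 0.5 if the leading team has the ball (subtract 0.5 otherwise), floor at zero, and square it; the lead counts as safe if the result exceeds the seconds remaining. The `Possession` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Possession:
    # Hypothetical record layout for illustration, not ballmart's actual schema.
    seconds_remaining: int  # game clock left when the possession starts
    lead: int               # leading team's margin in points (0 if tied)
    leader_has_ball: bool   # True if the leading team has the ball


def is_safe_lead(p: Possession) -> bool:
    """Bill James's safe-lead heuristic (an assumed stand-in rule)."""
    adjusted = p.lead - 3 + (0.5 if p.leader_has_ball else -0.5)
    adjusted = max(adjusted, 0.0)  # values below zero become zero
    return adjusted ** 2 > p.seconds_remaining


def drop_garbage_time(possessions: list[Possession]) -> list[Possession]:
    """Keep only possessions that start before the lead is considered safe."""
    return [p for p in possessions if not is_safe_lead(p)]


game = [
    Possession(seconds_remaining=600, lead=5, leader_has_ball=False),
    Possession(seconds_remaining=600, lead=20, leader_has_ball=True),
    Possession(seconds_remaining=240, lead=20, leader_has_ball=True),
]
kept = drop_garbage_time(game)
print(len(kept))  # 2: only the 20-point lead with 240 seconds left is dropped
```

A 20-point lead with the ball gives (20 - 3 + 0.5)² = 306.25, which is safe with 240 seconds left but not with 600, so the same margin is treated differently depending on how much time remains.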