Among NBA fans and players alike, fan attendance is a frequent subject of back-and-forth. Diehard supporters often deride the fanbases of other franchises as fair-weather or fickle, while platitudes such as “we have the best fans in the world” or “our fans stick by us no matter what” are a common feature of post-game press conferences and preseason rallies. Such commentary has become part of NBA culture, yet the way in which preseason hype, variability in team success, and external factors such as location relate to regular-season attendance (not only on special, nationally televised occasions, but on a daily basis) is rarely, if ever, investigated. Many NBA aficionados may like to believe that their fanbase is superior to those of others, for whatever special reason, but broader variables, largely out of the hands of any everyday fan, plausibly have a substantial effect on attendance throughout the league. After speculating as to what these factors might be and how they shape attendance, I chose four variables (projected wins, regular-season wins, market size, and recent playoff success) and plotted them against fan attendance for every team, as a means of investigating how strongly each correlates with the rate at which NBA fans turn out.
The following table displays all of my potential explanatory variables, along with my response variable. In order, “Projected_Wins” denotes each team’s projected win total (which I gathered via ESPN, the self-proclaimed worldwide leader in sports), “Real_Wins” denotes each team’s regular-season win total, “Market” denotes the population of each team’s TV market (via Nielsen), and “Recent_Playoff_Ws” denotes each team’s playoff win total over the past five years. Lastly, “Average_Attendance” (via ESPN), the response variable, denotes each team’s average per-game attendance over the 82-game regular season. All of these data refer to the 2014-15 regular season:
The following multiple regression model uses all of the aforementioned variables to predict each team’s average attendance:
The equation for this model is: AverageAttendance = 12230 + 0.0001277(Market) + 81.85(ProjectedWins) + 28.1(RealWins) + 20.93(RecentPlayoffWs), and the standard deviation of the residuals is 1369.7. Under this model, holding the other variables fixed, each additional person in a market’s population is associated with an increase of 0.0001277 in predicted average attendance (about 128 fans per additional million people), each additional projected win with an increase of 81.85, each additional real win with an increase of 28.1, and each additional recent playoff win with an increase of 20.93. Given the standard deviation of the residuals, however, such predictions will typically be off by about 1,370 fans, which comes out to about 7% of an average attendance mark such as the Lakers’ 18,737. With these coefficients, it is possible to calculate a predicted average attendance for a particular team. Take the Lakers: their projected win total was 30, their actual win total was 21, their market population was 14,251,000, and their recent playoff win total was 25. Their predicted value is therefore: AverageAttendance = 12230 + 0.0001277(14251000) + 81.85(30) + 28.1(21) + 20.93(25) = 17,618.70. Their residual is 18,737 – 17,619 = 1,118, somewhat smaller than the typical error of about 1,370. In other words, the model predicts that the Lakers drew 1,118 fewer fans per game than they actually did.
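The Lakers calculation can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the coefficients and Lakers inputs are copied from the text, while the names `INTERCEPT`, `COEFFS`, and `predict_attendance` are hypothetical and chosen purely for illustration.

```python
# Coefficients of the four-variable model, as quoted in the text.
INTERCEPT = 12230
COEFFS = {
    "Market": 0.0001277,
    "Projected_Wins": 81.85,
    "Real_Wins": 28.1,
    "Recent_Playoff_Ws": 20.93,
}


def predict_attendance(team):
    """Return the model's predicted average attendance for one team.

    `team` is a dict holding a value for every key in COEFFS.
    (Hypothetical helper, not part of the original analysis.)
    """
    return INTERCEPT + sum(COEFFS[name] * team[name] for name in COEFFS)


# Lakers inputs and actual average attendance, as given in the text.
lakers = {
    "Market": 14_251_000,
    "Projected_Wins": 30,
    "Real_Wins": 21,
    "Recent_Playoff_Ws": 25,
}
lakers_actual = 18_737

predicted = predict_attendance(lakers)
residual = lakers_actual - predicted  # positive: the model under-predicts

print(f"predicted attendance: {predicted:.1f}")  # about 17618.7
print(f"residual:             {residual:.1f}")   # about 1118.3
```

A positive residual means the team drew more fans than the model expected, and a negative one means it drew fewer.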
However, the multiple regression model might be put to better use if some explanatory variables, because they overlap with more accurate predictors, were omitted. To find the best combination of explanatory variables, I refit the model with each variable left out in turn. First, for Market, Projected Wins, and Recent Playoff Ws, the equation was: AverageAttendance = 12470 + 0.0001118(Market) + 107.2(ProjectedWins) + 18.86(RecentPlayoffWs), and the standard deviation of the residuals was 1366.94. Then, for Market, Projected Wins, and Real Wins, the equation was AverageAttendance = 12260 + 0.0001199(Market) + 93.44(ProjectedWins) + 24.09(RealWins), and the standard deviation of the residuals was 1377.67. For Market, Real Wins, and Recent Playoff Ws, the equation was AverageAttendance = 13170 + 0.0001566(Market) + 78.71(RealWins) + 32.8(RecentPlayoffWs), and the standard deviation of the residuals was 1479.56. Lastly, for Projected Wins, Real Wins, and Recent Playoff Ws, the equation was AverageAttendance = 13150 + 99.72(ProjectedWins) + 8.537(RealWins) + 15.96(RecentPlayoffWs), and the standard deviation of the residuals was 1473.94. Of these, the trio of Market, Projected Wins, and Recent Playoff Ws produced the lowest standard deviation of the residuals (lower even than the 1369.7 of the full four-variable model, and approached only by the Market, Projected Wins, and Real Wins trio), suggesting that, of all the three-explanatory-variable models, it gives the most accurate predictions of average attendance per team. This also suggests that “Real Wins,” each team’s 2014-15 regular-season win total, was the least useful of the explanatory variables in predicting average attendance, since leaving it out cost the model nothing.
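The leave-one-out comparison described above can be sketched in Python with NumPy. This is a minimal sketch, not the procedure actually used for this analysis: it assumes the reported "standard deviation of the residuals" is the residual standard error with n - p - 1 degrees of freedom, as common regression software reports it. The function names `residual_std_error` and `rank_subsets` are hypothetical, and the demo runs on randomly generated placeholder numbers, not the real 30-team table.

```python
from itertools import combinations

import numpy as np


def residual_std_error(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns sqrt(SSE / (n - p - 1)), where p is the number of predictors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(np.sqrt(resid @ resid / (n - p - 1)))


def rank_subsets(columns, y, size):
    """Score every subset of `size` predictors; lowest error first.

    `columns` maps predictor name -> 1-D array of values.
    Returns a sorted list of (residual_std_error, names) tuples.
    """
    results = []
    for names in combinations(columns, size):
        X = np.column_stack([columns[name] for name in names])
        results.append((residual_std_error(X, y), names))
    return sorted(results)


# PLACEHOLDER DATA ONLY: random numbers standing in for the 30-team table.
rng = np.random.default_rng(0)
n_teams = 30
demo_columns = {
    "Market": rng.uniform(1e6, 2e7, n_teams),
    "Projected_Wins": rng.uniform(15, 65, n_teams),
    "Real_Wins": rng.uniform(15, 65, n_teams),
    "Recent_Playoff_Ws": rng.uniform(0, 60, n_teams),
}
demo_y = rng.normal(17500, 1500, n_teams)  # made-up attendance values

for rse, names in rank_subsets(demo_columns, demo_y, size=3):
    print(f"{rse:10.2f}  {', '.join(names)}")
```

Because this measure divides by n - p - 1, dropping a weak predictor can lower it slightly. That is consistent with the three-variable model's 1366.94 falling below the full model's 1369.7. Calling `rank_subsets` with `size=2` would give the corresponding two-variable comparison.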
I then went a step further in an effort to find the most accurate two-explanatory-variable model and to see whether any, despite including one fewer predictor, would prove as effective as the three-variable models. Only one came at all close to matching their standard deviations of the residuals: the model whose explanatory variables were Market and Projected Wins, with a standard deviation of the residuals of just 1369.4. This suggested to me that the next least useful explanatory variable was “Recent Playoff Wins,” and that a multiple regression model relying on just Market and Projected Wins could make predictions about as effectively as one that also uses Recent Playoff Wins.
As for the investigative process, my model certainly had a few limitations. For one, all of the data on which it was based came from a single season (2014-15), so the sample of teams (30) was not particularly large. Furthermore, my model’s predictions rested on just four explanatory variables, and there are undoubtedly other variables that may have more predictive power than the ones I used. With such variables, my model’s standard deviation of the residuals (roughly 1,370 to 1,480 across the combinations I tried) could certainly have been lower. Among the four variables my model dealt with, however, it became clear that preseason projections and market size had the most to do with the rates at which fans came out to games. Ultimately, my model was not perfect, but it did serve to show that fan attendance for NBA teams depends on various, often external, variables, most of which are simply out of both my and your hands.