NBA All-Star Predictive Modeling

 Getting the Numbers to Agree with the Voters (and Vice-Versa)

An NBA All-Star selection is one of the most exciting, interesting, and contentious accolades awarded every season. To be recognized as an All-Star shows that you have truly made it in the league, and that your worth as one of the top basketball players in the world has been recognized. Or does it?

All-Star selection has historically been a fairly arbitrary and subjective process. Media members, former players, and fans all have a say in selecting the top 12 players from each of the Western and Eastern Conferences, which on the surface may seem like a rather holistic and unbiased selection method. But how do these people make the decisions that they do? Is voting a popularity contest driven by social media content (think Alex Caruso, who had almost as many All-Star votes as Chris Paul this season)? Or is it biased towards more successful teams while leaving out players on less successful ones (think Draymond Green, who, although a great player in his own right, arguably may not have deserved all of his All-Star nods from 2016-2018)?

Hopefully, the media voters, widely regarded as the most knowledgeable observers of the game of basketball, will balance out any popularity picks that are made year over year. Regardless, how are they making their decisions? Are their picks based solely on the eye test and surface-level numbers? Do different media members truly make selections in a uniform, non-arbitrary manner year over year? Every year, there seem to be egregious All-Star snubs that many agree are unjustified, and so one may wonder how these media experts consistently “mess up”.

This made me wonder whether a more data-driven approach could be applied to All-Star selections by checking whether certain numerical patterns show up among the players selected each year. Are there certain statistics that correlate with players being selected as All-Stars, and how accurately can models built on them predict a season’s All-Stars? If you have read this far into my blog post, I assume you are as excited as I was to find out.

 

Building the Model

In order to build such a predictive model, I needed extensive historical player data (for both All-Stars and non-All-Stars). By scraping basic and advanced per-game metrics on all NBA players from 2000-2016 (via basketball-reference.com, using full seasons’ worth of player data), I obtained a 10,000+ row dataset on which I could train machine learning models. After extensive data cleaning (here is my methodology and source code for anyone interested), I built two logistic regression models and a large-tree model to rank the probability of a given player being an All-Star based on his stat line (I found my predictions to be more accurate when weighting all three models in combination). When validated against 2017-2021 All-Star selections, this predictive technique proved to be 92% accurate. Further, the logistic regression models each boast over 95% sensitivity in correctly selecting All-Stars on the historical training/validation data. Here are some of the statistics that were significant in my models and most relevant to All-Star selection:

·       Player Efficiency Rating

·       Win Shares per 48 minutes

·       Points per Game

·       Age (this was rather interesting to me)

·       Rebounds per Game

·       Offensive Box Plus/Minus

·       Effective Field Goal Percentage

·       True Shooting Percentage

·       Fouls per Game

·       Blocks per Game

·       Three Point Attempt Rate

·       Free Throw Rate

·       Turnovers per Game

·       Two Pointers Made per Game
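For readers who want to see what “weighting all three models in combination” and “sensitivity” can look like in practice, here is a minimal Python sketch. It is not the code behind this post: the player names, the per-model probabilities, the weights, and the helper names `blend_probabilities` and `sensitivity` are all hypothetical and only illustrate the idea of averaging model outputs and scoring the result.

```python
def blend_probabilities(model_probs, weights):
    """Weighted average of one player's All-Star probabilities across models."""
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(model_probs, weights)) / total_weight


def sensitivity(actual, predicted):
    """Share of actual All-Stars (label 1) that were also predicted as All-Stars."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    positives = sum(actual)
    return true_positives / positives if positives else 0.0


# Hypothetical outputs per player: (logistic model 1, logistic model 2, tree model).
players = {
    "Player A": (0.90, 0.80, 0.70),
    "Player B": (0.20, 0.40, 0.30),
    "Player C": (0.60, 0.70, 0.20),
}
weights = (1.0, 1.0, 2.0)  # illustrative weights, not the ones used in this post

# Blend the three probabilities, then rank players from most to least likely.
blended = {name: blend_probabilities(p, weights) for name, p in players.items()}
ranking = sorted(blended, key=blended.get, reverse=True)

# Hypothetical labels: 1 = All-Star, 0 = not an All-Star.
actual = [1, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0]

print(ranking)
print(round(sensitivity(actual, predicted), 3))
```

Sensitivity only looks at the players who really were All-Stars, which is why it can sit above overall accuracy: it ignores how many non-All-Stars the model wrongly promotes.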

My model seemed to heavily favor offensive threats who are efficient, well-rounded, and take a lot of threes per game while making a lot of twos. Realistically, this could encompass almost any type of player, though. Here are my model’s predictions from 2017-2020 vs actual selections (note – this model does not take the position limits for starters/reserves into account):

___________________________________________________________________________________ 

Takeaways:

·       Predicted 92% of 2017 All-Stars within the top 15 of the model’s rankings per conference

o   West: Damian Lillard had the 14th-highest selection probability of all players, whereas Draymond Green was ranked 49th

o   East: Bradley Beal was ranked 29th whereas Paul Millsap was ranked 42nd
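The “within top 15” percentages in these takeaways can be read as a top-k hit rate: the share of actual All-Stars who show up among the k players the model rates most likely. Below is a small Python sketch of that calculation. The function name `top_k_hit_rate` and the toy data are assumptions for illustration, and the post’s figures may have been computed differently.

```python
def top_k_hit_rate(probabilities, all_stars, k=15):
    """Fraction of actual All-Stars found among the k highest-probability players."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    top_k = set(ranked[:k])
    hits = sum(1 for player in all_stars if player in top_k)
    return hits / len(all_stars)


# Hypothetical model probabilities for one conference (toy data, k=3 for brevity).
probabilities = {"A": 0.90, "B": 0.80, "C": 0.40, "D": 0.30, "E": 0.10}
all_stars = {"A", "B", "D"}  # hypothetical actual selections

rate = top_k_hit_rate(probabilities, all_stars, k=3)
print(f"{rate:.0%} of All-Stars were in the model's top 3")
```

As a point of arithmetic, with 12 selections per conference (24 in total), 22 hits would work out to 22/24, or about 92%.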


___________________________________________________________________________________ 


Takeaways:

·       Predicted 96% of 2018 All-Stars within the top 15 of the model’s rankings per conference

o   West: Chris Paul had the 20th-highest selection probability of all players, whereas Draymond Green was ranked 48th and Klay Thompson was ranked 60th (!!)

o   East: The Eastern Conference was predicted with 100% accuracy, and the starters were the 5 most likely players to be selected


___________________________________________________________________________________

 

Takeaways:

·       Predicted 92% of 2019 All-Stars within the top 15 of the model’s rankings per conference

o   West: Rudy Gobert had the 15th-highest selection probability of all players, whereas Klay Thompson was ranked 46th

o   East: Andre Drummond was ranked 29th, and Eric Bledsoe was ranked 38th (Khris Middleton was ranked 39th and Victor Oladipo 44th)



___________________________________________________________________________________ 

 

Takeaways:

·       Predicted 88% of 2020 All-Stars within the top 15 of the model’s rankings per conference

o   West: Karl-Anthony Towns had the 10th-highest selection probability of all players, whereas Chris Paul was ranked 33rd and Rudy Gobert was ranked 36th

o   East: Bradley Beal was ranked 11th and Kemba Walker was ranked 35th

-  Interestingly enough, Kemba Walker was the only starter that my model did not predict to be selected (more on this later)


___________________________________________________________________________________ 

 

And without further ado, here are my 2021 All-Star predictions vs the actual results:

Takeaways:

·       Predicted 92% of 2021 All-Stars within the top 15 of the model’s rankings per conference

o   West: Brandon Ingram was ranked 23rd and De’Aaron Fox was ranked 25th, whereas Chris Paul was ranked 34th and Devin Booker was ranked 40th

o   East: Trae Young had the 8th-highest selection probability of all players, whereas Ben Simmons was ranked 32nd


___________________________________________________________________________________ 

 

Players Above/Below Model Expectation – Model is not Perfect (yet!)

Some players the model liked less than expectation:

Given only a 20.2% chance of making the All-Star Game in 2019 by the model, Klay Thompson did not land within the top 10 among eligible guards in PER (32nd), WS/48 (51st), PPG (11th), OBPM (36th), or eFG% (20th), though he ranked 10th in BPG. Although he is a premier shooter and wing defender, Klay’s 2019 output may not truly have been All-Star worthy.

Although he ranked 14th among guards in WS/48 and 8th in OBPM, Kemba Walker’s 2020 stats were otherwise short of All-Star starter level according to the model, as he ranked 20th in PPG and only 38th in eFG%. Thus, the model likely valued him as a non-All-Star because he lacked high-level scoring for a guard in terms of both average points and efficiency.
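Positional ranks like the ones quoted above can be produced by filtering to one position and sorting on a single stat. The Python sketch below shows one way to do that. The helper `rank_within_group`, the field names, and the sample numbers are hypothetical, and ties are simply broken by input order.

```python
def rank_within_group(players, stat, group="G"):
    """Return {name: rank} for one stat among players at a position (1 = best)."""
    eligible = [p for p in players if p["pos"] == group]
    # Higher is better for stats such as PPG, PER, or WS/48, so sort descending.
    ordered = sorted(eligible, key=lambda p: p[stat], reverse=True)
    return {p["name"]: i for i, p in enumerate(ordered, start=1)}


# Hypothetical stat lines; "G" = guard, "F" = forward.
players = [
    {"name": "Guard A", "pos": "G", "ppg": 27.1},
    {"name": "Guard B", "pos": "G", "ppg": 21.5},
    {"name": "Forward C", "pos": "F", "ppg": 29.0},
    {"name": "Guard D", "pos": "G", "ppg": 24.3},
]

ppg_ranks = rank_within_group(players, "ppg")
print(ppg_ranks)
```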

These two players are among the more surprising false negatives from my model. Each was championed by the media as a premier player in his respective season, so what gives?

Possible reasons:

                Both players played on highly successful teams in these seasons, so the media may have favored them over other, more deserving players as great players on star-studded teams. Were these players slightly overrated by the media because of the teams they played for? Is there recency bias among voters, since both players were All-Stars the previous season?

                These players could be statistical anomalies relative to the model’s predictions. The stats above suggest both players were potentially undeserving of All-Star nods, yet not every player fits one statistical mold. The model highly values offensive efficiency and output, but these players’ value may not be best captured by those numbers alone. As the model weights offense more heavily than defense, Klay Thompson’s elite play on both sides of the ball as a SG may get less recognition from it. The model also cannot necessarily account for a player’s value to his teammates (such as Kemba Walker to the Celtics, as many see him as the glue guy of an explosive offense led by Jayson Tatum and Jaylen Brown).

Players the model liked above expectation:

          Bradley Beal (given an 83.1% chance to make the All-Star Game in 2020) was the 2nd-highest backcourt scorer in the league in 2020. Coupled with his extremely high efficiency (7th in PER and 27th in TS%), this made Beal an elite offensive threat.

          Similarly, Trae Young (given a 96.7% chance to make the All-Star Game in 2021) has been a top-10 guard in terms of PER, PPG, and OBPM, while also posting a top-30 TS% at the PG position. Statistically, his efficiency has kept pace with his scoring thus far in the 2021 season. The model leans towards hyper-effective offensive weapons based on historical data (Young is the only player not to be selected to the ASG while averaging over 26 PPG and over 9 APG). The model even went as far as to rank him the 3rd most likely player to make it in the Eastern Conference, which makes this the biggest snub of this year’s All-Star Game according to the model.

Possible reasons:

              In contrast to Kemba Walker and Klay Thompson’s aforementioned selections, Trae Young and Bradley Beal played on teams that were not championship contenders at the time of ASG selection, which may further point to a team-success bias in selecting All-Stars. Have fans and the media more recently valued players on non-contending teams less than those on championship contenders, and is this a newer trend that was not picked up in the training data from 2000-2016? Do voters value defense more than expected at times (as both players are below-average wing defenders)?

The model surfaces some interesting findings about All-Star selection and voting patterns. Regardless, what can we conclude about players’ value and rank as stars based on this model?

It is obvious that voters highly value high-volume scorers who are also efficient. The Golden State Warriors have proven in the last few years that overpowering offensive threats are valuable assets that can lead teams to championships. However, the championship Warriors also provide an apt microcosm of player value beyond statistics like scoring and efficiency. Players like Draymond Green and Klay Thompson, whom the model snubbed for All-Star selections, undeniably played major roles in these teams winning at the highest level. How should we weight offensive output versus defensive efficiency, and can we really quantify valued traits like leadership, effort, and the other things players bring to team chemistry? Maybe a player ranking trained on All-NBA/MVP data, using the most important metrics per position, would produce a more valuable scoring system (foreshadowing, anyone)?

Whether the model illustrates player value or popularity, it predicted All-Star selections at over 92% accuracy on both training data and new data. To see how (or if) the model predicts future success, I will leave everyone with a ranking of players 23 years old or younger who ranked within the top 75 in All-Star selection probability for 2021:

 

Can these young players (“young bulls” like Collin Sexton) make the jump to All-Star caliber in the coming years? If it were up to me, I would let the numbers decide. But hey, by the time these guys take over the league, maybe they will.
