There are a few ways to assess how good a team is. In this post I use Partial Least Squares Path Modelling (PLS-PM) to construct a team success index for NBA teams during the 2015-16 Playoffs.
PLS-PM can be described as a tool to analyse the relationships between blocks of variables, taking into account some previous knowledge about the phenomenon observed. In most competitive team sports, basketball included, success depends on a large number of variables. However, most of these variables can be grouped under two major blocks: offense and defense.
The underlying theory is that a better offense leads to more wins and a better defense avoids losses, so improving either one improves success.
In this example I use data scraped from nba.stats.com using the same method described in my previous post about plotting shot data. The data contains offensive and defensive stats for the 16 teams that played in the 2015-16 playoffs, from the first round to Game 6 of the Finals.
To run the PLS path model and produce the success indices I will use the R package plspm.
library(plspm)
library(colortools)

# load team data pre game 7
load('Team_data_pre_game_7_finals.RData')

# rows of the inner model matrix
Offense = c(0, 0, 0)
Defense = c(0, 0, 0)
Success = c(1, 1, 0)

# path matrix created by row binding
nba_path <- rbind(Offense, Defense, Success)

# add column names (optional)
colnames(nba_path) <- rownames(nba_path)

# plot the path matrix
innerplot(nba_path)
After running the code above we have a visual representation of the Partial Least Squares Path Model.
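The path matrix is worth a quick sanity check before fitting, since plspm expects it to be lower triangular, with a 1 in row i and column j meaning "block j affects block i". Here is a minimal base R sketch of that check. The helper names `is_lower_tri`, `idx` and `paths` are my own and not part of plspm.

```r
# Inner model as defined in the post: rows are targets, columns are sources
Offense <- c(0, 0, 0)
Defense <- c(0, 0, 0)
Success <- c(1, 1, 0)
nba_path <- rbind(Offense, Defense, Success)
colnames(nba_path) <- rownames(nba_path)

# Lower triangular means nothing on or above the diagonal
is_lower_tri <- all(nba_path[upper.tri(nba_path, diag = TRUE)] == 0)

# Spell out each path as "source -> target"
idx <- which(nba_path == 1, arr.ind = TRUE)
paths <- paste(colnames(nba_path)[idx[, "col"]], "->",
               rownames(nba_path)[idx[, "row"]])

print(is_lower_tri)
print(paths)
```

Reading the paths out as text makes it easy to confirm that offense and defense both point at success and that nothing points back.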
In this scenario, offensive efficiency is reflected by field goal percentage, 3-point field goal percentage, free throw percentage, offensive rebounds, assists and number of personal fouls drawn. Defensive efficiency, on the other hand, is reflected by defensive rebounds, steals and blocks. Success is defined by the number of wins, win percentage, points and plus/minus differential.
Next we will define which indicators belong to the exogenous latent variables (offense and defense) and to the endogenous latent variable (success), and then run the model:
# define list of indicators: what variables are associated with offense, defense and success
nba_blocks <- list(c(8, 11, 14, 15, 18, 24), c(16, 20, 21), c(2, 4, 25, 26))

# all latent variables are measured in a reflective way
nba_modes <- c('A', 'A', 'A')

# run plspm analysis
nba_pls <- plspm(teamData, nba_path, nba_blocks, modes = nba_modes)
summary(nba_pls)

## Model Results
# path coefficients
nba_pls$path_coefs

# inner model
nba_pls$inner_model

# plotting results (inner model)
plot(nba_pls)
After running the code above, we have path coefficients for offense and defense. As the output of summary(nba_pls) shows, the total effect of all offensive variables (summarised under the offense block) on the success of an NBA team during the playoffs is 0.6512 and the total effect of defensive efforts on success is 0.35. At this point I asked myself: “Is defense the best offense?” The data suggests that, no, it is not.
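Because this model has no path between offense and defense, each total effect is simply the direct path coefficient. The base R sketch below works through that arithmetic with the two coefficients quoted above. The matrix layout and the names `B`, `indirect` and `total` are my own choices for illustration. In a fitted model the same figures should be available from the plspm object, which, as far as I know, stores them under nba_pls$effects.

```r
lv <- c("Offense", "Defense", "Success")

# Direct effects as reported in the post (rows = targets, columns = sources)
B <- matrix(0, 3, 3, dimnames = list(lv, lv))
B["Success", "Offense"] <- 0.6512
B["Success", "Defense"] <- 0.35

# Indirect effects are the sums of products along longer paths: B^2 + B^3 + ...
indirect <- matrix(0, 3, 3, dimnames = list(lv, lv))
step <- B
for (k in seq_len(nrow(B) - 1)) {
  step <- step %*% B
  indirect <- indirect + step
}

# Total effect = direct + indirect
total <- B + indirect
print(total["Success", ])
```

Here every entry of `indirect` is zero, so the totals match the path coefficients. If a path from defense to offense were added to the inner model, the same loop would pick up the extra indirect effect on success.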
Next, let's inspect how important each of the 9 indicators is for offense and defense by checking the model loadings. A loading is a numeric value that represents the strength and direction of the relationship between an indicator and its latent variable.
# plotting loadings of the outer model
plot(nba_pls, what = 'loadings', arr.width = 0.1)
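To make the idea of a loading concrete: it is the correlation between an indicator and the score of its latent variable. The sketch below uses simulated toy data, not the playoff data, and uses the first principal component as a simple stand-in for the PLS score, so the numbers only illustrate the sign and size of a loading. All variable names are hypothetical.

```r
set.seed(42)  # toy data only, not the playoff data

n <- 16
latent <- rnorm(n)                     # hypothetical underlying block quality
ind_a <- latent + rnorm(n, sd = 0.3)   # rises with the latent variable
ind_b <- latent + rnorm(n, sd = 0.3)   # rises with the latent variable
ind_c <- -latent + rnorm(n, sd = 0.3)  # falls as the latent variable rises

X <- scale(cbind(ind_a, ind_b, ind_c))

# Stand-in for the latent score: first principal component of the block
score <- prcomp(X)$x[, 1]
# The sign of a principal component is arbitrary, so orient it along ind_a
if (cor(score, X[, "ind_a"]) < 0) score <- -score

# Loadings: correlation of each indicator with the score
loadings <- cor(X, score)[, 1]
print(round(loadings, 3))
```

An indicator that moves against the rest of its block, like `ind_c` here, ends up with a negative loading.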
Three-point field goal percentage and field goal percentage are the most important drivers of an effective offense. Note that the loading for offensive rebounds is negative, which may seem counter-intuitive at first. But if a team has a high number of offensive rebounds, it also means that its players are missing field goals (the most important drivers of offense). So the fewer offensive rebounds a team has, the better it is offensively. Personal fouls drawn and free throw percentage are weak factors in a good offense.
According to the model, a good defensive team during the playoffs will have strong numbers for blocks and defensive rebounds. Steals trail a little behind and are not as important as the first two.
Success, as defined earlier, is a set of variables rather than one dependent variable. According to the model loadings, all four variables load strongly on success. Plus/minus differential, which measures by how much a team wins or loses its games, has a loading of 0.934. Win percentage has a loading of 0.939, followed by average points (0.9097) and number of wins (0.875).
Loadings greater than 0.7 are acceptable. Indicators with loadings smaller than 0.7 should be removed or revised so that the remaining ones show a direct positive correlation with the success measures.
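The 0.7 rule of thumb is easy to apply in code. The sketch below uses the four success loadings quoted above. The short labels are my own, not the column names in the data set. Squaring a loading gives the communality, the share of an indicator's variance explained by its latent variable, and 0.7 squared is roughly one half. On a fitted model the same check could be run against the loading column of nba_pls$outer_model, assuming the installed plspm version exposes it in that form.

```r
# Success-block loadings as reported in the post (labels are hypothetical)
success_loadings <- c(
  plus_minus = 0.934,
  win_pct    = 0.939,
  avg_points = 0.9097,
  wins       = 0.875
)

threshold <- 0.7

# Indicators that fall below the rule-of-thumb cutoff
weak <- names(success_loadings)[success_loadings < threshold]

# Communality: squared loading
communality <- success_loadings^2

print(weak)
print(round(communality, 3))
```

None of the four success indicators falls below the cutoff, so the block can stay as it is.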
A success index can be obtained by running the code below, which also produces an offense index and a defense index.
Just before the last game of the Finals, the Cleveland Cavaliers had the highest success index, followed by the Golden State Warriors. This is mostly because of the “easier” path the Cavaliers had during the playoffs, in contrast to the harder games played by Golden State, especially against Portland and Oklahoma City.
# index of success
print(nba_pls$scores)
                          Offense       Defense      Success
Atlanta Hawks           0.4629287  1.592191e+00  0.167262133
Boston Celtics         -0.9647560  9.129464e-01 -0.475286804
Charlotte Hornets      -0.7883822 -1.298748e+00 -0.551335436
Cleveland Cavaliers     1.1908128  9.089896e-03  1.624594362
Dallas Mavericks       -0.2207792 -2.229073e+00 -1.083904601
Detroit Pistons         0.8058766 -1.198733e+00 -0.967788741
Golden State Warriors   1.3680756  5.504884e-01  1.494728477
Houston Rockets        -1.6518981  2.925322e-01 -1.219776426
Indiana Pacers          0.7164425  4.080424e-05 -0.036057595
Los Angeles Clippers   -0.5791455  8.103359e-01 -0.003777228
Memphis Grizzlies      -1.7787707 -9.772018e-01 -2.063751504
Miami Heat              0.1292790  4.524869e-01  0.415106863
Oklahoma City Thunder   0.3947127  7.548823e-01  1.209605335
Portland Trail Blazers  0.3355825 -5.579746e-02  0.437097616
San Antonio Spurs       1.5250650  1.110722e+00  0.931435143
Toronto Raptors        -0.9450437 -7.261632e-01  0.121848406
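The scores are easier to read as a ranking. The sketch below copies the Success column from the output above into a named vector (the name `success` is my own) and sorts it. With the fitted object at hand, ordering the rows of nba_pls$scores by that column should give the same result.

```r
# Success scores copied from the output above
success <- c(
  "Atlanta Hawks"          =  0.167262133,
  "Boston Celtics"         = -0.475286804,
  "Charlotte Hornets"      = -0.551335436,
  "Cleveland Cavaliers"    =  1.624594362,
  "Dallas Mavericks"       = -1.083904601,
  "Detroit Pistons"        = -0.967788741,
  "Golden State Warriors"  =  1.494728477,
  "Houston Rockets"        = -1.219776426,
  "Indiana Pacers"         = -0.036057595,
  "Los Angeles Clippers"   = -0.003777228,
  "Memphis Grizzlies"      = -2.063751504,
  "Miami Heat"             =  0.415106863,
  "Oklahoma City Thunder"  =  1.209605335,
  "Portland Trail Blazers" =  0.437097616,
  "San Antonio Spurs"      =  0.931435143,
  "Toronto Raptors"        =  0.121848406
)

# Rank teams from highest to lowest success index
ranked <- sort(success, decreasing = TRUE)
print(round(head(ranked, 4), 3))
```

The top four come out as Cleveland, Golden State, Oklahoma City and San Antonio, with Memphis at the bottom of the list.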
The code can be found in my GitHub repo. This book by Gaston Sanchez served as a useful resource for this post. He uses data from the Spanish football league and gives a complete explanation of PLS Path Modelling.