Success index of NBA teams using PLS Path Modelling

There are a few ways to assess how good a team is. In this post I use Partial Least Squares Path Modelling (PLS-PM) to construct a team success index for NBA teams during the 2015-16 playoffs.

PLS-PM can be described as a tool to analyse the relationships between blocks of variables, taking into account prior knowledge about the phenomenon being observed. In most competitive team sports, basketball included, success depends on a large number of variables. However, most of these variables can be grouped into two major blocks: offense and defense.

The underlying theory is that a better offense leads to more wins and a better defense avoids losses, so improving either one improves success.

In this example I use data scraped using the same method described in my previous post about plotting shot data. The data contains offensive and defensive stats for the 16 teams that played in the 2015-16 playoffs, from the first round to Game 6 of the Finals.

To fit the PLS path model and produce the success indices, I use the R package plspm.


# load the plspm package
library(plspm)

# load team data pre game 7 (stored in the data frame teamData)

# rows of the inner model matrix
# (a 1 means the column variable points to the row variable)
Offense <- c(0, 0, 0)
Defense <- c(0, 0, 0)
Success <- c(1, 1, 0)

# path matrix created by row binding
nba_path <- rbind(Offense, Defense, Success)

# add column names (optional)
colnames(nba_path) <- rownames(nba_path)

# plot the path matrix
innerplot(nba_path)

After running the code above we have a visual representation of the Partial Least Squares Path Model.
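
As a quick sanity check, the path matrix can also be inspected in base R. plspm expects a square, lower-triangular matrix of 0s and 1s, where a 1 means that the column variable points to the row variable. The helper below is a minimal sketch; is_valid_path_matrix is a name I made up for illustration, not a plspm function.

```r
# rebuild the path matrix from the post
Offense <- c(0, 0, 0)
Defense <- c(0, 0, 0)
Success <- c(1, 1, 0)
nba_path <- rbind(Offense, Defense, Success)
colnames(nba_path) <- rownames(nba_path)

# hypothetical helper (not part of plspm): checks the shape expected
# for an inner model matrix
is_valid_path_matrix <- function(m) {
  is.matrix(m) &&
    nrow(m) == ncol(m) &&                    # square
    all(m %in% c(0, 1)) &&                   # only 0/1 entries
    all(m[upper.tri(m, diag = TRUE)] == 0)   # nothing on or above the diagonal
}

print(is_valid_path_matrix(nba_path))

# list the arrows implied by the matrix: column (source) -> row (target)
arrows <- which(nba_path == 1, arr.ind = TRUE)
paths <- paste(colnames(nba_path)[arrows[, "col"]], "->",
               rownames(nba_path)[arrows[, "row"]])
print(paths)
```

Reading the matrix as "column points to row" is what makes the last row, Success, the only endogenous variable in this model.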

In this scenario, offensive efficiency is influenced by field goal percentage, three-point percentage, free throw percentage, offensive rebounds, assists and personal fouls drawn. Defensive efficiency, on the other hand, is influenced by defensive rebounds, steals and blocks. Success is defined by the number of wins, win percentage, points and plus/minus differential.

Next we define which observed variables (indicators) belong to the exogenous latent variables (offense and defense) and to the endogenous latent variable (success), and run the model:

# define list of indicators: column indices in teamData associated with
# offense, defense and success (same order as the rows of nba_path)
nba_blocks <- list(c(8, 11, 14, 15, 18, 24),  # offense
                   c(16, 20, 21),             # defense
                   c(2, 4, 25, 26))           # success

# all latent variables are measured in a reflective way
nba_modes <- c('A', 'A', 'A')

# run plspm analysis
nba_pls <- plspm(teamData, nba_path, nba_blocks, modes = nba_modes)

## Model Results
# path coefficients
nba_pls$path_coefs

# inner model
nba_pls$inner_model

# plotting results (inner model)
plot(nba_pls)

After running the code above, we have coefficients for offense and defense. As the output of summary(nba_pls) shows, the total effect of all offensive variables (summarised under the offense block) on the success of an NBA team during the playoffs is 0.6512, and the total effect of defensive efforts on success is 0.35. At this point I asked myself: “Is defense the best offense?” The data suggests that, no, it is not.
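
For readers wondering where a "total effect" comes from: in a path model it is the direct effect plus every indirect effect that runs through other latent variables. Here offense and defense both point straight at success with nothing in between, so total and direct effects coincide. The base R sketch below shows this with the two coefficients quoted above; the matrix B is set up by hand purely for illustration and is not taken from the fitted model object.

```r
lv <- c("Offense", "Defense", "Success")

# B[i, j] = direct effect of column j on row i (values quoted in the post)
B <- matrix(0, 3, 3, dimnames = list(lv, lv))
B["Success", "Offense"] <- 0.6512
B["Success", "Defense"] <- 0.35

# total effects = B + B^2 + ... ; the sum is finite because B is strictly
# lower triangular, so its powers vanish after (number of variables - 1) steps
total <- matrix(0, 3, 3, dimnames = list(lv, lv))
power <- B
for (k in seq_len(nrow(B) - 1)) {
  total <- total + power
  power <- power %*% B
}

# indirect effects are whatever remains after removing the direct ones
indirect <- total - B
print(total)
print(indirect)
```

In a model with a chain such as Defense -> Offense -> Success, the indirect matrix would no longer be all zeros.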


Next, let's inspect how important each of the nine variables is for offense and defense by checking the model loadings. A loading is a numeric value that represents the strength and direction of the relationship between an indicator and its latent variable.

# plotting loadings of the outer model
plot(nba_pls, what = 'loadings', arr.width = 0.1)


Model Loadings
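
To make the idea of a loading concrete: in plspm, a loading is the correlation between an indicator and the score of its latent variable. The toy example below uses simulated data, not the NBA data, and a known latent score rather than one estimated by PLS. All names in it are made up for illustration.

```r
set.seed(42)
n <- 200
latent <- rnorm(n)  # stand-in for a latent variable score

# three simulated indicators: two move with the latent score, one against it
indicators <- cbind(
  strong  = latent + rnorm(n, sd = 0.3),
  noisier = latent + rnorm(n, sd = 0.6),
  inverse = -latent + rnorm(n, sd = 0.5)
)

# loading = correlation of each indicator with the latent score
loadings <- cor(indicators, latent)[, 1]
print(round(loadings, 3))
```

The sign of the loading gives the direction of the relationship and its absolute value gives the strength, which is how a negative loading should be read.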

Three-point field goal percentage and field goal percentage are the most important drivers of an effective offense. Note that the loading for offensive rebounds is negative, which may seem counter-intuitive at first. But if a team has a high number of offensive rebounds, it also means that its players are missing field goals (the most important drivers of offense). So the fewer offensive rebounds a team has, the better it is offensively. Personal fouls drawn and free throw percentage are weak factors in a good offense.

According to the model, a good defensive team during the playoffs will have strong block and defensive rebound numbers. Steals trail a little behind and are not as important as the first two.

Success, as defined earlier, is a set of variables rather than a single dependent variable. According to the model loadings, all four indicators are strongly related to success. The loading for plus/minus differential, which measures by how much a team wins or loses its games, is 0.934. Win percentage has a loading of 0.939, followed by average points (0.9097) and number of wins (0.875).

Loadings greater than 0.7 are considered acceptable. Indicators with loadings smaller than 0.7 should be removed or revised so that they show a direct positive correlation with the success measures.
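
The 0.7 rule of thumb is easy to apply programmatically. A loading of 0.7 corresponds to a squared loading (communality) of 0.49, meaning roughly half of the indicator's variance is shared with the latent variable. The sketch below uses the four success loadings quoted above in a plain named vector instead of the fitted model object; the indicator names are my own shorthand.

```r
# success loadings as quoted in the post (names are shorthand, not column names)
success_loadings <- c(plus_minus = 0.934, win_pct = 0.939,
                      points = 0.9097, wins = 0.875)
threshold <- 0.7

# communality = squared loading = share of indicator variance
# captured by the latent variable
communality <- success_loadings^2
print(round(communality, 3))

# indicators that fall below the threshold and would need a second look
weak <- names(success_loadings)[success_loadings < threshold]
print(weak)
```

For the success block the list of weak indicators comes back empty, which matches the reading above.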

A success index can be obtained by running the code below, which also produces an offense index and a defense index.

Just before the last game of the Finals, the Cleveland Cavaliers had the highest success index, followed by the Golden State Warriors. This is mostly because of the “easier” path the Cavaliers had during the playoffs, in contrast to the harder games played by Golden State, especially against Portland and Oklahoma City.

# index of success
nba_pls$scores
                          Offense       Defense      Success
Atlanta Hawks           0.4629287  1.592191e+00  0.167262133
Boston Celtics         -0.9647560  9.129464e-01 -0.475286804
Charlotte Hornets      -0.7883822 -1.298748e+00 -0.551335436
Cleveland Cavaliers     1.1908128  9.089896e-03  1.624594362
Dallas Mavericks       -0.2207792 -2.229073e+00 -1.083904601
Detroit Pistons         0.8058766 -1.198733e+00 -0.967788741
Golden State Warriors   1.3680756  5.504884e-01  1.494728477
Houston Rockets        -1.6518981  2.925322e-01 -1.219776426
Indiana Pacers          0.7164425  4.080424e-05 -0.036057595
Los Angeles Clippers   -0.5791455  8.103359e-01 -0.003777228
Memphis Grizzlies      -1.7787707 -9.772018e-01 -2.063751504
Miami Heat              0.1292790  4.524869e-01  0.415106863
Oklahoma City Thunder   0.3947127  7.548823e-01  1.209605335
Portland Trail Blazers  0.3355825 -5.579746e-02  0.437097616
San Antonio Spurs       1.5250650  1.110722e+00  0.931435143
Toronto Raptors        -0.9450437 -7.261632e-01  0.121848406
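
Since the scores print in alphabetical order, sorting them makes the ranking easier to read. The snippet below copies the Success column from the output above (rounded to three decimals) into a named vector, so it runs without the fitted model or the original data.

```r
# Success scores copied from the output above, rounded to three decimals
success <- c(
  "Atlanta Hawks" = 0.167, "Boston Celtics" = -0.475,
  "Charlotte Hornets" = -0.551, "Cleveland Cavaliers" = 1.625,
  "Dallas Mavericks" = -1.084, "Detroit Pistons" = -0.968,
  "Golden State Warriors" = 1.495, "Houston Rockets" = -1.220,
  "Indiana Pacers" = -0.036, "Los Angeles Clippers" = -0.004,
  "Memphis Grizzlies" = -2.064, "Miami Heat" = 0.415,
  "Oklahoma City Thunder" = 1.210, "Portland Trail Blazers" = 0.437,
  "San Antonio Spurs" = 0.931, "Toronto Raptors" = 0.122
)

# rank teams from highest to lowest success index
ranking <- sort(success, decreasing = TRUE)
print(head(ranking, 5))
```

The same sort() call could be applied to the Offense or Defense column to rank teams on either block.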

The code can be found in my GitHub repo. This book by Gaston Sanchez served as a useful resource for this post. He uses data from the Spanish football league and gives a complete explanation of PLS Path Modelling.
