Causal Impact of a TV documentary on the demand for KeepCup

A few months ago the ABC aired War on Waste, a documentary about how much waste we produce. The episode revealed that Australians use about 28 disposable takeaway coffee cups per second, prompting many viewers to consider alternatives such as reusable cups.

The story did not mention any specific brand of reusable cup, but it had a significant impact on Google searches for the most popular brand, KeepCup. Historic data from Google Trends shows that after the episode, searches for reusable coffee cups skyrocketed, and so did searches for KeepCup specifically.
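Causal impact analysis estimates what searches would have looked like without the episode and compares that counterfactual with what was actually observed. Google's CausalImpact R package does this with a Bayesian structural time-series model. The Python sketch below substitutes a much simpler technique, an ordinary least-squares regression on one control series, to show only the pre-period/post-period logic. The weekly values are made up, and `control` stands for a hypothetical comparison search term assumed to be unaffected by the episode.

```python
import numpy as np


def causal_effect(target, control, intervention_idx):
    """Simplified stand-in for causal impact analysis (not a BSTS model).

    Fits target ~ control on the pre-period only, predicts the post-period
    as a counterfactual, and compares it with the observed values.
    """
    target = np.asarray(target, dtype=float)
    control = np.asarray(control, dtype=float)
    pre = slice(0, intervention_idx)
    post = slice(intervention_idx, len(target))
    # Design matrix with an intercept column and the control series
    design = np.column_stack([np.ones(len(control)), control])
    # Fit only on pre-intervention observations
    coef, *_ = np.linalg.lstsq(design[pre], target[pre], rcond=None)
    # Counterfactual: what the pre-period relationship predicts afterwards
    counterfactual = design[post] @ coef
    pointwise = target[post] - counterfactual
    return {
        "counterfactual": counterfactual,
        "pointwise": pointwise,
        "cumulative": pointwise.sum(),
        "relative": pointwise.sum() / counterfactual.sum(),
    }


# Hypothetical weekly search interest; the episode airs at index 6
control = [50, 52, 48, 51, 49, 50, 51, 50, 52]
target = [20, 20.8, 19.2, 20.4, 19.6, 20, 60, 45, 35]
result = causal_effect(target, control, intervention_idx=6)
print(round(result["cumulative"], 1), round(result["relative"], 2))
```

Unlike CausalImpact, this sketch produces no credible intervals. It only illustrates how a counterfactual built from the pre-period is used to size the post-episode jump.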


A Single-Index Model Shiny App for ETFs

In quantitative finance, the Single-Index Model (SIM) is commonly used to price assets by measuring both the volatility and the return of a stock. According to this model, a stock's excess return (its return above the risk-free rate) can be decomposed into a component driven by macroeconomic factors, captured through the stock's sensitivity to a market index such as the ASX200, and a component due to firm-specific factors.
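In its usual textbook form, the single-index model is the regression R_i = α_i + β_i R_M + e_i, where R_i and R_M are the stock's and the market's excess returns over the risk-free rate. The Python sketch below uses made-up monthly excess returns and is not the app's code. It estimates α and β by least squares, then splits the stock's variance into a systematic part, β²σ²_M, and a firm-specific part, σ²_e.

```python
import numpy as np


def single_index_fit(stock_excess, market_excess):
    """Fit R_i = alpha + beta * R_M + e by ordinary least squares."""
    r_i = np.asarray(stock_excess, dtype=float)
    r_m = np.asarray(market_excess, dtype=float)
    # OLS slope = sample covariance / sample variance of the market
    beta = np.cov(r_i, r_m, ddof=1)[0, 1] / np.var(r_m, ddof=1)
    alpha = r_i.mean() - beta * r_m.mean()
    residuals = r_i - (alpha + beta * r_m)
    systematic_var = beta**2 * np.var(r_m, ddof=1)  # market-driven risk
    firm_var = np.var(residuals, ddof=1)             # firm-specific risk
    return alpha, beta, systematic_var, firm_var


# Hypothetical monthly excess returns (stock vs. a market index)
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]
stock = [0.032, -0.014, 0.046, 0.002, -0.028, 0.016]
alpha, beta, sys_var, firm_var = single_index_fit(stock, market)
print(f"alpha={alpha:.4f} beta={beta:.3f}")
```

Because OLS residuals are uncorrelated with the market series, the two variance components add up exactly to the stock's total sample variance.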

Luckily, R has many useful tools for quantitative finance, such as the stockPortfolio and PerformanceAnalytics packages, which I have been using for the past two years to balance a portfolio of Exchange-Traded Funds (ETFs).

EDIT: I had to rewrite the app to use tidyquant for fetching returns, because stockPortfolio::getReturns stopped working with Yahoo Finance data.
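Whichever package fetches the data, the app ultimately needs period returns computed from adjusted prices. The following is a hedged illustration in plain Python with hypothetical prices, not the app's tidyquant code. It shows the two common return definitions.

```python
import math


def simple_returns(prices):
    """Simple period returns, P_t / P_{t-1} - 1, from a chronological price list."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def log_returns(prices):
    """Continuously compounded returns, ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical month-end adjusted closing prices for one ETF
prices = [100.0, 110.0, 99.0]
print(simple_returns(prices))
print(log_returns(prices))
```

Log returns add up across periods (their sum equals the log of the overall price ratio), which is convenient for aggregation. Simple returns combine linearly across portfolio weights.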


Success index of NBA teams using PLS Path Modelling

There are a few ways to assess how good a team is. In this post I use Partial Least Squares Path Modelling (PLS-PM) to construct a team success index for NBA teams during the 2015-16 Playoffs.

PLS-PM can be described as a tool for analysing the relationships between blocks of variables while taking into account prior knowledge about the phenomenon being observed. In most competitive team sports, such as basketball, success depends on a large number of variables. However, most of these variables can be grouped into two major blocks: offense and defense.
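To make the idea concrete, here is a bare-bones Python version of Wold's PLS-PM algorithm. It uses Mode A outer weights, the centroid inner weighting scheme, and an OLS regression of the success scores on the offense and defense scores to obtain path coefficients. It runs on simulated data with three hypothetical indicators per block, not on the playoff statistics used in the post. It also omits the bootstrapping and quality diagnostics that a dedicated package such as R's plspm provides.

```python
import numpy as np


def scale(v):
    """Centre a vector and scale it to unit variance."""
    return (v - v.mean()) / v.std()


def pls_pm(blocks, inner, tol=1e-8, max_iter=500):
    """Estimate outer weights and latent scores with Wold's iterative PLS algorithm.

    blocks: list of (n_obs, n_indicators) arrays, one per latent variable.
    inner:  square 0/1 matrix; inner[j][k] = 1 if latents j and k are linked.
    """
    X = [np.apply_along_axis(scale, 0, np.asarray(b, dtype=float)) for b in blocks]
    n = X[0].shape[0]
    w = [np.ones(x.shape[1]) for x in X]
    for _ in range(max_iter):
        # Outer approximation: each latent is a standardized weighted sum of its indicators
        Y = [scale(x @ wj) for x, wj in zip(X, w)]
        new_w = []
        for j, x in enumerate(X):
            # Inner approximation (centroid scheme): sign-weighted sum of linked latents
            z = sum(np.sign(np.corrcoef(Y[j], Y[k])[0, 1]) * Y[k]
                    for k in range(len(X)) if inner[j][k])
            # Mode A: weights are covariances between indicators and the inner proxy
            wj = x.T @ z / n
            new_w.append(wj / np.std(x @ wj))  # rescale so scores have unit variance
        converged = max(np.max(np.abs(a - b)) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    scores = [x @ wj for x, wj in zip(X, w)]
    return w, scores


def indicators(latent, rng, k=3, noise=0.5):
    """Simulate k noisy indicators of one latent variable (hypothetical data)."""
    return np.column_stack([latent + noise * rng.normal(size=latent.size) for _ in range(k)])


rng = np.random.default_rng(42)
n = 200
offense, defense = rng.normal(size=n), rng.normal(size=n)
success = 0.6 * offense + 0.4 * defense + 0.5 * rng.normal(size=n)
blocks = [indicators(offense, rng), indicators(defense, rng), indicators(success, rng)]
inner = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]  # offense -> success <- defense

weights, (y_off, y_def, y_succ) = pls_pm(blocks, inner)
# Path coefficients: regress the success scores on the two predictor scores
paths, *_ = np.linalg.lstsq(np.column_stack([y_off, y_def]), y_succ, rcond=None)
print("path coefficients (offense, defense):", paths.round(2))
```

The fitted success scores are the kind of quantity a team success index is built from. The path coefficients show how strongly each block drives success.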


Visualising the 2015 NBA Draft in R

This visualisation in R displays the origins and destinations of players participating in the 2015 NBA Draft using Sankey diagrams.

A Sankey diagram is a visualisation used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankey diagrams work best when you want to show a many-to-many mapping between two domains (e.g., organisation type and organisation name) or multiple paths through a set of stages (e.g., colleges and NBA teams).
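Most Sankey tools expect a list of node names plus links given as zero-based source index, target index and value. The networkD3 R package's `sankeyNetwork()` and plotly's Sankey trace both use this general layout. The Python sketch below turns hypothetical (college, team) draft picks into that form by counting how many players follow each path. It is not the post's own code.

```python
from collections import Counter


def sankey_links(pairs):
    """Turn (origin, destination) pairs into Sankey nodes and weighted links."""
    counts = Counter(pairs)  # each distinct path becomes one link; its count is the width
    nodes, index = [], {}
    for origin, destination in counts:
        for name in (origin, destination):
            if name not in index:  # assign zero-based node ids in first-seen order
                index[name] = len(nodes)
                nodes.append(name)
    links = [
        {"source": index[o], "target": index[d], "value": n}
        for (o, d), n in counts.items()
    ]
    return nodes, links


# Hypothetical picks: (college, drafting team)
picks = [
    ("College A", "Team X"),
    ("College A", "Team Y"),
    ("College B", "Team X"),
    ("College A", "Team X"),
]
nodes, links = sankey_links(picks)
print(nodes)
print(links)
```

This assumes college and team names never collide. If they could, prefix each name with its stage before indexing.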


What AFL stadiums pull the biggest crowds?

I found a fantastic dataset on Australian rules football in the Australian Football League (AFL). The AFL Machine Learning Competition, promoted by Sportsbet, provided statistics on every match played since 2000. One useful field in this dataset is the recorded attendance of each match.

So I decided to plot attendance by stadium using ggplot2 boxplots.
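A standard Tukey boxplot, like the one ggplot2's `geom_boxplot()` draws, shows the median and the quartiles. Its whiskers reach the most extreme points within 1.5 × IQR of the hinges, and anything beyond is plotted as an outlier. The Python sketch below computes those summaries per stadium from hypothetical (stadium, attendance) records. NumPy's default linear percentile interpolation corresponds to R's default quantile type 7.

```python
import numpy as np


def boxplot_stats(values):
    """Summary statistics drawn by a standard Tukey boxplot."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": x[(x < lower_fence) | (x > upper_fence)].tolist(),
    }


def by_stadium(matches):
    """Group (stadium, attendance) pairs and summarise each stadium."""
    grouped = {}
    for stadium, attendance in matches:
        grouped.setdefault(stadium, []).append(attendance)
    return {stadium: boxplot_stats(a) for stadium, a in grouped.items()}


# Hypothetical attendance records
matches = [("Ground A", a) for a in (20000, 30000, 40000, 50000, 95000)]
matches += [("Ground B", a) for a in (15000, 18000, 21000)]
summary = by_stadium(matches)
print(summary["Ground A"])
```

A single blockbuster match shows up as an outlier point rather than stretching the whisker. This is why boxplots suit attendance data with occasional finals or derbies.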


Visualising NBA shot charts in Tableau

In my last post I produced some NBA shot charts in R using data scraped from and ggplot2. This time I extracted all shot location data available for 490 players and linked it to a Tableau dashboard.

The first dashboard shows every shot attempted during the 2014-15 NBA Regular Season. On the right, you can filter by team, player, shot type, shot zone and shot range, and the table above the chart updates to match the selection.

How to create NBA shot charts in R

A while ago I found this fantastic post about NBA shot charts built in Python. Since my Python skills are quite basic, I decided to reproduce these charts in R using data scraped from the web and ggplot2.

Getting the Data

First we need the shot data from This blog post from Greg Reda does a great job explaining how to find the underlying API and extract data from a web app (in this case,

To get shot data for Stephen Curry we will use this URL, which returns the shots Curry took during the 2014-15 regular season as a JSON structure. Note that Season, SeasonType and PlayerID are parameters in the URL; Stephen Curry’s PlayerID is 201939.
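Because Season, SeasonType and PlayerID are ordinary query parameters, the request URL can be built programmatically. The Python sketch below does this and also flattens a JSON response into one record per shot. `BASE_URL` is a placeholder, not the real endpoint. The `resultSets`/`headers`/`rowSet` layout and the column names in the usage example are assumptions about the payload's shape, not something this post documents. No network request is made.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (hypothetical): substitute the endpoint linked in the post
BASE_URL = "https://example.com/stats/shotchart"


def shot_chart_url(player_id, season, season_type="Regular Season"):
    """Build a query URL from the three parameters mentioned in the post."""
    query = urlencode({"PlayerID": player_id, "Season": season, "SeasonType": season_type})
    return f"{BASE_URL}?{query}"


def rows_to_records(payload):
    """Flatten a JSON payload ASSUMED to hold resultSets[0] with 'headers' and 'rowSet'."""
    result = json.loads(payload)["resultSets"][0]
    return [dict(zip(result["headers"], row)) for row in result["rowSet"]]


print(shot_chart_url(201939, "2014-15"))
```

Changing `player_id` or `season` is all it takes to pull another player's or season's shots, provided the endpoint accepts those values.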


How much does a yard cost in the NFL?

NFL teams spend a lot of money on their key players, especially running backs, whose job is to gain as many rushing yards as possible. But how do we measure the return on investment in running backs? Are teams getting the return they expect? How much does a rushing yard cost in the NFL?

I plotted rushing yards for each running back during the 2014 NFL regular season against their average yearly salary (total contract value divided by contract length in years). The result shows the expected positive correlation: the more a team spends on running back salaries, the more rushing yards it can expect to gain. Running backs perceived as “good” tend to get paid more, right? However, this does not hold for every team when we look at the big picture.
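The salary definition above translates directly into code. One simple way to answer "how much does a yard cost?" is to divide average yearly salary by rushing yards. That cost-per-yard metric is an assumption of this Python sketch, not necessarily the post's exact measure. The players and numbers are hypothetical.

```python
import numpy as np


def average_annual_salary(total_contract, years):
    """Average yearly salary as defined in the post: total value / contract length."""
    return total_contract / years


# Hypothetical running backs: (name, total contract value in USD, years, rushing yards)
players = [
    ("Back A", 20_000_000, 4, 1200),
    ("Back B", 6_000_000, 3, 700),
    ("Back C", 2_000_000, 2, 400),
]
salaries = np.array([average_annual_salary(total, years) for _, total, years, _ in players])
yards = np.array([rush for *_, rush in players])
cost_per_yard = salaries / yards                 # USD paid per rushing yard (assumed metric)
correlation = np.corrcoef(salaries, yards)[0, 1]  # Pearson correlation, salary vs. yards
for (name, *_), cost in zip(players, cost_per_yard):
    print(f"{name}: ${cost:,.0f} per yard")
```

A positive correlation across all backs can coexist with very different costs per yard for individual players. This is the kind of gap the team-level picture reveals.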

Principal Component Analysis and surfers' performance

I had access to surfing data from, a site that collects data for fantasy surfing. It contains average scores for 55 surfers in the 2014 World Surf League under different conditions: wind, wave size, surf break, weather and more. Scores are continuous and range from 0 to 20.
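With surfers as rows and conditions as columns, PCA looks for a few directions that capture most of the variation in scores. The Python sketch below performs PCA through a singular value decomposition of the column-centred matrix. It runs on simulated data: 55 hypothetical surfers and six hypothetical condition columns, driven mainly by one shared "overall ability" factor. Since every score uses the same 0 to 20 scale, the sketch centres without standardizing. Standardizing, as R's `prcomp(..., scale. = TRUE)` does, would be the usual alternative if the columns had very different spreads.

```python
import numpy as np


def pca(scores):
    """PCA via SVD of the column-centred data matrix.

    scores: (n_surfers, n_conditions) array of average heat scores.
    Returns explained-variance ratios, loadings (rows of Vt), and surfer coordinates.
    """
    X = np.asarray(scores, dtype=float)
    Xc = X - X.mean(axis=0)              # centre each condition column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)      # share of total variance per component
    coords = U * S                       # surfers' positions on each component
    return explained, Vt, coords


# Hypothetical data: one shared ability factor plus condition-specific noise
rng = np.random.default_rng(7)
skill = rng.normal(12, 2, size=(55, 1))
scores = np.clip(skill + rng.normal(0, 0.7, size=(55, 6)), 0, 20)
explained, loadings, coords = pca(scores)
print(explained.round(3))
```

When one factor dominates, the first component absorbs most of the variance, and its loadings are roughly equal across conditions. Later components then separate surfers who do relatively better in particular conditions.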

This is how the WSL scoring works in more detail: