I found a fantastic dataset on Australian Rules Football (or Australian Football League – AFL). The AFL Machine Learning Competition, promoted by Sportsbet, provided statistics on every match played since the year 2000. A good piece of information in this dataset is the registered attendance of each match.
So I decided to plot the attendance using ggplot2 boxplots.
The first plot shows attendance by venue. The thick black line in the middle of each boxplot indicates the median attendance value. The “start” and “finish” of each box represent the 1st and 3rd quartiles of the attendance distribution, respectively. The lines extending horizontally from each box (the whiskers) represent the variability outside the upper and lower quartiles. Each black dot is a match day on which the attendance was abnormal, that is, an outlier. There are two boxes for each venue and season: one for regular season matches and one for finals matches (quarter finals, semi finals, preliminary finals and the grand final).
The Melbourne Cricket Ground (M.C.G.) has had the largest attendance numbers over the last 15 seasons. No surprise there, since the M.C.G. is the 10th largest stadium in the world, with capacity for 100,024 people, and it is home to the AFL grand final, which since 2000 has attracted an average of 95,000 people. Curiously, there are no outlier match days for the M.C.G., because its attendance distribution is quite spread out.
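To make the boxplot anatomy described above concrete, here is a minimal sketch in base R that computes the same pieces by hand: the quartiles, the whisker ends and the outliers. It assumes the common 1.5 × IQR rule for flagging outliers, which is the default used by ggplot2's boxplots. The attendance figures are made up for illustration and do not come from the AFL dataset.

```r
# Hypothetical attendance figures, made up for illustration only
att <- c(21000, 24500, 28000, 30000, 33000, 35000, 36500, 41000, 47000, 95000)

# Edges of the box (1st and 3rd quartiles) and the median line
q1  <- quantile(att, 0.25, names = FALSE)
med <- median(att)
q3  <- quantile(att, 0.75, names = FALSE)
iqr <- q3 - q1

# Assumed outlier rule: anything further than 1.5 * IQR from the box
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers stop at the most extreme values still inside the fences
lower_whisker <- min(att[att >= lower_fence])
upper_whisker <- max(att[att <= upper_fence])

# Everything beyond the fences is drawn as a black dot
outliers <- att[att < lower_fence | att > upper_fence]

print(c(q1 = q1, median = med, q3 = q3))
print(c(lower_whisker = lower_whisker, upper_whisker = upper_whisker))
print(outliers)
```

In this toy sample only the 95,000 crowd falls outside the fences, so it is the single point that would be drawn as a dot.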
The next plot shows attendance by season. The seasons with the highest attendance were 2008 and 2009 for both regular and finals matches. There were no outliers for finals matches.
The next plot shows the attendance distribution by round. During the regular season, attendance is fairly constant at around 35,000 people per game, with a few large outliers, usually local derbies. The grand final had the highest mean attendance, followed by the preliminary finals and the quarter finals.
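Since a boxplot draws the median rather than the mean, it can help to compute both when comparing groups. The sketch below does that per round type with base R's tapply. The data frame is a made-up stand-in: the column names att and round_type follow the code in this post, but the values and the labels "Regular" and "Finals" are assumptions for illustration.

```r
# Made-up rows for illustration; not taken from the AFL dataset
afl_demo <- data.frame(
  round_type = c("Regular", "Regular", "Regular", "Finals", "Finals", "Finals"),
  att        = c(30000, 35000, 40000, 60000, 70000, 95000)
)

# Mean and median attendance for each round type
mean_att   <- tapply(afl_demo$att, afl_demo$round_type, mean)
median_att <- tapply(afl_demo$att, afl_demo$round_type, median)

print(mean_att)
print(median_att)
```

In this toy data the finals mean (75,000) sits above the finals median (70,000) because one very large crowd pulls the mean up, which is the kind of gap a boxplot alone would not show.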
You can run the code below and get the same result.
# load data
afl <- read.csv("https://www.dropbox.com/s/umxpkmo1lmc38eg/afl.csv?dl=1", header = TRUE)

# boxplots
library(ggplot2)

# Attendance box plot of venues
qplot(data = afl, venue, att, geom = "boxplot", fill = round_type) +
  labs(x = "Venue", y = "Attendance") +
  coord_flip() +
  guides(fill = guide_legend(keywidth = 3, keyheight = 2, title = "Round Type")) +
  ggtitle("AFL Venues Attendance Boxplot")

# Attendance box plot of seasons
# season is converted to a factor so each season gets its own box
afl$season <- as.factor(afl$season)
qplot(data = afl, season, att, geom = "boxplot", fill = round_type) +
  labs(x = "Season", y = "Attendance") +
  coord_flip() +
  guides(fill = guide_legend(keywidth = 3, keyheight = 2, title = "Round Type")) +
  ggtitle("AFL Seasons Attendance Boxplot")

# Attendance box plot of rounds
qplot(data = afl, round, att, geom = "boxplot", fill = round_type) +
  labs(x = "Round", y = "Attendance") +
  coord_flip() +
  guides(fill = guide_legend(keywidth = 3, keyheight = 2, title = "Round Type")) +
  ggtitle("AFL Rounds Attendance Boxplot")
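The plots in this post are built with qplot(), which was deprecated in ggplot2 3.4.0. For anyone on a recent version, here is a rough sketch of the venue plot written with ggplot() and geom_boxplot() instead. It runs on a small made-up data frame so it does not depend on the download link; the column names venue, att and round_type follow the code in this post, while the values and the "Regular" and "Finals" labels are assumptions.

```r
library(ggplot2)

# Made-up rows for illustration; not taken from the AFL dataset
afl_demo <- data.frame(
  venue = rep(c("M.C.G.", "Gabba"), each = 6),
  att = c(45000, 52000, 61000, 88000, 92000, 97000,
          21000, 24000, 27000, 30000, 33000, 35000),
  round_type = rep(rep(c("Regular", "Finals"), each = 3), times = 2)
)

# One box per venue and round type, flipped so venues run down the y axis
p <- ggplot(afl_demo, aes(x = venue, y = att, fill = round_type)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "Venue", y = "Attendance", fill = "Round Type",
       title = "AFL Venues Attendance Boxplot")

print(p)
```

Swapping afl_demo for the real afl data frame, and venue for season or round, should give the other two plots in the same style.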
The link in your code to the AFL data is broken I think? Are you able to update it? I’d like to have a look at the dataset and can’t find it anywhere else online.
Hi there,
I have updated the link in the code.
Thanks for letting me know.