Previous: 1. Canadian programming
Next: 3. Process data

2. Scrape raw data

To scrape the data from nhl.com, I use the nhlscrapr package. It provides a single command, compile.all.games(), that downloads every game and compiles everything together.

However, it pauses 20 seconds after every game, so a full run takes more than 3.5 days. Instead, one might want to use one of the two options further below to download games one-by-one or by season with a shorter wait interval, and then compile the result.
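
As a rough check on that estimate, the pause time alone can be worked out with a few lines of arithmetic. This is only a sketch: `pause_days` and `n_games_implied` are names introduced here for illustration, and the calculation ignores the time spent on the downloads themselves.

```r
# Days spent pausing, assuming one pause of `wait_seconds` per game.
pause_days <- function(n_games, wait_seconds) {
  n_games * wait_seconds / (60 * 60 * 24)
}

# Number of games at which a 20-second pause adds up to exactly 3.5 days.
n_games_implied <- 3.5 * 60 * 60 * 24 / 20

n_games_implied                 # 15120
pause_days(n_games_implied, 20) # 3.5
pause_days(n_games_implied, 2)  # 0.35
```

So "more than 3.5 days" corresponds to a bit over 15,000 games, and cutting the wait to 2 seconds reduces the pause time for the same number of games to roughly a third of a day.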

suppressMessages({
  library(nhlscrapr)
})

compile.all.games()
## Loading game and player data.
## 20022003: no games need updating.
## 20032004: no games need updating.
## 20052006: no games need updating.
## 20062007: no games need updating.
## 20072008: no games need updating.
## 20082009: no games need updating.
## 20092010: no games need updating.
## 20102011: no games need updating.
## 20112012: no games need updating.
## 20122013: no games need updating.
## 20132014: no games need updating.
## Downloading files for game 20142015 30114
## Pausing: 20
## Downloading files for game 20142015 30115
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30116
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30117
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30123
## Pausing: 20
## Downloading files for game 20142015 30124
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30125
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30126
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30127
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30134
## Pausing: 20
## Downloading files for game 20142015 30135
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30136
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30137
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30144
## Pausing: 20
## Downloading files for game 20142015 30145
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30146
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30147
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30154
## Pausing: 20
## Downloading files for game 20142015 30155
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30156
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30157
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30164
## Pausing: 20
## Downloading files for game 20142015 30165
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30166
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30167
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30174
## Pausing: 20
## Downloading files for game 20142015 30175
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30176
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30177
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30184
## Pausing: 20
## Downloading files for game 20142015 30185
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30186
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30187
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30211
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30212
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30213
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30214
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30215
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30216
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30217
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30221
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30222
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30223
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30224
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30225
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30226
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30227
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30231
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30232
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30233
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## 20 consecutive failed attempts; stopping file retrieval.
## 20142015 -- updating rosters on each game file.
## Folding data frames. Total: 15
## Folding data frames. Total: 6
## Folding data frames. Total: 3
## Folding data frames. Total: 1
## Adding event location sections.
## Saving to output file
## [1] TRUE

To download games one-by-one or by season instead, use one of the two options below.

# get the full list of available games
games <- full.game.database()

# option 1: download game by game, waiting 2 seconds between games
invisible(apply(games, 1, function(game) {
  download.single.game(season=game["season"], gcode=game["gcode"], wait=2)
  gc()  # free memory after each game
}))

# option 2: download season by season
invisible(lapply(unique(games$season), function(season) {
  download.games(games[games$season == season, ], wait=2)
  gc()  # free memory after each season
}))

# once everything is downloaded, compile it all together
compile.all.games()
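
A long download loop may hit the occasional failed game. One possible way to keep going, sketched here under assumptions, is to wrap each call in tryCatch() and record which games failed. The helper `download_each` and the stub `fake_fetch` are hypothetical names, not part of nhlscrapr; the stub stands in for the real download function so that the sketch runs offline.

```r
# Hypothetical helper: call `fetch` once per row of `games`, pausing `wait`
# seconds between calls and recording failures instead of stopping.
download_each <- function(games, fetch, wait = 2) {
  ok <- logical(nrow(games))
  for (i in seq_len(nrow(games))) {
    ok[i] <- tryCatch({
      fetch(season = games$season[i], gcode = games$gcode[i])
      TRUE
    }, error = function(e) {
      message("Failed: ", games$season[i], " ", games$gcode[i],
              " (", conditionMessage(e), ")")
      FALSE
    })
    # no need to pause after the last game
    if (i < nrow(games)) Sys.sleep(wait)
  }
  ok
}

# Stub standing in for the real download function (assumption for the demo).
fake_fetch <- function(season, gcode) {
  if (gcode == "30115") stop("simulated failure")
  invisible(NULL)
}

demo_games <- data.frame(season = "20142015",
                         gcode = c("30114", "30115", "30116"),
                         stringsAsFactors = FALSE)

# TRUE for games that downloaded, FALSE for the simulated failure
result <- download_each(demo_games, fake_fetch, wait = 0)
```

With nhlscrapr loaded, `fetch` could presumably be a thin wrapper around download.single.game() as it is called above, and the returned logical vector can be used to retry only the failed rows.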
