Last updated: 2018-09-05

Past versions:
    File Version Author Date Message
    html 42bd442 annakrystalli 2018-09-04 Build site.
    Rmd 8f5c11c annakrystalli 2018-09-04 workflowr::wflow_publish(“analysis/raster.Rmd”)
    html f94df5c annakrystalli 2018-09-04 Build site.
    Rmd ae1fd63 annakrystalli 2018-09-04 workflowr::wflow_publish(c(“analysis/raster.Rmd”))


Setup

Open Notebook

Open a new R Notebook to work in.

File > New File > R Notebook

Name it (e.g. Rasters) and save it.

Load libraries

Load the libraries we’ll be using for this section of the workshop

library(raster)  
library(rasterVis)
library(sf)
library(dplyr)
library(ggplot2)

Elements of raster data

Raster data are gridded: each grid cell is represented by a pixel in the raster, and each pixel represents an area on the Earth’s surface.

Rasters have 3 core metadata elements:

  • Coordinate Reference System (CRS)
  • extent
  • resolution

See “Raster resolution and extent”

Resolution

The spatial resolution of a raster refers to the size of each cell, expressed in the units of the CRS (e.g. metres for projected data, or degrees for the unprojected data used in this section). This size in turn relates to the area on the ground that the pixel represents.

The higher the resolution for the same extent, the crisper the image (and the larger the file size).

Extent

The extent is the bounding box of the raster: xmin, xmax, ymin and ymax. Given the lower limit, the resolution and the number of pixels along an axis, the upper limit follows as:

\(x_{max} = x_{min} + (resolution_{x} \times n_{pixels,x})\)

Unlike vector data, the raster data model stores the coordinate of the grid cell only indirectly: there is a less clear distinction between attribute and spatial information in raster data. Say we are in the 3rd row and the 4th column of a raster matrix. To derive the corresponding coordinate, we have to move from the origin four cells in the x-direction (columns) and three cells in the y-direction (rows), with the cell resolution defining the distance for each x- and y-step.
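As a rough sketch of that arithmetic in base R: the helper name cell_centre is hypothetical (it is not a raster function), and it assumes cells are counted from the top-left corner of the grid, which is how the raster package numbers them. The numbers describe a global 4 x 4 grid with 90 x 45 degree cells.

```r
# Hypothetical helper: centre coordinates of the cell at a given row and
# column, counted from the top-left corner of the grid.
cell_centre <- function(row, col, xmin, ymax, res_x, res_y) {
  c(x = xmin + (col - 0.5) * res_x,   # columns step east from xmin
    y = ymax - (row - 0.5) * res_y)   # rows step south from ymax
}

# Upper x limit of the extent: xmin + resolution * number of columns
x_max <- -180 + 90 * 4
x_max    # 180

# 3rd row, 4th column of the grid
centre <- cell_centre(row = 3, col = 4, xmin = -180, ymax = 90,
                      res_x = 90, res_y = 45)
centre   # x = 135, y = -22.5
```

The half-cell offset (0.5) gives the centre of the cell; dropping it would give the cell's far corner instead.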

Working with rasters

Create rasters

Rasters can be thought of as matrices with additional spatial metadata attached (CRS, extent and resolution).

myRaster1 <- raster(nrow=4, ncol=4)
myRaster1
class       : RasterLayer 
dimensions  : 4, 4, 16  (nrow, ncol, ncell)
resolution  : 90, 45  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 

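Let’s have a look at it. Note that when creating a raster, anything we do not specify falls back to a default. Here the CRS and extent are the defaults, and the resolution follows from the extent and the number of rows and columns: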

  • CRS: +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
  • extent: -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
  • resolution: 90, 45 (x, y)

Q: What’s been defined?

Let’s give it some values

myRaster1[] <- 1:16
plot(myRaster1)
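
To see how the defaults can be overridden, here is a minimal sketch that sets the extent and CRS explicitly. The extent values are arbitrary numbers chosen for illustration, not the workshop study area.

```r
library(raster)

# A 4 x 4 raster over an explicit extent instead of the global default
r <- raster(nrows = 4, ncols = 4,
            xmn = -100, xmx = -96, ymn = 18, ymx = 20,
            crs = "+proj=longlat +datum=WGS84")

# The resolution is derived: (xmax - xmin) / ncol and (ymax - ymin) / nrow
res(r)         # 1.0 0.5

# Values fill the grid row by row, starting from the top-left cell
r[] <- 1:16
m <- as.matrix(r)
m[1, ]         # first row holds 1 2 3 4
```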

Reading raster files

Data

WorldClim data

  • A great resource of global environmental data in raster format.
  • Used extensively in species distribution modelling.
  • Version 2.0 available but not yet licensed under Creative Commons license (needed to redistribute this data for the workshop).
  • Was used in the Velo-Antón et al 2013 montane salamander paper!

Bioclimatic variables

Bioclimatic variables are derived from the monthly temperature and rainfall values in order to generate more biologically meaningful variables. The bioclimatic variables represent annual trends, seasonality, and extreme or limiting environmental factors

  • BIO1 = Annual Mean Temperature
  • BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))
  • BIO3 = Isothermality (BIO2/BIO7) (* 100)
  • BIO4 = Temperature Seasonality (standard deviation *100)
  • BIO5 = Max Temperature of Warmest Month
  • BIO6 = Min Temperature of Coldest Month
  • BIO7 = Temperature Annual Range (BIO5-BIO6)
  • BIO8 = Mean Temperature of Wettest Quarter
  • BIO9 = Mean Temperature of Driest Quarter
  • BIO10 = Mean Temperature of Warmest Quarter
  • BIO11 = Mean Temperature of Coldest Quarter
  • BIO12 = Annual Precipitation
  • BIO13 = Precipitation of Wettest Month
  • BIO14 = Precipitation of Driest Month
  • BIO15 = Precipitation Seasonality (Coefficient of Variation)
  • BIO16 = Precipitation of Wettest Quarter
  • BIO17 = Precipitation of Driest Quarter
  • BIO18 = Precipitation of Warmest Quarter
  • BIO19 = Precipitation of Coldest Quarter

I’ve selected a few of the variables used in the original paper to fit a Species Distribution Model.

The data is in the data/raster/mx-worldclim_30s folder.

wc_files <- list.files(here::here("data", "raster", "mx-worldclim_30s"),
                       full.names = TRUE)
wc_files
[1] "/Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_15.tif"
[2] "/Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_4.tif" 
[3] "/Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_5.tif" 
[4] "/Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_6.tif" 

These files are in GeoTIFF format, a public domain metadata standard which allows georeferencing information to be embedded within a TIFF file.

Let’s start with a single raster file, mx-bio_5.tif, which corresponds to bioclimatic variable 5: Max Temperature of Warmest Month.

bio5 <- raster(wc_files[3])
bio5
class       : RasterLayer 
dimensions  : 2181, 3638, 7934478  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.125, -86.80833, 14.54167, 32.71667  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : /Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_5.tif 
names       : mx.bio_5 
values      : 54, 427  (min, max)

This creates a RasterLayer object.

By having a look at the summary of the raster file when we simply print the object, straight away it looks like something funny is going on. It’s showing a range of values between 54 and 427. Now, Mexico can get hot…but not that hot! By checking the documentation for the WorldClim data, we can see that the data is stored as degrees C x 10. This is for storage efficiency (files are much smaller if numbers can be stored as integers) but it means we need to transform the data back to degrees C.

Luckily we can easily manipulate rasters, just like any other matrix in R.

bio5 <- bio5/10
bio5
class       : RasterLayer 
dimensions  : 2181, 3638, 7934478  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.125, -86.80833, 14.54167, 32.71667  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : mx.bio_5 
values      : 5.4, 42.7  (min, max)

That’s better!
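
The same kind of rescaling can be tried on a tiny in-memory raster. This is only a sketch with made-up values; the point is that arithmetic is applied cell by cell while the spatial metadata is left untouched.

```r
library(raster)

# Toy 2 x 2 raster holding temperatures as integers (degrees C x 10)
toy <- raster(nrows = 2, ncols = 2)
toy[] <- c(54L, 120L, 305L, 427L)

# Arithmetic is applied to every cell and returns a new RasterLayer
toy_c <- toy / 10
values(toy_c)                 # 5.4 12.0 30.5 42.7

# Extent (and resolution) are unchanged by the calculation
same_extent <- extent(toy_c) == extent(toy)
same_extent
```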

Plotting rasters

The raster package has native plotting methods which are, again, fine for a quick check of the data.

plot(bio5)

Package rasterVis offers much nicer options for plotting raster data, including much better default colour palettes which are pretty, represent the data better, are easier to read for those with colour blindness, and print well in greyscale.

levelplot(bio5)

For numeric data it plots the distribution of the data along each axis in the plot margins. We can suppress that default behaviour by using argument margin=FALSE.

levelplot(bio5, margin=FALSE)

Now this is great for individual layers, but if we have multiple layers to work with, it can be much more efficient to stack them into a RasterStack.

Raster Stacks

A RasterStack is a collection of RasterLayer objects with the same spatial extent and resolution. A RasterStack can be created from RasterLayer objects, or from raster files, or both.

We can read and stack raster files in one go using function raster::stack! And this is where the list of file names comes in handy.

st <- stack(wc_files) 

st
class       : RasterStack 
dimensions  : 2181, 3638, 7934478, 4  (nrow, ncol, ncell, nlayers)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.125, -86.80833, 14.54167, 32.71667  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
names       : mx.bio_15, mx.bio_4, mx.bio_5, mx.bio_6 
min values  :        10,      199,       54,      -85 
max values  :       140,     8136,      427,      218 

We can still extract individual layers using function raster::subset().

subset(st, "mx.bio_5")
class       : RasterLayer 
dimensions  : 2181, 3638, 7934478  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.125, -86.80833, 14.54167, 32.71667  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : /Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_5.tif 
names       : mx.bio_5 
values      : 54, 427  (min, max)

Because a RasterStack behaves much like a list, we can also subset it as we would any other list in R.

st[["mx.bio_5"]]
class       : RasterLayer 
dimensions  : 2181, 3638, 7934478  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.125, -86.80833, 14.54167, 32.71667  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : /Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_5.tif 
names       : mx.bio_5 
values      : 54, 427  (min, max)
st[[3]]
class       : RasterLayer 
dimensions  : 2181, 3638, 7934478  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.125, -86.80833, 14.54167, 32.71667  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : /Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/mx-worldclim_30s/mx-bio_5.tif 
names       : mx.bio_5 
values      : 54, 427  (min, max)

Note that we are back to having incorrect temperature values. We will deal with the layers that need correcting a bit later so just ignore that for now.
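
The three ways of pulling a layer out of a stack can be compared on a small in-memory example. This is a sketch with made-up layers and layer names, not the workshop data.

```r
library(raster)

# Two toy layers sharing the default global extent and resolution
a <- raster(nrows = 2, ncols = 2); a[] <- 1:4
b <- raster(nrows = 2, ncols = 2); b[] <- 5:8
toy_stack <- stack(a, b)
names(toy_stack) <- c("layer_a", "layer_b")

by_subset <- subset(toy_stack, "layer_b")   # by name with subset()
by_name   <- toy_stack[["layer_b"]]         # by name, list style
by_index  <- toy_stack[[2]]                 # by position, list style

# All three return a RasterLayer holding the same values
values(by_subset)   # 5 6 7 8
```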

Both the native plot method and rasterVis::levelplot can handle RasterStacks.

plot(st)

levelplot(st)

For a quick scan of a RasterStack, plot() is more useful because levelplot() plots all panels on the same scale. There are ways of plotting with separate scales, which we will link to later.

Landcover data

The land cover data are original data resampled onto a 30 arc-second grid, sourced from DIVA-GIS. DIVA-GIS is a free computer program for mapping and geographic data analysis (a geographic information system, GIS), which also provides free global spatial data.

lc_files <- list.files(here::here("data", "raster", "MEX_msk_cov"),
                       full.names = TRUE)
lc <- raster(lc_files[1])
lc
class       : RasterLayer 
dimensions  : 2208, 3696, 8160768  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.4, -86.6, 14.4, 32.8  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 
data source : /Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/MEX_msk_cov/MEX_msk_cov.grd 
names       : MEX_msk_cov 
values      : 1, 22  (min, max)

Let’s plot this again to have a look at it.

levelplot(lc)

This raster contains categorical data, so the continuous colour scale and the distributions along the margins do not seem appropriate. Such data can be better defined using the raster::ratify function.

lc <- ratify(lc)
lc
class       : RasterLayer 
dimensions  : 2208, 3696, 8160768  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.4, -86.6, 14.4, 32.8  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 
data source : /Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/MEX_msk_cov/MEX_msk_cov.grd 
names       : MEX_msk_cov 
values      : 1, 22  (min, max)
attributes  :
       ID
 from:  1
 to  : 22

Now we see that rather than a values: 1, 22 (min, max) we have an attributes: field containing a table summarising the levels with from: for the first and to: for the last entry. The actual levels are stored in what is known as a “Raster Attribute Table” (RAT). This can be accessed through the levels() function.

levels(lc)
[[1]]
   ID
1   1
2   2
3   4
4   6
5   9
6  11
7  12
8  13
9  14
10 15
11 16
12 20
13 22
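
Note that the RAT lists only the values that actually occur in the raster, which is why it has 13 rows rather than 22. A minimal sketch on a toy raster with made-up class codes shows the same behaviour:

```r
library(raster)

# Toy categorical raster: only codes 1, 4 and 22 are present
toy_lc <- raster(nrows = 2, ncols = 2)
toy_lc[] <- c(1, 4, 4, 22)

# ratify() marks the layer as categorical and builds its attribute table
toy_lc <- ratify(toy_lc)
toy_rat <- levels(toy_lc)[[1]]

toy_rat$ID          # one row per value that occurs: 1 4 22
is.factor(toy_lc)   # TRUE: the layer is now treated as categorical
```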

Let’s try and plot again.

levelplot(lc)
Error in `[.data.frame`(rat, , att): undefined columns selected

This time, plotting fails. This is because there are no descriptions associated with the levels.

We can define the levels with more informative descriptions. As I forgot to save them as a .csv as part of the workshop materials, here is a snippet of code that can be copied and pasted to create a data.frame of factor levels and their associated descriptions.

(or go to http://bit.ly/lc_levels, click on raw and copy the code snippet from there)

lc_levels <- structure(list(level = 1:22, descr = c("Tree Cover, broadleaved, evergreen", 
"Tree Cover, broadleaved, deciduous, closed", "Tree Cover, broadleaved, deciduous, open", 
"Tree Cover, needle-leaved, evergreen", "Tree Cover, needle-leaved, deciduous", 
"Tree Cover, mixed leaf type", "Tree Cover, regularly flooded, fresh  water", 
"Tree Cover, regularly flooded, saline water", "Mosaic: Tree cover / Other natural vegetation", 
"Tree Cover, burnt", "Shrub Cover, closed-open, evergreen", "Shrub Cover, closed-open, deciduous", 
"Herbaceous Cover, closed-open", "Sparse Herbaceous or sparse Shrub Cover", 
"Regularly flooded Shrub and/or Herbaceous Cover", "Cultivated and managed areas", 
"Mosaic: Cropland / Tree Cover / Other natural vegetation", "Mosaic: Cropland / Shrub or Grass Cover", 
"Bare Areas", "Water Bodies", "Snow and Ice", "Artificial surfaces and associated areas"
)), class = "data.frame", .Names = c("level", "descr"), row.names = c(NA, 
-22L))
rat <- levels(lc)[[1]]
rat <- rat %>% left_join(lc_levels, by = c("ID" = "level"))
levels(lc) <- rat

Let’s have a look at our land cover raster

lc
class       : RasterLayer 
dimensions  : 2208, 3696, 8160768  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -117.4, -86.6, 14.4, 32.8  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 
data source : /Users/Anna/Documents/workflows/workshops/intro-r-gis/data/raster/MEX_msk_cov/MEX_msk_cov.grd 
names       : MEX_msk_cov 
values      : 1, 22  (min, max)
attributes  :
       ID                                    descr
 from:  1       Tree Cover, broadleaved, evergreen
 to  : 22 Artificial surfaces and associated areas

Let’s plot again

levelplot(lc)

Success!

And seeing as we’re dealing with primarily vegetation, let’s create a new map theme (colour palette) using function rasterVis::rasterTheme and colour brewer palette Yellow & Greens (“YlGn”).

mapTheme <- rasterTheme(region = rev(RColorBrewer::brewer.pal(9, "YlGn")))
levelplot(lc, par.settings = mapTheme)

The function takes a vector of colours to produce a colour gradient that is then mapped to raster values. It has a number of built-in colour vectors to choose from and you can even provide your own custom vectors or functions (which is what we would probably want to do in our case to make the colours more reflective of the vegetation type).

Stacking RasterLayers

stack(st, lc)
Error in compareRaster(x): different extent

This doesn’t work, notifying us that there is a problem with mismatching extents. We don’t really need the whole extent of the data anyway, so let’s try cropping everything to the same extent: that of the study area bounding box we defined in our vector workflow.

So let’s load the molecular data that we have converted to an sf using function sf::read_sf and recreate a bounding box.

mol_sf <- read_sf(here::here("data", "sf", "salamander.geojson"))
study_bbox <- mol_sf %>% st_bbox() %>% st_as_sfc()

This bounding box is really tight around our data points. To ensure our raster data contain the locations of all our data points, let’s give this extraction bounding box some space around our points using function sf::st_buffer.

Looking at the help file for this function using ?st_buffer gives us information on a whole suite of useful functions to perform geometric operations on simple feature geometry sets.

extract_bbox <- study_bbox %>% st_buffer(dist = 1)
Warning in st_buffer.sfc(., dist = 1): st_buffer does not correctly buffer
longitude/latitude data
extract_bbox
Geometry set for 1 feature 
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: -100.8481 ymin: 17.94194 xmax: -96.09056 ymax: 20.63083
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
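
The warning is worth a second look: because the data are in longitude/latitude, dist = 1 is interpreted as 1 degree, not 1 metre or 1 kilometre. A rough base R check, using the bounding box coordinates of the sample points copied from the printed output in this section (the kilometre conversion is only an approximation that assumes about 111.32 km per degree at the equator):

```r
# Bounding box of the sample points (xmin, ymin, xmax, ymax)
pts_bbox <- c(xmin = -99.84806, ymin = 18.94194,
              xmax = -97.09056, ymax = 19.63083)

# A buffer of dist = 1 pushes every side outwards by one degree
buffered <- pts_bbox + c(-1, -1, 1, 1)
buffered

# Approximate width of one degree of longitude at 19 degrees north, in km
km_per_degree_lon <- 111.32 * cos(19 * pi / 180)
km_per_degree_lon
```

The buffered values match the bbox printed for extract_bbox. For buffers specified in metres, the usual approach is to transform the data to a projected CRS first.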

To crop a raster we use function raster::crop, which returns a geographic subset of the raster as specified either by an Extent object or by an object from which an Extent object can be extracted or created.

In our case, we’ll use the extract_bbox sf we just created. So let’s try and crop lc first.

crop(lc, extract_bbox)
Error in .local(x, y, ...): Cannot get an Extent object from argument y

Ooops! That throws an error! That’s because of current sf and raster compatibility issues. All we need to do though is convert our sf to an sp spatial class object which raster is designed to work with. We can do this with function sf::as_Spatial.

sp_extract_bbox <- as_Spatial(extract_bbox)
sp_extract_bbox
class       : SpatialPolygons 
features    : 1 
extent      : -100.8481, -96.09056, 17.94194, 20.63083  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 

Let’s try now.

crop(lc, sp_extract_bbox)
class       : RasterLayer 
dimensions  : 323, 571, 184433  (nrow, ncol, ncell)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -100.85, -96.09167, 17.94167, 20.63333  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 
data source : in memory
names       : MEX_msk_cov 
values      : 1, 22  (min, max)
attributes  :
       ID                                    descr
 from:  1       Tree Cover, broadleaved, evergreen
 to  : 22 Artificial surfaces and associated areas

Success! This works.
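
Cropping with an Extent object directly works in the same way. Here is a minimal sketch on a toy raster; the grid and the extent values are made up for illustration.

```r
library(raster)

# Toy 4 x 4 raster with 1 x 1 cells covering x 0-4, y 0-4
toy <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4)
toy[] <- 1:16

# extent() takes xmin, xmax, ymin, ymax (note the order)
e <- extent(1, 3, 1, 3)

# Keep only the central 2 x 2 block of cells
cropped <- crop(toy, e)
dim(cropped)      # 2 2 1
values(cropped)   # 6 7 10 11
```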

So let’s stack and crop all in one go:

full_stack <- stack(
    crop(lc, sp_extract_bbox),
    crop(st, sp_extract_bbox)
)

full_stack
class       : RasterStack 
dimensions  : 323, 571, 184433, 5  (nrow, ncol, ncell, nlayers)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -100.85, -96.09167, 17.94167, 20.63333  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 
names       : MEX_msk_cov, mx.bio_15, mx.bio_4, mx.bio_5, mx.bio_6 
min values  :           1,        41,      655,       54,      -83 
max values  :          22,       117,     3387,      411,      180 

Awesome! We’ve now got all our initial raster files in a single rasterStack 🎉. We’re not done though. There are two things we need to address for our final rasterStack.

  1. We want the temperature range (i.e. the difference between mx.bio_5 and mx.bio_6) for our SDM rather than the extremes.
  2. We also still need to address the fact that our temperature data is currently in degrees C x 10.

So let’s try and address this by creating a new rasterStack from the layers in our full_stack. We use function raster::stack() again.

env_stack <- stack(
    (full_stack[["mx.bio_5"]] - full_stack[["mx.bio_6"]])/10,
    full_stack[["mx.bio_4"]], 
    full_stack[["mx.bio_15"]],
    full_stack[["MEX_msk_cov"]])

env_stack
class       : RasterStack 
dimensions  : 323, 571, 184433, 4  (nrow, ncol, ncell, nlayers)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -100.85, -96.09167, 17.94167, 20.63333  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
names       :  layer, mx.bio_4, mx.bio_15, MEX_msk_cov 
min values  :   13.7,    655.0,      41.0,         1.0 
max values  :   27.2,   3387.0,     117.0,        22.0 

Let’s give our layers better names. This is easily achieved with function names()

names(env_stack) <- c("temp_range","temp_seasonality", 
                     "prec_seasonality", "land_cover")
env_stack
class       : RasterStack 
dimensions  : 323, 571, 184433, 4  (nrow, ncol, ncell, nlayers)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -100.85, -96.09167, 17.94167, 20.63333  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
names       : temp_range, temp_seasonality, prec_seasonality, land_cover 
min values  :       13.7,            655.0,             41.0,        1.0 
max values  :       27.2,           3387.0,            117.0,       22.0 

Exercise

1) Create and plot a new rasterLayer of rough mean temperature.

(rough because it would be much better to use more data at higher temporal resolution, eg at least monthly, not extremes).

Plotting our RasterStack

Let’s try and plot our new environmental stack.

levelplot(env_stack)
Error in .checkLevels(levs[[j]], value[[j]]): new raster attributes (factor values) should be in a data.frame (inside a list)

This doesn’t work now because we are trying to mix displaying factor and numeric data.

We can still extract and plot individual layers though.

levelplot(env_stack, layers = "temp_seasonality")

For more details on how to plot several RasterLayers with different legends (including different data types), see the rasterVis package FAQs. This would also solve the problem we had earlier with plotting multiple layers using the same scale.

Save rasterStack

Finally, let’s save our raster stack in the raster package’s native binary .grd file format using function raster::writeRaster(). We’ll do that to preserve the layer names in the RasterStack. It also allows us to combine categorical and numeric layers in one file.

writeRaster(env_stack, filename = here::here("data", "raster", "env_stack.grd"))

However, these files are not compressed. If the size of the files is an issue, we can save each layer as an individual GeoTIFF file and reimport them all together into a stack later on.

dir.create(here::here("data", "raster", "processed"))

writeRaster(env_stack, 
            filename = here::here("data", "raster", 
                                  "processed", "env_stack.tif"),
            bylayer = TRUE, suffix = names(env_stack))

The following code lists all the files in the processed folder, matching only those files that end with .tif (ignoring the env_stack_land_cover.tif.aux.xml file which contains the RAT and would throw an error), then reads and stacks them!

stack(list.files(here::here("data", "raster", "processed"),
                 pattern = "\\.tif$",
                 full.names = TRUE))
class       : RasterStack 
dimensions  : 323, 571, 184433, 4  (nrow, ncol, ncell, nlayers)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -100.85, -96.09167, 17.94167, 20.63333  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 +no_defs 
names       : env_stack_land_cover, env_stack_prec_seasonality, env_stack_temp_range, env_stack_temp_seasonality 
min values  :                  1.0,                       41.0,                 13.7,                      655.0 
max values  :                 22.0,                      117.0,                 27.2,                     3387.0 
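
The pattern argument is a regular expression, so an unescaped dot matches any character and `$` anchors the match to the end of the file name. A small base R sketch of how the anchored pattern separates the two kinds of file (the file names are illustrative, modelled on the layer names above):

```r
# Illustrative file names: two GeoTIFFs and one auxiliary XML file
files <- c("env_stack_land_cover.tif",
           "env_stack_land_cover.tif.aux.xml",
           "env_stack_temp_range.tif")

# "\\." matches a literal dot; "$" requires ".tif" to end the name
is_tif <- grepl("\\.tif$", files)
is_tif            # TRUE FALSE TRUE

files[is_tif]
```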

Extract environmental data

Extracting points

We can extract data underlying an sf from a raster using function raster::extract(). The output in the case of points is a single value for each point. This is returned as a vector for a single layer or a matrix for multiple layers, as is our case.
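
The difference in output shape can be checked on a toy example. In this sketch the points are supplied as a plain two-column matrix of x, y coordinates (which raster::extract() also accepts) rather than as a converted sf object, and all values are made up.

```r
library(raster)

# Toy 2 x 2 layer with 1 x 1 cells, plus a second layer derived from it
a <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
a[] <- 1:4
b <- a * 10
toy_stack <- stack(a, b)
names(toy_stack) <- c("a", "b")

# Two points: one in the top-left cell, one in the bottom-right cell
pts <- cbind(x = c(0.5, 1.5), y = c(1.5, 0.5))

single <- raster::extract(a, pts)          # vector: one value per point
multi  <- raster::extract(toy_stack, pts)  # matrix: one column per layer
single    # 1 4
multi
```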

Let’s have a look at our data again

mol_sf
Simple feature collection with 15 features and 9 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: -99.84806 ymin: 18.94194 xmax: -97.09056 ymax: 19.63083
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
# A tibble: 15 x 10
      id locality       n mountain_chain   region    na    he    ar    par
   <int> <chr>      <int> <chr>            <chr>  <dbl> <dbl> <dbl>  <dbl>
 1     1 Nevado de…    12 Nevado de Toluca Centr…  5.44 0.620  4.56 0.350 
 2     2 Texcalyac…    29 Sierra de las C… Centr…  8.22 0.660  5.14 0.500 
 3     3 Desierto …     7 Sierra de las C… Centr…  4.44 0.590  4.44 0.180 
 4     4 Ajusco         8 Sierra de las C… Centr…  4.22 0.490  4.05 0.0200
 5     8 Calpan        34 Sierra Nevada    Centr… 11.9  0.730  6.48 0.290 
 6     9 Atzompa       43 Sierra Nevada    Centr… 10.3  0.690  5.79 0.0800
 7    10 Llano Gra…    15 Sierra Nevada (… Centr…  7.78 0.650  5.80 0.250 
 8    11 Rio Frio      27 Sierra Nevada (… Centr…  7.56 0.570  4.77 0.130 
 9    12 Nanacamil…    14 Sierra Nevada (… Centr…  6.22 0.590  4.91 0.100 
10    13 MalincheS      8 Malinche         Centr…  5.00 0.580  4.69 0.0900
11    14 MalincheW     17 Malinche         Centr…  6.67 0.600  4.76 0.0600
12    16 MalincheE     13 Malinche         Centr…  6.11 0.560  4.73 0.210 
13    17 Texmalaqu…     8 Pico de Orizaba  South…  6.00 0.710  5.64 0.910 
14    18 Xometla       16 Pico de Orizaba  South…  9.11 0.830  6.86 0.490 
15    19 Vigas         48 Cofre de Perote  North… 11.8  0.660  5.75 1.31  
# ... with 1 more variable: geometry <POINT [°]>

Because we want to combine it with our previous data in mol_sf we’ll pipe the resulting matrix into as.data.frame so we can easily bind our extracted environmental data to our molecular data.

env_points <- extract(env_stack, as_Spatial(mol_sf)) %>% as.data.frame()

mol_env_sf <- bind_cols(mol_sf, env_points)
mol_env_sf
Simple feature collection with 15 features and 13 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: -99.84806 ymin: 18.94194 xmax: -97.09056 ymax: 19.63083
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
# A tibble: 15 x 14
      id locality       n mountain_chain   region    na    he    ar    par
   <int> <chr>      <int> <chr>            <chr>  <dbl> <dbl> <dbl>  <dbl>
 1     1 Nevado de…    12 Nevado de Toluca Centr…  5.44 0.620  4.56 0.350 
 2     2 Texcalyac…    29 Sierra de las C… Centr…  8.22 0.660  5.14 0.500 
 3     3 Desierto …     7 Sierra de las C… Centr…  4.44 0.590  4.44 0.180 
 4     4 Ajusco         8 Sierra de las C… Centr…  4.22 0.490  4.05 0.0200
 5     8 Calpan        34 Sierra Nevada    Centr… 11.9  0.730  6.48 0.290 
 6     9 Atzompa       43 Sierra Nevada    Centr… 10.3  0.690  5.79 0.0800
 7    10 Llano Gra…    15 Sierra Nevada (… Centr…  7.78 0.650  5.80 0.250 
 8    11 Rio Frio      27 Sierra Nevada (… Centr…  7.56 0.570  4.77 0.130 
 9    12 Nanacamil…    14 Sierra Nevada (… Centr…  6.22 0.590  4.91 0.100 
10    13 MalincheS      8 Malinche         Centr…  5.00 0.580  4.69 0.0900
11    14 MalincheW     17 Malinche         Centr…  6.67 0.600  4.76 0.0600
12    16 MalincheE     13 Malinche         Centr…  6.11 0.560  4.73 0.210 
13    17 Texmalaqu…     8 Pico de Orizaba  South…  6.00 0.710  5.64 0.910 
14    18 Xometla       16 Pico de Orizaba  South…  9.11 0.830  6.86 0.490 
15    19 Vigas         48 Cofre de Perote  North… 11.8  0.660  5.75 1.31  
# ... with 5 more variables: geometry <POINT [°]>, temp_range <dbl>,
#   temp_seasonality <dbl>, prec_seasonality <dbl>, land_cover <dbl>

Our new sf is now ready to use for species distribution modelling. But we can also start visualising the relationships between our molecular and environmental variables

ggplot(mol_env_sf, aes(x = temp_range, y = na, colour = region)) +
    geom_point()

Extracting and summarising raster data using polygons

We can also extract and summarise data over an area represented by a polygon using the raster::extract() function. If we want a summarising function to be applied to the pixel values returned by the extraction, we can supply it to argument fun. Let’s calculate the mean temp_range across the study_bbox area.

mean_temp_range <- extract(env_stack[["temp_range"]], 
                           as_Spatial(study_bbox),
                           fun = mean,
                           na.rm = TRUE)

mean_temp_range
         [,1]
[1,] 22.21228
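
The result is a one-cell matrix rather than a plain number, which is worth knowing before using it in calculations. A minimal sketch of the same kind of polygon summary on a toy raster, where the polygon is built by coercing a made-up Extent to SpatialPolygons:

```r
library(raster)

# Toy 2 x 2 raster with 1 x 1 cells; the bottom row holds 3 and 4
toy <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
toy[] <- 1:4

# Rectangle covering the bottom row only
poly <- as(extent(0, 2, 0, 1), "SpatialPolygons")

# Mean of the cells whose centres fall inside the polygon
toy_mean <- raster::extract(toy, poly, fun = mean, na.rm = TRUE)
toy_mean              # a 1 x 1 matrix
as.vector(toy_mean)   # plain number: the mean of 3 and 4
```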

Let’s use this to calculate the deviation of each of our data points from the study box mean we just calculated. We can use dplyr::mutate to manipulate attribute data in our sf just like any other data.frame.

mol_env_sf <- mol_env_sf %>% 
    mutate(dev_temp_range = temp_range - as.vector(mean_temp_range))
mol_env_sf %>% select(locality, dev_temp_range)
Simple feature collection with 15 features and 2 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: -99.84806 ymin: 18.94194 xmax: -97.09056 ymax: 19.63083
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
# A tibble: 15 x 3
   locality               dev_temp_range             geometry
   <chr>                           <dbl>          <POINT [°]>
 1 Nevado de Toluca               -3.01  (-99.84806 19.19361)
 2 Texcalyacac                     0.188     (-99.5 19.12056)
 3 Desierto de los Leones         -2.11  (-99.30056 19.26667)
 4 Ajusco                         -3.11      (-99.3 19.18278)
 5 Calpan                         -2.81  (-98.59167 19.13139)
 6 Atzompa                        -1.11  (-98.55972 19.18056)
 7 Llano Grande                   -1.41  (-98.72056 19.33889)
 8 Rio Frio                       -2.01  (-98.69472 19.36611)
 9 Nanacamilpa                     0.288 (-98.59611 19.48028)
10 MalincheS                      -1.71  (-98.02194 19.18722)
11 MalincheW                      -0.112   (-98.095 19.25778)
12 MalincheE                      -0.912      (-97.975 19.23)
13 Texmalaquilla                  -2.21     (-97.29 18.94194)
14 Xometla                        -1.31    (-97.19056 18.975)
15 Vigas                          -3.51  (-97.09056 19.63083)

Let’s save our final sf, which now contains the environmental data we extracted:

# delete_dsn = TRUE overwrites the file if it already exists
write_sf(mol_env_sf, here::here("data", "sf", "env_salamander.geojson"),
         delete_dsn = TRUE)

Exercise

2) Calculate mean precipitation seasonality for the extraction bounding box area. What is the value?

3) Add a column indicating whether each data point is greater than (TRUE) or less than (FALSE) the extraction area mean precipitation seasonality.



Other useful raster functions you should know about:

http://rspatial.org/spatial/rst/8-rastermanip.html

  • merge: merge rasterLayers
  • trim: remove outer NA rows and columns
  • extend: expand margins with NA.
  • projectRaster: project the values of a Raster* object to a new Raster* object with another projection (coordinate reference system, CRS).
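
As a rough illustration of the first three functions, the sketch below applies them to a small made-up raster. The toy raster and the object names (r_ext, r_trim, left, right, r_merged) are assumptions for the example and are not part of the workshop data. projectRaster is left out because it needs a real target CRS to be meaningful.

```r
library(raster)

# Hypothetical 4 x 4 raster covering x 0-4, y 0-4
r <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4)
values(r) <- 1:16

# extend: pad one row and one column of NA cells on every side
r_ext <- extend(r, 1)
dim(r_ext)

# trim: strip the all-NA outer rows and columns again
r_trim <- trim(r_ext)
dim(r_trim)

# merge: combine two adjacent rasters that share resolution and origin
left <- crop(r, extent(0, 2, 0, 4))
right <- crop(r, extent(2, 4, 0, 4))
r_merged <- merge(left, right)
extent(r_merged)
```

Here extend and trim undo each other, and merging the two halves rebuilds the extent of the original raster.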

Session information

sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2      ggplot2_3.0.0       dplyr_0.7.6        
 [4] sf_0.6-3            rasterVis_0.45      latticeExtra_0.6-28
 [7] RColorBrewer_1.1-2  lattice_0.20-35     raster_2.6-7       
[10] sp_1.2-5           

loaded via a namespace (and not attached):
 [1] zoo_1.8-3         tidyselect_0.2.4  purrr_0.2.5      
 [4] colorspace_1.3-2  htmltools_0.3.6   viridisLite_0.3.0
 [7] emo_0.0.0.9000    yaml_2.1.19       utf8_1.1.3       
[10] rlang_0.2.1       R.oo_1.21.0       e1071_1.6-8      
[13] hexbin_1.27.1     pillar_1.2.1      glue_1.2.0.9000  
[16] withr_2.1.2       DBI_1.0.0         R.utils_2.6.0    
[19] bindr_0.1.1       plyr_1.8.4        stringr_1.3.1    
[22] munsell_0.5.0     gtable_0.2.0      workflowr_1.0.1  
[25] R.methodsS3_1.7.1 evaluate_0.11     labeling_0.3     
[28] knitr_1.20        parallel_3.4.4    class_7.3-14     
[31] Rcpp_0.12.18      backports_1.1.2   scales_1.0.0     
[34] classInt_0.1-24   digest_0.6.15     stringi_1.2.4    
[37] grid_3.4.4        rprojroot_1.3-2   cli_1.0.0        
[40] here_0.1          rgdal_1.3-4       tools_3.4.4      
[43] magrittr_1.5      lazyeval_0.2.1    tibble_1.4.2     
[46] crayon_1.3.4      whisker_0.3-2     pkgconfig_2.0.2  
[49] lubridate_1.7.4   rstudioapi_0.7    assertthat_0.2.0 
[52] rmarkdown_1.10    R6_2.2.2          units_0.6-0      
[55] git2r_0.21.0      compiler_3.4.4   

This reproducible R Markdown analysis was created with workflowr 1.0.1