In this mini-workshop we will introduce the sf package, show some examples of geospatial analysis, work with base plotting of sf objects, and show how mapview can be used to map these objects. It is assumed that you have R and RStudio installed and that you, at a minimum, understand the basic concepts of the R language (e.g. you can work through R For Cats).

Also as an aside, I am learning the sf package right now so, we will be learning all of this together!

The sf package

Things are changing quickly in the R/spatial analysis world and the most fundamental change is via the sf package. This package aims to replace sp, rgdal, and rgeos. There are a lot of reasons why this is a good thing, but that is a bit beyond the scope of this workshop Suffice it to say it should make things faster and simpler!

To get started, lets get sf installed:

install.packages("sf")
library("sf")

sf does rely on having access the GDAL, GEOS, and Proj.4 libraries. On Windows and Mac this should be pretty straightforward if we are installing from CRAN (which we are). If you use linux, you know nothing is straightforward and you are on your own!

Exercise 1

The first exercise won’t be too thrilling, but we need to make sure everyone has the packages installed.

  1. Install sf.
  2. Load sf.
  3. If you don’t have dplyr already, make sure it is installed.
  4. Load dplyr.

Reading in spatial data with sf

Simple Features

So, what does sf actually provide us? It is an implementation of an ISO standard for storing spatial data. It forms the basis for many of the common vector data models and is centered on the concept of a “feature”. Essentially a feature is any object in the real world. There are many different types of features and there are different details that get stored about each. The first sf vignette does a really nice job of explaining the details. For this mini-workshop we are going to focus on three feature types, POINT, LINESTRING, and POLYGON. For each of the types, there will be coordinates stored as dimensions, a coordinate reference system, and attributes.

Get some data to use

We can grab some data directly from the Rhode Island Geographic Information System (RIGIS) for these examples. This code assumes you have a data folder in your current workspace. Create one if you need it.

# Create a data folder if it doesn't exist, and yes R can do that
if(!dir.exists("data")){dir.create("data")}

# Municipal Boundaries
download.file(url = "http://www.rigis.org/geodata/bnd/muni97d.zip",
              destfile = "data/muni97d.zip")
unzip(zipfile = "data/muni97d.zip", 
      exdir = "data")

# Streams
download.file(url = "http://www.rigis.org/geodata/hydro/streams.zip",
              destfile = "data/streams.zip")
unzip(zipfile = "data/streams.zip", 
      exdir = "data")

# Potential Growth Centers
download.file(url = "http://www.rigis.org/geodata/plan/growth06.zip",
              destfile = "data/growth06.zip")
unzip(zipfile = "data/growth06.zip", 
      exdir = "data")

# Land Use/Land Cover
download.file(url = "http://www.rigis.org/geodata/plan/rilc11d.zip",
              destfile = "data/rilc11d.zip")
unzip(zipfile = "data/rilc11d.zip", 
      exdir = "data")

Data input

To pull the shapefiles in we can simply use the st_read() function. This will create an object which is a simple feature collection of, in our case, POINT, LINESTRING, or POLYGON. As an aside, many of the sf functions and all of the ones we will be using start with st_. This stands for “spatial” and “temporal”. Take a look below for examples reading in each of our datasets.

POINT

growth_cent <- st_read("data/growth06.shp")
## Reading layer `growth06' from data source `/home/jhollist/projects/geospatial_with_sf/data/growth06.shp' using driver `ESRI Shapefile'
## Simple feature collection with 21 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 260137.3 ymin: 32916.7 xmax: 418116.3 ymax: 326549.2
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs

LINESTRING

streams <- st_read("data/streams.shp")
## Reading layer `streams' from data source `/home/jhollist/projects/geospatial_with_sf/data/streams.shp' using driver `ESRI Shapefile'
## Simple feature collection with 4470 features and 8 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 234010.1 ymin: 31361.37 xmax: 430921.9 ymax: 340865.8
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs

POLYGON

muni <- st_read("data/muni97d.shp")
## Reading layer `muni97d' from data source `/home/jhollist/projects/geospatial_with_sf/data/muni97d.shp' using driver `ESRI Shapefile'
## Simple feature collection with 396 features and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 220310.4 ymin: 23048.49 xmax: 432040.9 ymax: 340916.6
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
lulc <- st_read("data/rilc11d.shp")
## Reading layer `rilc11d' from data source `/home/jhollist/projects/geospatial_with_sf/data/rilc11d.shp' using driver `ESRI Shapefile'
## Simple feature collection with 68186 features and 5 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 218380.4 ymin: 23048.5 xmax: 434592 ymax: 343508
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs

Other data

We won’t have time during this mini-workshop to look at reading in other data formats, but since sf uses GDAL it can read all of the files for which you have drivers.

drvrs <- st_drivers()
drvrs$long_name
##  [1] PCIDSK Database File                                
##  [2] Network Common Data Format                          
##  [3] JPEG-2000 driver based on OpenJPEG library          
##  [4] Geospatial PDF                                      
##  [5] ESRI Shapefile                                      
##  [6] MapInfo File                                        
##  [7] UK .NTF                                             
##  [8] SDTS                                                
##  [9] IHO S-57 (ENC)                                      
## [10] Microstation DGN                                    
## [11] VRT - Virtual Datasource                            
## [12] EPIInfo .REC                                        
## [13] Memory                                              
## [14] Atlas BNA                                           
## [15] Comma Separated Value (.csv)                        
## [16] NAS - ALKIS                                         
## [17] Geography Markup Language (GML)                     
## [18] GPX                                                 
## [19] Keyhole Markup Language (LIBKML)                    
## [20] Keyhole Markup Language (KML)                       
## [21] GeoJSON                                             
## [22] Interlis 1                                          
## [23] Interlis 2                                          
## [24] GMT ASCII Vectors (.gmt)                            
## [25] GeoPackage                                          
## [26] SQLite / Spatialite                                 
## [27] OGR_DODS                                            
## [28] ODBC                                                
## [29] WAsP .map format                                    
## [30] ESRI Personal GeoDatabase                           
## [31] Microsoft SQL Server Spatial Database               
## [32] OGDI Vectors (VPF, VMAP, DCW)                       
## [33] PostgreSQL/PostGIS                                  
## [34] MySQL                                               
## [35] ESRI FileGDB                                        
## [36] X-Plane/Flightgear aeronautical data                
## [37] AutoCAD DXF                                         
## [38] Geoconcept                                          
## [39] GeoRSS                                              
## [40] GPSTrackMaker                                       
## [41] Czech Cadastral Exchange Data Format                
## [42] PostgreSQL SQL dump                                 
## [43] OpenStreetMap XML and PBF                           
## [44] GPSBabel                                            
## [45] Tim Newport-Peace's Special Use Airspace Format     
## [46] OpenAir                                             
## [47] Planetary Data Systems TABLE                        
## [48] OGC WFS (Web Feature Service)                       
## [49] Hydrographic Transfer Vector                        
## [50] Aeronav FAA                                         
## [51] Geomedia .mdb                                       
## [52] French EDIGEO exchange format                       
## [53] Google Fusion Tables                                
## [54] Scalable Vector Graphics                            
## [55] CouchDB / GeoCouch                                  
## [56] Cloudant / CouchDB                                  
## [57] Idrisi Vector (.vct)                                
## [58] Arc/Info Generate                                   
## [59] SEG-P1 / UKOOA P1/90                                
## [60] SEG-Y                                               
## [61] MS Excel format                                     
## [62] Open Document/ LibreOffice / OpenOffice Spreadsheet 
## [63] MS Office Open XML spreadsheet                      
## [64] Elastic Search                                      
## [65] Walk                                                
## [66] Carto                                               
## [67] AmigoCloud                                          
## [68] Storage and eXchange Format                         
## [69] Selafin                                             
## [70] OpenJUMP JML                                        
## [71] Planet Labs Scenes API                              
## [72] OGC CSW (Catalog  Service for the Web)              
## [73] VDV-451/VDV-452/INTREST Data Format                 
## [74] U.S. Census TIGER/Line                              
## [75] Arc/Info Binary Coverage                            
## [76] Arc/Info E00 (ASCII) Coverage                       
## [77] HTTP Fetching Wrapper                               
## 201 Levels: ACE2 Aeronav FAA AirSAR Polarimetric Image ... ZMap Plus Grid

Additionally, you if you have a tabular dataset with coordinates, you can create an sf object with those. An example using EPA’s National Lakes Assessment data:

nla_url <- "https://www.epa.gov/sites/production/files/2016-12/nla2012_wide_siteinfo_08232016.csv"
nla_stations <- read.csv(nla_url, stringsAsFactors = FALSE)
nla_sf <- st_as_sf(nla_stations, coords = c("LON_DD83", "LAT_DD83"), crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")

(HT to Mike Treglia for the suggestion!)

A word on performance

One of the benefits of using sf is the speed. In my tests it is about twice as fast as the prior standard of sp and rgdal. Let’s look at a biggish shape file with 1 million points!

1 million points

1 million points

## Writing layer `big.shp' to data source `data/big.shp' using driver `ESRI Shapefile'
## features:       1000000
## fields:         0
## geometry type:  Point
#The old way
system.time(rgdal::readOGR("data","big"))
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "big"
## with 1000000 features
## It has 1 fields
## Integer64 fields read as strings:  FID
##    user  system elapsed 
##  16.280   0.204  16.491
#The sf way
system.time(st_read("data/big.shp"))
## Reading layer `big' from data source `/home/jhollist/projects/geospatial_with_sf/data/big.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1000000 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -79.52863 ymin: 32.90149 xmax: -60.2213 ymax: 51.31189
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##    user  system elapsed 
##   6.468   0.032   6.501

Exercise 2

  1. Make sure each of the shapefiles is read in. Follow the instructions from above to complete this.

Basics of sf objects

Perhaps the nicest feature (pun intended) about sf objects is that they are nothing more than data frames. The data for each feature (e.g. the attributes in ESRI speak) are stored in the data frame’s first columns. The last column of the data frame is a “geometry” column which holds the coordinates, and coordinate reference system information. I say this is nice becuase we don’t need to completely learn a new way of working with spatial data. Much of what we now about working with plain old tabular data frames will also work with sf objects.

We can use many of our base R skills directly with these objects.

head(muni)
## Simple feature collection with 6 features and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 246748.8 ymin: 282524.2 xmax: 360358 ymax: 340916.6
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
##         AREA PERIMETER RITOWN5K_ RITOWN5K_I             NAME MCD CFIPS
## 1  788769737 134174.19         2        380       CUMBERLAND  20     7
## 2  219887686  61203.98         3          1       WOONSOCKET  80     7
## 3  693661121 122732.51         4          2 NORTH SMITHFIELD  55     7
## 4 1588026224 166984.85         5          3     BURRILLVILLE   5     7
## 5  527562041 117805.34         6        381          LINCOLN  45     7
## 6  769680403 111353.48         7          4       SMITHFIELD  75     7
##       COUNTY OSP CFIPS_MCD TWNCODE LAND                       geometry
## 1 PROVIDENCE   8      7020      CU    Y POLYGON((339469.65625 34051...
## 2 PROVIDENCE  39      7080      WO    Y POLYGON((320285.09375 33947...
## 3 PROVIDENCE  25      7055      NS    Y POLYGON((299086.65625 33886...
## 4 PROVIDENCE   3      7005      BU    Y POLYGON((303493.28125 31006...
## 5 PROVIDENCE  17      7045      LI    Y POLYGON((331461.675 320529....
## 6 PROVIDENCE  31      7075      SM    Y POLYGON((303493.28125 31006...
streams$NAME[1:10]
##  [1] Mill River      No Name         Round Top Brook Round Top Brook
##  [5] Peters River    Indian Brook    Chockalog River Chockalog River
##  [9] Chockalog River Chockalog River
## 242 Levels: Abbott Run Brook Acid Factory Brook ... Woonasquatucket River
str(growth_cent)
## Classes 'sf' and 'data.frame':   21 obs. of  3 variables:
##  $ NAME      : Factor w/ 21 levels "BLOCK ISLAND",..: 8 5 14 9 21 4 3 17 13 20 ...
##  $ Cntr_Class: Factor w/ 2 levels "Town","Village": 2 2 2 2 2 2 2 2 2 2 ...
##  $ geometry  :sfc_POINT of length 21; first list element: Classes 'XY', 'POINT', 'sfg'  num [1:2] 279907 321759
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
##   ..- attr(*, "names")= chr  "NAME" "Cntr_Class"

Manipulate sf objects with dplyr, yes, dplyr!

However, what is truly awesome, is the we can use dplyr to manipulate our spatial data and pipe our spatial workflows. First let’s make sure dplyr is loaded up.

library("dplyr")

Now we can work with our spatial data. Let’s filter out the towns that are more than 100 km2:

big_muni <- muni %>%
  mutate(area_skm = AREA*.000000092903) %>%
  group_by(NAME) %>%
  summarize(sum_area_skm = sum(area_skm)) %>%
  filter(sum_area_skm >= 100)
big_muni
## Simple feature collection with 11 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 220310.4 ymin: 23048.49 xmax: 432040.9 ymax: 340916.6
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
## # A tibble: 11 × 3
##               NAME sum_area_skm          geometry
##             <fctr>        <dbl>  <simple_feature>
## 1     BURRILLVILLE     147.5324 <POLYGON((303...>
## 2         COVENTRY     161.7605 <POLYGON((315...>
## 3           EXETER     151.2190 <POLYGON((315...>
## 4           FOSTER     134.6021 <POLYGON((276...>
## 5        GLOCESTER     147.1955 <POLYGON((303...>
## 6        HOPKINTON     114.3237 <POLYGON((248...>
## 7  NORTH KINGSTOWN     114.1386 <MULTIPOLYGON...>
## 8         RICHMOND     105.5525 <POLYGON((297...>
## 9         SCITUATE     141.9332 <POLYGON((276...>
## 10 SOUTH KINGSTOWN     155.9371 <MULTIPOLYGON...>
## 11  WEST GREENWICH     132.6521 <POLYGON((315...>

And first time I tried this, this happened:

mind_blown

mind_blown

Exercise 3

Let’s try the same thing with our land use/land cover data, except without the filter. Use dplyr and

  1. Add a column to convert area from acres to square kilometers
  2. Group by LULC_2011
  3. Sum up the square kilometers
  4. Which land use/land cover class is the biggest? The smallest?

Plotting

There are default plotting methods for sf objects, so we can use our base plotting tools to create some quick maps.

sf plotting methods

If a dataset has multiple attributes, the default plotting will create a separate plot of each. Nice for quickly exploring a dataset.

plot(muni)
## Warning: plotting the first 10 out of 12 attributes; use max.plot = 12 to
## plot all

And to plot just a single field (might be a better way to do this, but it is what makes the most sense to me right now):

plot(muni$geometry, col = muni$COUNTY)

And to add other layers

plot(muni$geometry)
plot(streams$geometry, col = "blue", add = TRUE)
plot(growth_cent$geometry, pch = 19, cex = 1, col = "red", add = TRUE)

That’s nice, but if we want to interact with the data we can’t do that just yet (I’m planning to add sf support to quickmapr). That has been added to the development version of mapview.

mapview

If you don’t already have mapview we need to get it installed and loaded. This isn’t on CRAN yet so we can install with devtools::install_github()

devtools::install_github("environmentalinformatics-marburg/mapview", ref="develop")
library("mapview")

And add some of our sf objects to a Leaflet map with mapview.

mapview(muni) + streams + growth_cent

(sorry not embedded, was having rendering problems that I didn’t have time to solve)

Exercise 4

  1. Play around with base plotting to plot some of these different sf objects.

Analysis

Basic GIS analyses are fully supported in sf. For this introduction we will discuss only one: buffer, clip, and summarize.

Buffer

To buffer features we will use the st_buffer() function. For our example we will buffer a single point and summarize the land use/land cover around that point. We can pipe our full workflow together using %>%.

wk_lulc_summ <- growth_cent %>%
  filter(NAME == "WEST KINGSTON") %>%
  st_buffer(dist = 5280) %>%
  st_intersection(lulc) %>%
  group_by(Descr_2011)%>%
  summarize(area = sum(Acres_2011)) %>%
  arrange(desc(area))
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
wk_lulc_summ
## Simple feature collection with 27 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 306876.9 ymin: 140750.1 xmax: 317436.9 ymax: 151310.1
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
## # A tibble: 27 × 3
##                                           Descr_2011        area
##                                               <fctr>       <dbl>
## 1                   Deciduous Forest (>80% hardwood) 13393.54347
## 2                                Cropland (tillable)   675.92835
## 3                                       Mixed Forest   331.31834
## 4              Developed Recreation (all recreation)   124.49325
## 5                   Power Lines (100' or more width)   111.16976
## 6  Medium Low Density Residential (1 to 2 acre lots)    90.09133
## 7                                            Wetland    82.21363
## 8              Railroads (and associated facilities)    77.12871
## 9    Medium Density Residential (1 to 1/4 acre lots)    74.76852
## 10                                             Water    67.86257
## # ... with 17 more rows, and 1 more variables: geoms <simple_feature>
plot(wk_lulc_summ$geoms, col = wk_lulc_summ$Descr_2011)

Clearly we need to think about our colors some, but the combination of sf and dplyr for doing GIS types of operations is FANTASTIC and I am now completely:

in love

in love

Exercise 5

Let’s get a bit ambitious now. What is the land use land cover totals within 1 km of the Wood River?