More time series; bivariate smooths and ti()

Background

This example is an extension of the Fort Lauderdale time series model. Here we look at two additional advanced features, namely

allowing the seasonal and trend terms to interact, and
setting up an ANOVA-like decomposition for interaction smooth using ti() terms

The model in the earlier example estimated a constant non-linear seasonal (within-year) effect and a constant non-linear trend (between year) effect. Such a model is unable to capture any potential change in the seasonal effect (the within-year distribution of temperatures) over time. Or, put another way, is unable to capture potentially different trends in temperature are particular times of the year.

This example extends the simpler model to more generally model changes in temperature.

Loading and viewing the data

The time series we’ll look at is a long-term temperature and precipitation data set from the convention center here at Fort Lauderdale, consisting of monthly mean temperature (in degrees c), the number of days it rained that month, and the amount of precipitation that month (in mm). The data range from 1950 to 2015, with a column for year and month (with Jan. = 1). The data is from a great web-app called FetchClim2, and the link for the data I used is here.

Here we’ll load it and plot the air temperature data.

library("mgcv")
library("ggplot2")
florida <- read.csv("./data/time_series/Florida_climate.csv",
                    stringsAsFactors = FALSE)
head(florida)

##      lat     lon air_temp    precip year month
## 1 26.099 -80.123 21.56915  18.55344 1950     1
## 2 26.099 -80.123 21.11481  26.17154 1950     2
## 3 26.099 -80.123 21.68798  78.39297 1950     3
## 4 26.099 -80.123 21.38804  49.63116 1950     4
## 5 26.099 -80.123 24.73590 156.15713 1950     5
## 6 26.099 -80.123 26.81385  85.51056 1950     6

ggplot(data = florida, aes(x = month, y = air_temp, color = year, group = year)) +
  geom_point() +
  geom_line()

Fitting the smooth interaction model

Following from the earlier example, we fit a smooth interaction model using a tensor product spline of year and month. Notice how we specify the marginal basis types as a character vector. Here I restrict the model to AR(1) errors as this was the best model in the earlier example and it is unlikely that additional unmodelled temporal structure will turn up when we use a more general model!

m.full <- gamm(air_temp ~ te(year, month, bs = c("tp", "cc")), data = florida,
               correlation = corAR1(form = ~ 1 | year),
               knots = list(month = c(0.5, 12.5)))
layout(matrix(1:3, nrow = 1))
plot(m.full$gam, scheme = 1, theta = 40)
plot(m.full$gam, scheme = 1, theta = 80)
plot(m.full$gam, scheme = 1, theta = 120)

layout(1)
summary(m.full$gam)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## air_temp ~ te(year, month, bs = c("tp", "cc"))
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  23.9348     0.0495   483.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                  edf Ref.df     F p-value    
## te(year,month) 10.59  10.59 413.6  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.906   
##   Scale est. = 0.82092   n = 792

The plot of the tensor product smooth is quite illustrative as to what the model is attempting to do. The earlier model would be forced to retain the same seasonal curve throughout the time series. Now, with the more complex model, the seasonal curve is allowed to change shape smoothly through time.

Our task now is to test whether this additional complexity is worth the extra effort? Are some months warming more than others in Fort Lauderdale?

Fitting the ANOVA-decompososed model

The logical thing to do to test whether some months are warming more than others would be to drop the interaction term from the model and compare the simpler and more complex models with AIC or a likelihood ratio test. But if you look back at the model we just fitted, there is no term in that model that relates just to the interaction! The tensor product contains both the main and interactive effects. What’s worse is that we can’t (always) compare this model with the one we fitted in the earlier example because we need to be careful to ensure the models are properly nested. This is where the ti() smoother comes in.

A ti() smoother is like a standard tensor product (te()) smooth, but without the main effects smooths. Now we can build a model with separate main effects smooths for year and month and add in the interaction smooth using ti(). This will allow us to fit two properly nested that can be compared to test our null hypothesis that the months are warming at the same rate.

Let’s fit those models

m.main <- gam(air_temp ~ s(year, bs = "tp") + s(month, bs = "cc"), data = florida,
              knots = list(month = c(0.5, 12.5)))
m.int  <- gam(air_temp ~ ti(year, bs = "tp") + ti(month, bs = "cc") +
                ti(year, month, bs = c("tp", "cc"), k = 11), data = florida,
              knots = list(month = c(0.5, 12.5)))

If you do this with gamm() the model fitting will fail, and for good reason. At least the two gam() models converged so we can see what is going on. Compare the two models using AIC()

AIC(m.main, m.int)

##              df      AIC
## m.main 15.69941 2048.289
## m.int  27.75776 2043.275

AIC tells us that there is some support for the interaction model, but only when we force a more complex surface by specifying k.

Regardless, the interaction term only affords a >1% improvement in deviance explained; whilst the interaction may be favoured by AIC, any variation in the seasonal effect over time must be small.

Exercises

For an example where there is a stronger change in the seasonal effect, see my two blog posts on applying these models to the Central England Temperature series:
- Post 1
- Post 2
These two posts take you through a full analysis of this data series.