September 14, 2017

Introduction

About me

  • GVSU Math / UMich Bioinformatics
  • Worked with Prof. Aaron King to build biological models in R
  • I was an R novice a year and a half ago!
  • ashtonsb@umich.edu
  • github.com/ashtonbaker

About Markov Models

  • Much more than the Markov Chains you might have seen in Linear Algebra
  • Markov models are great for time series!
  • They allow us to use our understanding of the mechanisms that are generating data.

Markov Models

Markov Chain

  • A stochastic process that satisfies the Markov property
  • All the information about the future state of the system is contained in the present state of the system.
  • States of the system at sequential time points are related by a stochastic process with parameters \(\theta\).

\[X_0 \overset{\theta}{\rightarrow} X_1 \overset{\theta}{\rightarrow} X_2 \overset{\theta}{\rightarrow} \cdots\]
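
The Markov property in the bullets above can be written as

\[P(X_{t+1} | X_t, X_{t-1}, \ldots, X_0) = P(X_{t+1} | X_t),\]

so the future is conditionally independent of the past given the present.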

Markov Chain

  • A stochastic process that satisfies the Markov property
  • All the information about the future state of the system is contained in the present state of the system.
  • States of the system at sequential time points are related by a stochastic process with parameters \(\theta\).

\[\begin{bmatrix}A_0\\B_0\\C_0\end{bmatrix} \overset{\theta}{\rightarrow} \begin{bmatrix}A_1\\B_1\\C_1\end{bmatrix} \overset{\theta}{\rightarrow} \begin{bmatrix}A_2\\B_2\\C_2\end{bmatrix} \overset{\theta}{\rightarrow} \cdots\]

Markov Chain Example

  • The state variables in this system are binary-valued:
##   X.bull. X.bear. X.stagnant.
## 1       1       0           0
## 2       0       1           0
## 3       1       0           0
## 4       0       0           1
## 5       1       0           0
## 6       0       1           0
  • The best way to estimate parameters in this case is to compute the maximum-likelihood estimates (MLEs) of the transition probabilities (see the sketch below).
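
As a hedged sketch (not from the original slides): for a fully observed chain, the MLE of each transition probability is just the observed transition count, row-normalized. The state sequence market below is hypothetical, chosen to match the indicator columns shown above.

# Hypothetical state sequence corresponding to the indicator table above.
market <- c("bull", "bear", "bull", "stagnant", "bull", "bear")

# MLE of the transition matrix: count observed transitions, then
# normalize each "from" row so it sums to 1.
trans.counts <- table(from = head(market, -1), to = tail(market, -1))
P.hat <- prop.table(trans.counts, margin = 1)
P.hat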

Hidden Markov Model

  • The underlying system is a Markov process, with unobserved states.
  • Data are generated by a (stochastic) observation process and comprise the "observed" states.
  • Observed states and process states may be related in a variety of ways.

\[ \begin{matrix} X_0 & \rightarrow & X_1 & \rightarrow & X_2 & \rightarrow & \cdots & \text{Signal} \\ \downarrow & & \downarrow & & \downarrow & & & \\ Y_0 & & Y_1 & & Y_2 & & \cdots & \text{Observations} \end{matrix} \]
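
In the diagram above, each observation depends on the hidden process only through the current hidden state:

\[P(Y_t | X_{0:t}, Y_{1:t-1}) = P(Y_t | X_t)\]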

Transcription Factor Binding Site Prediction

  • Hidden state: position in binding site.
  • Observed state: the nucleic acid sequence.

\[\begin{array}{cccccccccl} B & B & B & B & 1 & 2 & 3 & 4 & B & \text{Signal} \\ A & C & G & A & A & C & T & C & A & \text{Observations} \end{array}\]

Mathelier, A., & Wasserman, W. W. (2013). The next generation of transcription factor binding site prediction. PLoS computational biology, 9(9), e1003214.

Using pomp

Using pomp

  • We can use the pomp package to encode Hidden Markov Models as R objects.

The structure of a pomp model

  • Our goal is to use pomp to perform "simulation-based inference".
  • We need to encode everything required to simulate the process we hypothesize is generating the data.

Ricker model

  • A Ricker model is a population model that relates the population at time \(t\) to the population at time \(t + 1\) (iterated in the sketch after this list). \[N_{t+1} = r N_t e^{-N_t / k}\]

  • \(r\) is the intrinsic growth rate

  • \(k\) is the carrying capacity
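
As a quick illustration (not part of the original slides), the deterministic map can be iterated directly in R; the values r = 20 and k = 1 below match the simulation parameters used later.

# Hedged sketch: iterate the deterministic Ricker map for 50 generations.
ricker.step <- function(N, r, k) r * N * exp(-N / k)

N <- numeric(50)
N[1] <- 1
for (t in 1:49) N[t + 1] <- ricker.step(N[t], r = 20, k = 1)
plot(N, type = "l", xlab = "time", ylab = "N")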

A stochastic Ricker Model

  • If we assume that the realized growth rate \(r e^{\epsilon_t}\) is log-normally distributed, then \[N_{t+1} = r N_t \exp(-N_t/k + \epsilon_t),\] where \(\epsilon_t \sim \text{Normal}(0, \sigma)\) with standard deviation \(\sigma\).

  • In any real system, we need to model measurement error: \[y_t \sim \text{Poisson}(\phi N_t)\]

Ricker model

library(ggplot2)
data <- read.csv("./data/rickerdata.csv")
ggplot(data, aes(time, pop)) + geom_line()

Using pomp

  • pomp requires a few things to build a minimal model:
library(pomp)

rickerModel <- pomp(data = data,
                    times = 'time',
                    t0 = 0)
  • Specifying a few more arguments will help us in the next step:
rickerModel <- pomp(rickerModel,
                    statenames = c('N'),
                    obsnames = c('pop'),
                    paramnames = c('r','k','sigma'))

Initialization process

  • In our model, the initial value of \(N\) is supplied as a parameter, N.0.
init <- function (params, t0,...){
  return(c('N' = params[['N.0']] ))
}

rickerModel <- pomp(rickerModel, initializer = init)

Specifying the stochastic process

rproc <- discrete.time.sim(
  step.fun = function(x, t, params, delta.t, ...) {
    sigma <- params[['sigma']]
    r <- params[['r']]
    k <- params[['k']]
    N <- x[['N']]
    e <- rnorm(1, mean=0, sd=sigma)

    result <- c('N' = r * N * exp(e - N/k))

    return(result)
  },
  delta.t = 1
)

rickerModel <- pomp(rickerModel, rprocess = rproc)

Specifying the measurement model

\(y_t \sim \text{Poisson}(\phi N_t)\)

rmeas <- function (x, t, params, ...) {
  N <- x[['N']]
  phi <- params[['phi']]
  result <- c('pop' = rpois(n=1, lambda=phi * N))

  return(result)
}

rickerModel <- pomp(rickerModel, rmeasure = rmeas)

Specifying the measurement model distribution function

  • dmeasure should be the probability mass function of the Poisson distribution we used for rmeasure. This is available in R as the dpois function.
dmeas <- function(y, x, t, params, log, ...) {
  N <- x[['N']]
  pop <- y[['pop']]
  phi <- params[['phi']]

  return(dpois(pop, lambda = phi*N, log=log))
}

rickerModel <- pomp(rickerModel, dmeasure = dmeas)

Simulation

p <- c(N.0=1, r=20, k=1,sigma=0.1,phi=200)
sim <- simulate(rickerModel, params = p, as.data.frame = T)
ggplot(sim, aes(time, pop)) + geom_line()

Comparing simulations to the data

sim <- simulate(rickerModel, params = p, nsim=8, as.data.frame=TRUE,include.data=TRUE)
ggplot(data=sim,aes(x=time,y=pop,group=sim,color=(sim=="data")))+
  geom_line()+guides(color=FALSE)+
  facet_wrap(~sim,ncol=3)

Adjusting parameters

p <- c(N.0=68, r=40, k=11,sigma=0.1,phi=1)
sim <- simulate(rickerModel, params = p, nsim=8,as.data.frame=TRUE,include.data=TRUE)
ggplot(data=sim,aes(x=time,y=pop,group=sim,color=(sim=="data")))+
  geom_line()+guides(color=FALSE)+
  facet_wrap(~sim,ncol=3)

A Crash Course on Likelihood

Likelihood

  • Calculated in the same way as probability \[\mathcal{L}(\theta | x) = P(x | \theta)\]

  • Probability: How likely is outcome \(x\) under this model and parameters \(\theta\)?

  • Likelihood: How well do the parameters \(\theta\) explain the data \(x\) under the model?

  • Together, the data, model, and parameters determine a single number, the likelihood, which can be used to compare models and parameter sets (see the example below).
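
For example, under the Poisson measurement model used later, a single count \(y_t\) with mean \(\phi N_t\) has

\[\mathcal{L}(\phi, N_t | y_t) = P(y_t | \phi, N_t) = \frac{(\phi N_t)^{y_t} e^{-\phi N_t}}{y_t!},\]

viewed as a function of the parameters with the data \(y_t\) held fixed.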

Likelihood for HMMs

\[ \begin{matrix} X_0 & \rightarrow & X_1 & \rightarrow & X_2 & \rightarrow & \cdots \\ & & \downarrow & & \downarrow & & \\ & & Y_1 & & Y_2 & & \end{matrix} \]

  • \(f_{X_0}\) : PDF of the system state at time \(t=0\)

  • \(f_{X}\) : PDF of \(X_t\) given \(X_{t-1}\)

  • \(f_{Y}\) : PDF of \(Y_t\) given \(X_t\)

  • Probability of initial state \(x_0\), state transition \(x_0 \rightarrow x_1\), and observation \(y_1\): \[f_{X_0}(x_0; \theta) f_X(x_1 | x_0; \theta) f_Y(y_1 | x_1; \theta)\]

Likelihood for HMMs

  • Probability of initial state \(x_0\), subsequent states \(x_1, \ldots, x_N\), and observations \(y_1, \ldots, y_N\): \[f_{X,Y}(x_{0:N}, y_{1:N}; \theta) = f_{X_0}(x_0; \theta) \prod_{t=1}^N f_X(x_t | x_{t-1}; \theta) f_Y(y_t | x_t; \theta)\]

  • We have data \(y_{1:N}^*\), but no observations for \(x_{1:N}\). The solution is to compute the marginal PDF of \(y_{1:N}\) and evaluate at the data \(y_{1:N}^*\):

\[ \mathcal{L}(\theta) = \int_{x_{0:N}} f_{X,Y}(x_{0:N}, y_{1:N}^*; \theta) \; d x_{0:N}\]

Likelihood for HMMs

Because of the Markov property, the log-likelihoods of successive transitions sum to the log-likelihood of the entire sequence. But since the hidden states are unobserved, we have to sum (or integrate) over every possible hidden-state sequence, which is usually computationally intractable.

\[ \begin{matrix} X_0 & \rightarrow & X_1 & \rightarrow & X_2 & \rightarrow & \cdots \\ & & \downarrow & & \downarrow & & \\ & & 4 & & 6 & & \end{matrix} \]

Better ideas

  • If we specify the PDF of the measurement model, \(f_Y(y_t | x_t; \theta)\) (in pomp, the dmeasure function), then we can simulate the underlying Markov process using rprocess and compute the likelihood of the data using dmeasure.
  • This gives an unbiased estimate of the likelihood! So all we have to do is compute the likelihood this way many times and take the average (see the sketch after this list).
  • Unfortunately, this approach scales poorly with long time series.
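
A minimal sketch of the simulate-and-average idea in plain R (not the pomp implementation), assuming the data frame data loaded earlier, one process step per observation, and the second parameter set used below:

# Hedged sketch: estimate the likelihood by simulating the hidden Ricker
# process many times and averaging the measurement likelihood of the data.
mc.lik <- function(data, r, k, sigma, phi, N0, nsim = 1000) {
  liks <- replicate(nsim, {
    N <- N0
    ll <- 0
    for (t in seq_len(nrow(data))) {
      e <- rnorm(1, mean = 0, sd = sigma)                         # process noise
      N <- r * N * exp(e - N / k)                                 # one rprocess step
      ll <- ll + dpois(data$pop[t], lambda = phi * N, log = TRUE) # dmeasure
    }
    exp(ll)
  })
  log(mean(liks))
}

mc.lik(data, r = 40, k = 11, sigma = 0.1, phi = 1, N0 = 68)

With long time series the individual simulation likelihoods underflow toward zero, which is one reason this approach scales poorly.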

Even better ideas

  • The particle filter uses the same basic idea, but at each time point the simulations, or "particles", are resampled with probability proportional to the likelihood of the data given that particle.

  • It's easy to use in pomp!

p <- c(N.0=68, r=40, k=11,sigma=0.1,phi=1)
pf <- pfilter(rickerModel, params = p, Np = 5000)
logLik(pf)
## [1] -150.4366
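
Since the particle-filter log-likelihood is itself a Monte Carlo estimate, one option (an aside, not from the original slides) is to repeat the calculation a few times to gauge its variability:

# Hedged aside: replicate the particle filter to see the Monte Carlo spread.
replicate(5, logLik(pfilter(rickerModel, params = p, Np = 5000)))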

Comparing parameters

p1 <- c(N.0=1, r=20, k=1,sigma=0.1,phi=200)
p2 <- c(N.0=68, r=40, k=11,sigma=0.1,phi=1)
pf1 <- pfilter(rickerModel, params = p1, Np = 5000)
## Warning: in 'pfilter': 23 filtering failures occurred.
pf2 <- pfilter(rickerModel, params = p2, Np = 5000)
logLik(pf1)
## [1] -1150.059
logLik(pf2)
## [1] -147.5515

Finding the MLE parameters

MLE parameters

  • Behind the scenes, the R snippets above have been replaced with a C implementation for speed (a sketch follows the code below).

  • One way of maximizing likelihood is to use a regular optimization function:

f <- function(par) {
  -logLik(pfilter(rickerModel, params = par, Np = 1000))
}

fit <- optim(par = c(r=44.60118, sigma=0.3, phi=10.0,
                     c=1.0, N.0 = 7.0, e.0=0.0), fn = f)
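
The first bullet above mentions a C implementation; the following is a hedged sketch (not the talk's actual code) of how the R snippets could be swapped for C snippets via pomp's Csnippet facility, keeping the same parameterization (r, k, sigma, phi). The talk's real C model evidently also carries c and e.0 parameters, which are not reproduced here.

# Hedged sketch: the same model with C snippets, which pomp compiles
# into native code for a substantial speedup.
rickerModelC <- pomp(rickerModel,
                     rprocess = discrete.time.sim(
                       step.fun = Csnippet("
                         double e = rnorm(0, sigma);
                         N = r * N * exp(e - N / k);"),
                       delta.t = 1),
                     rmeasure = Csnippet("pop = rpois(phi * N);"),
                     dmeasure = Csnippet("lik = dpois(pop, phi * N, give_log);"),
                     statenames = c("N"),
                     obsnames = c("pop"),
                     paramnames = c("r", "k", "sigma", "phi"))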

MLE parameters

  • The optim function has given us a better set of parameters. Let's get a good estimate for the log-likelihood using pfilter:
logLik(pfilter(rickerModel, params = fit$par, Np = 5000))
## [1] -137.4063

Iterated filtering

  • A better way is to use "iterated filtering" implemented in pomp as mif2()
  • Maximum likelihood by iterated, perturbed Bayes maps
  • Uses a process similar to simulated annealing.
mif <- mif2(rickerModel, Nmif=5, start=c(r=44.60118, sigma=0.3, phi=10.0,
                     c=1.0, N.0 = 7.0, e.0=0.0), Np=5000,
            rw.sd=rw.sd(r=0.05, sigma=0.05, phi=0.05,
                        c=0.05, N.0 = 0.05, e.0=0.05),
            transform=TRUE, cooling.type='geometric',
            cooling.fraction.50=0.05)

Iterated filtering

  • How do the MLE estimates compare? How do the likelihood estimates compare?
coef(mif)
##            r        sigma          phi            c          N.0 
## 47.833089623  0.160891645 11.448253544  0.925698622  7.376648974 
##          e.0 
##  0.003582483
fit$par
##          r      sigma        phi          c        N.0        e.0 
## 44.8411661  0.2152585 10.0607183  0.9968845  7.0031875  3.3812318
logLik(pfilter(mif, Np=5000))
## [1] -177.4154

Iterated filtering

  • We might not have gotten better results here, but we can pick up our mif object and keep working:
mif <- mif2(mif, Nmif=30)
logLik(mif)
## [1] -140.0469
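
As a final hedged aside (not in the original slides), pomp provides convergence diagnostics for mif2 objects, and the likelihood at the new parameter estimates can be re-checked with an independent particle filter run:

plot(mif)                        # mif2 convergence diagnostics
logLik(pfilter(mif, Np = 5000))  # fresh likelihood estimate at coef(mif)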