\( % Math operators \newcommand{\argmax}{\arg\!\max} \newcommand{\argmin}{\arg\!\min} \newcommand\ev[1]{E\left\langle#1\right\rangle} \newcommand\vv[1]{V\left\langle#1\right\rangle} % Math commands \newcommand{\mat}[1]{\mathbf{#1}} % Math symbols \renewcommand{\NN}{\mathcal{N}} \renewcommand{\UU}{\mathcal{U}} \renewcommand{\LL}{\mathcal{L}} \renewcommand{\RR}{\mathbb{R}} \)
This case study documents the implementation in Stan (Carpenter et al. 2017) of the Hidden Markov Model (HMM) for unsupervised learning (Baum and Petrie 1966; Baum and Eagon 1967; Baum and Sell 1968; Baum et al. 1970; Baum 1972). Additionally, we present the adaptations needed for the Input-Output Hidden Markov Model (IOHMM). The IOHMM is an architecture proposed by Bengio and Frasconi (1995) to map input sequences, sometimes called the control signal, to output sequences. Compared to the HMM, it aims at being especially effective at learning long-term dependencies, that is, when the relationship between inputs and outputs spans many time steps. In all cases, we provide a fully Bayesian estimation of the model parameters and inference on hidden quantities, namely the filtered and smoothed posterior distributions of the hidden states and the jointly most probable state path.
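As a preview of the machinery developed later, the filtered quantities mentioned above rest on the forward algorithm, which marginalizes the hidden states out of the joint likelihood in the model block. The following is a minimal sketch, assuming `K` hidden states, Gaussian emissions with state-dependent parameters `mu` and `sigma`, an initial distribution `pi1`, and a transition matrix `A`; all variable names here are illustrative and need not match the ones used in the rest of the tutorial.

```stan
model {
  // logalpha[t, j] = log p(y[1:t], z[t] = j): forward quantities
  real logalpha[T, K];
  real accumulator[K];

  // t = 1: initial state distribution times first emission density
  for (j in 1:K)
    logalpha[1, j] = log(pi1[j]) + normal_lpdf(y[1] | mu[j], sigma[j]);

  // Forward recursion: sum over the previous hidden state in log space
  for (t in 2:T) {
    for (j in 1:K) {
      for (i in 1:K)
        accumulator[i] = logalpha[t - 1, i] + log(A[i, j])
                         + normal_lpdf(y[t] | mu[j], sigma[j]);
      logalpha[t, j] = log_sum_exp(accumulator);
    }
  }

  // Likelihood: marginalize over the hidden state at the last time step
  target += log_sum_exp(logalpha[T]);
}
```

Working in log space with `log_sum_exp` avoids the numerical underflow that plagues a naive product-form implementation of the forward recursion.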
A Tutorial on Hidden Markov Models using Stan is distributed under the Creative Commons Attribution 4.0 International Public License. Accompanying code is distributed under the GNU General Public License v3.0. See the README file for details. All files are available in the stancon18 GitHub repository.
We acknowledge the members of the Stan Development Team for being so passionate about Bayesian statistics and technically unbeatable. Special mentions to Aaron Goodman, Ben Bales and Bob Carpenter for their active participation in the discussions held in the Stan forums about HMMs with constraints and HHMMs. Although not strictly related to this work, those discussions remain very valuable for the current tutorial.
We acknowledge the Google Summer of Code (GSoC) program for funding. This tutorial builds on top of our project: Bayesian Hierarchical Hidden Markov Models applied to financial time series.
## Session info -------------------------------------------------------------
## setting value
## version R version 3.4.1 (2017-06-30)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate Spanish_Argentina.1252
## tz America/Buenos_Aires
## date 2017-12-17
## Packages -----------------------------------------------------------------
## package * version date source
## BH 1.65.0-1 2017-08-24 CRAN (R 3.4.1)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.4.1)
## dichromat 2.0-0 2013-01-24 CRAN (R 3.4.1)
## digest 0.6.12 2017-01-27 CRAN (R 3.4.1)
## ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.4.1)
## graphics * 3.4.1 2017-06-30 local
## grDevices * 3.4.1 2017-06-30 local
## grid 3.4.1 2017-06-30 local
## gridExtra 2.3 2017-09-09 CRAN (R 3.4.1)
## gtable 0.2.0 2016-02-26 CRAN (R 3.4.1)
## inline 0.3.14 2015-04-13 CRAN (R 3.4.1)
## labeling 0.3 2014-08-23 CRAN (R 3.4.1)
## lattice 0.20-35 2017-03-25 CRAN (R 3.4.1)
## lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.1)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.1)
## MASS 7.3-47 2017-02-26 CRAN (R 3.4.1)
## Matrix 1.2-10 2017-05-03 CRAN (R 3.4.1)
## methods * 3.4.1 2017-06-30 local
## munsell 0.4.3 2016-02-13 CRAN (R 3.4.1)
## plyr 1.8.4 2016-06-08 CRAN (R 3.4.1)
## R6 2.2.2 2017-06-17 CRAN (R 3.4.1)
## RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.4.1)
## Rcpp 0.12.12 2017-07-15 CRAN (R 3.4.1)
## RcppEigen 0.3.3.3.0 2017-05-01 CRAN (R 3.4.1)
## reshape2 1.4.2 2016-10-22 CRAN (R 3.4.1)
## rlang 0.1.2 2017-08-09 CRAN (R 3.4.1)
## rstan * 2.16.2 2017-07-03 CRAN (R 3.4.1)
## scales 0.5.0 2017-08-24 CRAN (R 3.4.1)
## StanHeaders * 2.16.0-1 2017-07-03 CRAN (R 3.4.1)
## stats * 3.4.1 2017-06-30 local
## stats4 3.4.1 2017-06-30 local
## stringi 1.1.5 2017-04-07 CRAN (R 3.4.1)
## stringr 1.2.0 2017-02-18 CRAN (R 3.4.1)
## tibble 1.3.4 2017-08-22 CRAN (R 3.4.1)
## tools 3.4.1 2017-06-30 local
## utils * 3.4.1 2017-06-30 local
## viridisLite 0.2.0 2017-03-24 CRAN (R 3.4.1)
Bakis, Raimo. 1976. “Continuous Speech Recognition via Centisecond Acoustic States.” The Journal of the Acoustical Society of America 59 (S1). ASA: S97–S97.
Baum, Leonard E, and Ted Petrie. 1966. “Statistical Inference for Probabilistic Functions of Finite State Markov Chains.” The Annals of Mathematical Statistics 37 (6). JSTOR: 1554–63.
Baum, Leonard E, and George Sell. 1968. “Growth Transformations for Functions on Manifolds.” Pacific Journal of Mathematics 27 (2). Mathematical Sciences Publishers: 211–27.
Baum, Leonard E. 1972. “An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes.” Inequalities 3: 1–8.
Baum, Leonard E., and J. A. Eagon. 1967. “An Inequality with Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model for Ecology.” Bulletin of the American Mathematical Society 73 (3). American Mathematical Society (AMS): 360–64. doi:10.1090/s0002-9904-1967-11751-8.
Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. 1970. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” The Annals of Mathematical Statistics 41 (1). Institute of Mathematical Statistics: 164–71. doi:10.1214/aoms/1177697196.
Bengio, Yoshua, and Paolo Frasconi. 1995. “An Input Output HMM Architecture.” In Advances in Neural Information Processing Systems.
Betancourt, Michael. 2017. “Identifying Bayesian Mixture Models.” Stan Case Studies, Volume 4. http://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html.
Carpenter, Bob, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Michael A Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). doi:10.18637/jss.v076.i01.
Gelman, Andrew, and Donald B Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4). JSTOR: 457–72.
Jelinek, Frederick. 1976. “Continuous Speech Recognition by Statistical Methods.” Proceedings of the IEEE 64 (4). IEEE: 532–56.
Jordan, Michael I. 2003. “An Introduction to Probabilistic Graphical Models.” In preparation.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
Rabiner, Lawrence R. 1990. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” In Readings in Speech Recognition, 267–96. Elsevier. doi:10.1016/b978-0-08-051584-7.50027-9.
Stan Development Team. 2017a. “RStan: The R Interface to Stan.” http://mc-stan.org/.
———. 2017b. “Shinystan: Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models.” http://mc-stan.org/.
———. 2017c. Stan Modeling Language: User’s Guide and Reference Manual: Version 2.15.0.
Viterbi, A. 1967. “Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm.” IEEE Transactions on Information Theory 13 (2). Institute of Electrical; Electronics Engineers (IEEE): 260–69. doi:10.1109/tit.1967.1054010.
Corresponding author: damiano.luis@gmail.com.
The output can be univariate or multivariate depending on the choice of model specification, in which case an observation at a given time index \(t\) is a scalar \(y_t\) or a vector \(\mat{y}_t\) respectively. Although we introduce definitions and properties along an example based on a univariate series, we keep the bold notation as a reminder that the equations are valid in a multivariate context as well.