Stan Tutorial

June 25, 2017

Module objective

In this module, we will fit a hierarchical multinomial logit model using choice based conjoint data for chocolate bars.

This will allow us to:
- Learn more Stan syntax
- Work with a hierarchical model
- Fit a model that is commonly fit in marketing and usually requres specialized software

Many will be familar with estimating this model using Sawtooth CBC/HB or the ChoiceModelR package in R.

More Stan syntax

Hierarchical models

There is substantial evidence that different people have different preferences over the features of products. To accomodate this, we alter the multnomial logit model by allowing each person to have her own vector of part-worths for the attributes.

Y[r,s] ~ categorical_logit(X[r,s]*Beta[,r]);

Becuase there is typically insufficient data to estimate each person's beta[,r] vector, we then impose a distribution on the individual-level betas.

Beta[,r] ~ multi_normal(Theta*Z[,r], Sigma);

Multivariate normal distribution

This is the first time we have seen a multivariate distribution in Stan. This is the multivariate normal distribution.

y ~ multi_normal(mu, Sigma);

y is a vector (or an array of vectors for multiple observations)
mu is the mean vector
Sigma is the covariance matrix.

If this distribution is new to you, check out the wikipedia page.

Covariance matricies

Covariance matrices like Sigma are square matrices with special properties. Stan provides special data types just for storing covariance matrices:

cov_matrix[K] mycovmatrix;

And there is a correlation matrix type as well:

corr_matrix[K] mycorrmatrix;

Other parameterizations

If you prefer other parameterizations of the multivariate normal, they are available.

y ~ multi_normal_prec(mu, Omega);
y ~ multi_normal_cholesky(mu, L);

There are even special types for Cholesky factors, if you are into that. (It can be important for computational efficiency.)

cholesky_factor_cov[K] L;
cholesky_factor_cor[K] L;

Other multivariate distributions

Gaussian process

y ~ multi_gp(Sigma, w);

Multivariate Student t

y ~ multi_student_t(nu, mu, Sigma);

Gaussian DLM

y ~ gaussian_dlm_obs(F, G, V, W, m0, C0);

See the the Stan Modeling Language Users Guide and Reference Manual for more!

Priors for covariances

One of the difficult things about using the multivariate normal distribution is putting a prior on the covariance Sigma. Gibbs samplers often use the inverted Wishart prior, which is conjugate. And Stan supports that:

W ~ inv_wishart(nu, Sigma);

However, this prior often puts undue weight near Sigma and can influence the posterior in strange ways.

LKJ Prior for correlations

Many analysts have an intuitive sense for correlations; they are always between -1 and 1 with zero indicating no association. The LKJ distribution puts a prior on the correlation matrix.

Omega ~ lkj_corr(eta);

If eta = 1 then the density is uniform over all correlation matrices of a given order
If eta > 1 the identity matrix is the modal correlation matrix, with sharper peaks in the density around the identity matrix for larger eta. This allows us to reduce correlations, if we have prior beliefs that there are not many correlations between part-worths.

Visualizing the LKJ prior

This image was borrowed from the Psychstatistics blog, which illustrates how to use Stan to simulate from the LKJ prior to produce this visualization.

Using LKJ to put a prior on covariance

We can put a prior on a covarince by specifing a variance vector tau along with a correlation matrix Omega (with the LKJ prior) and then use the quad_form_diag function to compute a covariance.

parameters{
  corr_matrix[K] Omega;
  vector<lower=0>[K] tau;
}
model{
  Omega ~ lkj_corr(2);
  Beta[,r] ~ multi_normal(mu, quad_form_diag(Omega, tau));
}

Truncated distributions

You can truncate any distribution using this syntax:

mu ~ normal(mu, sigma) T[lower, upper];

Assignment

Occasionally, you may need to assign one variable (data or parameter) to another, e.g. when you are tranforming data internally in Stan. To do that we use =.

The symbol ~ is for sampling statements that define random variables.

We will use truncation and assignment in just a few more slides.

Hierarchical multnomial logit model in Stan

Enough syntax! Let's get on to the model. You can open the full model specification in hmnl.stan.

Data

Putting all of that new syntax together, we can specify a hierarchical multinomial logit model.

data {
  int<lower=2> C; // # of alternatives (choices) in each scenario
  int<lower=1> K; // # of covariates of alternatives
  int<lower=1> R; // # of respondents
  int<lower=1> S; // # of scenarios per respondent
  int<lower=1> G; // # of respondent covariates 
  int<lower=1,upper=C> Y[R, S]; // observed choices
  matrix[C, K] X[R, S]; // matrix of attributes for each obs
  matrix[G, R] Z; // vector of covariates for each respondent
}

Choice sets of varying size

You may have noticed that we have defined X to be a fixed size for each choice question.

  matrix[C, K] X[R, S]; // matrix of attributes for each obs

It is possible to write a Stan model that will accomodate a different number of alternatives in each scenario. Look for information on ragged arrays in Stan.

Parameters

parameters {
  matrix[K, R] Beta;
  matrix[K, G] Theta;
  corr_matrix[K] Omega;
  vector<lower=0>[K] tau;
}

Transformed parameters

While it can be nice to look at the output in terms of the variance vector tau and the correlation matrix Omega, some still like covariances. We can compute the covariance using the transformed parameters block in Stan.

transformed parameters {
  cov_matrix[K] Sigma = quad_form_diag(Omega, tau);
}

If you aren't using the tranformed parameter in the model block, it is more computationally efficient to put it in the generated quantities block.

Model

We can use any transformed parameters in our model block.

model {
  //priors
  to_vector(Theta) ~ normal(0, 10);
  tau ~ cauchy(0, 2.5); 
  Omega ~ lkj_corr(2);
  //likelihood
  for (r in 1:R) {
    Beta[,r] ~ multi_normal(Theta*Z[,r], Sigma);    
    for (s in 1:S)
      Y[r,s] ~ categorical_logit(X[r,s]*Beta[,r]);
  }
}

Bayesian inference

Get setup in R

It's time to open the hmnl.R file and start running R code.

library(rstan)
library(MASS)
library(shinystan)
# writes a compiled Stan program to the disk to avoid recompiling
rstan_options(auto_write=TRUE) 
# allows Stan chains to run in parallel on multiprocessor machines
options(mc.cores = parallel::detectCores())

Testing the hierarchical multinomial logit model with synthetic data

It's a good idea to test any new model with synthetic data.

Synthetic data generation function (1)

# function to generate mnl data
generate_hmnl_data <- function(R=100, S=30, C=3, 
                               Theta=matrix(rep(1, 8), nrow=2), 
                               Sigma=diag(0.1, 4)){
  K <- ncol(Theta)
  G <- nrow(Theta)
  Y <- array(dim=c(R, S))
  X <- array(rnorm(R*S*C*K), dim=c(R, S, C, K)) # normal covariates
  Z <- array(dim=c(G, R))
  Z[1,] <- 1  # intercept
  if (G > 1) {
    Z[2:G,] <- rnorm(R*(G-1)) # normal covariates
  }
  Beta <- array(dim=c(K, R))
  for (r in 1:R) {
    Beta[,r] <- mvrnorm(n=1, mu=Z[,r]%*%Theta, Sigma=Sigma)
    for (s in 1:S)
      Y[r,s] <- sample(x=C, size=1, prob=exp(X[r,s,,]%*%Beta[,r]))
   }
  list(R=R, S=S, C=C, K=K, G=G, Y=Y, X=X, Z=Z, 
       beta.true=beta, Theta.true=Theta, Sigma.true=Sigma)
}

Synthetic data generation function (2)

 for (r in 1:R) {
    Beta[,r] <- mvrnorm(n=1, mu=Z[,r]%*%Theta, Sigma=Sigma)
    for (s in 1:S)
      Y[r,s] <- sample(x=C, size=1, prob=exp(X[r,s,,]%*%Beta[,r]))
   }
  list(R=R, S=S, C=C, K=K, G=G, Y=Y, X=X, Z=Z, 
       beta.true=beta, Theta.true=Theta, Sigma.true=Sigma)
}

Synthetic data

d1 <- generate_hmnl_data()
str(d1)

## List of 11
##  $ R         : num 100
##  $ S         : num 30
##  $ C         : num 3
##  $ K         : int 4
##  $ G         : int 2
##  $ Y         : int [1:100, 1:30] 1 3 1 1 2 1 3 2 3 1 ...
##  $ X         : num [1:100, 1:30, 1:3, 1:4] -0.96 0.702 0.996 0.697 0.403 ...
##  $ Z         : num [1:2, 1:100] 1 -0.18 1 -0.517 1 ...
##  $ beta.true :function (a, b)  
##  $ Theta.true: num [1:2, 1:4] 1 1 1 1 1 1 1 1
##  $ Sigma.true: num [1:4, 1:4] 0.1 0 0 0 0 0.1 0 0 0 0 ...

Running the hmnl on synthetic data

test.stan <- stan(file="hmnl.stan", data=d1, iter=1000, chains=4)

Traceplots: Theta

plot(test.stan, plotfun="trace", pars=("Theta"))

Traceplots: tau

plot(test.stan, plotfun="trace", pars=c("tau"))

Traceplots: Omega

plot(test.stan, plotfun="trace", pars=("Omega"))

Summary: Theta (Population means of part-worths)

summary(test.stan, pars=c("Theta"))$summary

##                 mean     se_mean         sd      2.5%       25%       50%
## Theta[1,1] 0.9664853 0.003289073 0.05871168 0.8559129 0.9272654 0.9658026
## Theta[1,2] 0.9890953 0.003818164 0.06645128 0.8612869 0.9446815 0.9873342
## Theta[2,1] 1.0867097 0.003694286 0.06265623 0.9679033 1.0447422 1.0852792
## Theta[2,2] 0.9526273 0.004061740 0.06800239 0.8256210 0.9063639 0.9500058
## Theta[3,1] 1.0374797 0.003646353 0.06322426 0.9183316 0.9942705 1.0356074
## Theta[3,2] 0.9597821 0.003822722 0.06694561 0.8276300 0.9153848 0.9599630
## Theta[4,1] 0.8991780 0.003409581 0.05796822 0.7854428 0.8605153 0.8978740
## Theta[4,2] 0.9187336 0.003879248 0.06545671 0.7912672 0.8747333 0.9172100
##                  75%    97.5%    n_eff     Rhat
## Theta[1,1] 1.0036755 1.090260 318.6413 1.010939
## Theta[1,2] 1.0310185 1.128394 302.8988 1.014396
## Theta[2,1] 1.1268528 1.214316 287.6521 1.019590
## Theta[2,2] 0.9991955 1.087035 280.3007 1.022032
## Theta[3,1] 1.0794034 1.161499 300.6423 1.010224
## Theta[3,2] 1.0060029 1.090500 306.6894 1.014838
## Theta[4,1] 0.9385526 1.017412 289.0533 1.015670
## Theta[4,2] 0.9630276 1.051704 284.7169 1.015979

Summary: tau (Population variances of part-worths)

summary(test.stan, pars=c("tau"))$summary

##             mean     se_mean         sd      2.5%       25%       50%
## tau[1] 0.2721224 0.003170315 0.05333490 0.1699761 0.2355878 0.2716348
## tau[2] 0.3416312 0.003283358 0.05977546 0.2260780 0.3009205 0.3401240
## tau[3] 0.3393442 0.002702649 0.05537346 0.2361447 0.3010529 0.3375400
## tau[4] 0.3006639 0.003480961 0.05730626 0.1931683 0.2625499 0.2998464
##              75%     97.5%    n_eff     Rhat
## tau[1] 0.3077542 0.3783804 283.0206 1.006125
## tau[2] 0.3821680 0.4601295 331.4434 1.026160
## tau[3] 0.3755262 0.4497066 419.7821 1.004391
## tau[4] 0.3389990 0.4125203 271.0227 1.012258

Summary: Omega (Populaltion correlations of part-worths)

summary(test.stan, pars=c("Omega"))$summary

##                   mean      se_mean           sd        2.5%         25%
## Omega[1,1]  1.00000000 0.000000e+00 0.000000e+00  1.00000000  1.00000000
## Omega[1,2]  0.13832092 9.949535e-03 2.105344e-01 -0.26850940 -0.01459354
## Omega[1,3]  0.34687244 1.059740e-02 1.977162e-01 -0.06572557  0.21572539
## Omega[1,4]  0.38380129 9.812077e-03 2.035964e-01 -0.03910087  0.25133594
## Omega[2,1]  0.13832092 9.949535e-03 2.105344e-01 -0.26850940 -0.01459354
## Omega[2,2]  1.00000000 2.164052e-18 9.677935e-17  1.00000000  1.00000000
## Omega[2,3]  0.15667044 9.404001e-03 1.952858e-01 -0.23544166  0.02508082
## Omega[2,4]  0.08956846 8.630277e-03 2.081130e-01 -0.33060279 -0.05444666
## Omega[3,1]  0.34687244 1.059740e-02 1.977162e-01 -0.06572557  0.21572539
## Omega[3,2]  0.15667044 9.404001e-03 1.952858e-01 -0.23544166  0.02508082
## Omega[3,3]  1.00000000 2.172563e-18 9.665185e-17  1.00000000  1.00000000
## Omega[3,4] -0.02456565 9.692265e-03 2.106520e-01 -0.42633868 -0.16507893
## Omega[4,1]  0.38380129 9.812077e-03 2.035964e-01 -0.03910087  0.25133594
## Omega[4,2]  0.08956846 8.630277e-03 2.081130e-01 -0.33060279 -0.05444666
## Omega[4,3] -0.02456565 9.692265e-03 2.106520e-01 -0.42633868 -0.16507893
## Omega[4,4]  1.00000000 2.501224e-18 1.052928e-16  1.00000000  1.00000000
##                    50%       75%     97.5%     n_eff     Rhat
## Omega[1,1]  1.00000000 1.0000000 1.0000000 2000.0000      NaN
## Omega[1,2]  0.14710065 0.2929994 0.5162311  447.7553 1.001380
## Omega[1,3]  0.35418894 0.4886311 0.7047441  348.0852 1.017774
## Omega[1,4]  0.39399921 0.5301021 0.7307585  430.5447 1.003095
## Omega[2,1]  0.14710065 0.2929994 0.5162311  447.7553 1.001380
## Omega[2,2]  1.00000000 1.0000000 1.0000000 2000.0000 0.997998
## Omega[2,3]  0.16382942 0.2927856 0.5116891  431.2371 1.010627
## Omega[2,4]  0.09601608 0.2308097 0.4930113  581.4990 1.000440
## Omega[3,1]  0.35418894 0.4886311 0.7047441  348.0852 1.017774
## Omega[3,2]  0.16382942 0.2927856 0.5116891  431.2371 1.010627
## Omega[3,3]  1.00000000 1.0000000 1.0000000 1979.1349 0.997998
## Omega[3,4] -0.02271670 0.1236637 0.3835610  472.3679 1.003900
## Omega[4,1]  0.39399921 0.5301021 0.7307585  430.5447 1.003095
## Omega[4,2]  0.09601608 0.2308097 0.4930113  581.4990 1.000440
## Omega[4,3] -0.02271670 0.1236637 0.3835610  472.3679 1.003900
## Omega[4,4]  1.00000000 1.0000000 1.0000000 1772.1159 0.997998

Visualize parameters (1)

plot(test.stan, pars=c("Theta", "tau", "Omega"))

Visualize parameters (2)

plot(test.stan, pars=c("Theta", "Sigma"))

Prepare the chocolate cbc data

Choice-based conjoint data for chocolate bars

Fourteen respondents each answered 25 choice tasks where they selected from among three chocolate bars.
Three attributes
- Brand: Hersheys, Dove, Lindt, Godiva, Ghirardelli
- Type: Milk, Milk with nuts, Dark, Dark with nuts, White

Tidy up and read in the chocolate data

rm(list=ls()) # tidy up
choc.df <- read.csv("cbc_chocolate.csv")

Coding the chocolate data

We've done this twice now, so we ought to package it up as a function (but we haven't).

choc.contrasts <- list(Brand = "contr.sum", Type = "contr.sum")
choc.coded <- model.matrix(~ Brand + Type, data = choc.df, 
                           contrasts = choc.contrasts)
choc.coded <- choc.coded[,2:ncol(choc.coded)] # remove intercept
# Fix the bad labels from contr.sum
choc.names <- c("BrandDove", "BrandGhirardelli", "BrandGodiva", 
                "BrandHersheys", "TypeDark", "TypeDarkNuts", 
                "TypeMilk", "TypeMilkNuts")
colnames(choc.coded) <- choc.names
choc.df <- cbind(choc.df, choc.coded)

Coded chocolate data

head(choc.df)

##   X  Ind Trial Alt    Brand        Type Price Chosen BrandDove
## 1 1 2401     1   1     Dove        Milk   0.6      1         1
## 2 2 2401     1   2   Godiva        Dark   0.7      0         0
## 3 3 2401     1   3     Dove       White   3.6      0         1
## 4 4 2401     2   1   Godiva Milk w Nuts   2.7      0         0
## 5 5 2401     2   2   Godiva        Dark   3.9      1         0
## 6 6 2401     2   3 Hersheys Milk w Nuts   0.7      0         0
##   BrandGhirardelli BrandGodiva BrandHersheys TypeDark TypeDarkNuts
## 1                0           0             0        0            0
## 2                0           1             0        1            0
## 3                0           0             0       -1           -1
## 4                0           1             0        0            0
## 5                0           1             0        1            0
## 6                0           0             1        0            0
##   TypeMilk TypeMilkNuts
## 1        1            0
## 2        0            0
## 3       -1           -1
## 4        0            1
## 5        0            0
## 6        0            1

Munge the data into Stan list format

R <- length(unique(choc.df$Ind))
S <- length(unique(choc.df$Trial))
C <- max(choc.df$Alt)
K <- 9
Y <- array(dim=c(R, S))
X <- array(rnorm(R*S*C*K), dim=c(R, S, C, K)) 
Z <- array(1, dim=c(1, R)) # intercept only
for (r in 1:R) { # respondents
  for (s in 1:S){ # choice scenarios
    scenario <- choc.df[choc.df$Ind==unique(choc.df$Ind)[r] & 
                        choc.df$Trial==unique(choc.df$Trial)[s], ]
    X[r,s,,] <- data.matrix(scenario[,c(7, 9:16)]) 
    Y[r,s] <- scenario$Alt[as.logical(scenario$Chosen)]
  }
}
choc.standata <- list(C=C, K=K, R=R, S=S, G=1, Y=Y, X=X, Z=Z)

Check the chocolate data

str(choc.standata)

## List of 8
##  $ C: int 3
##  $ K: num 9
##  $ R: int 14
##  $ S: int 25
##  $ G: num 1
##  $ Y: int [1:14, 1:25] 1 2 2 2 3 1 2 2 2 3 ...
##  $ X: num [1:14, 1:25, 1:3, 1:9] 0.6 2.2 1.4 3.9 3 3.8 0.6 1.4 2.2 1.8 ...
##  $ Z: num [1, 1:14] 1 1 1 1 1 1 1 1 1 1 ...

Tidy up

rm(Y, X, Z, R, S, r, s, C, K, choc.contrasts, scenario, choc.coded)

Run the hmnl model with the chocolate data

Bayesian inference

Call the `stan()` function

choc.stan <- stan(file="hmnl.stan", data=choc.standata)

## Warning: There were 288 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: Examine the pairs() plot to diagnose sampling problems

Assess convergence with tracelots (1)

plot(choc.stan, plotfun="trace", pars=("Theta"))

Assess convergence with tracelots (1)

plot(choc.stan, plotfun="trace", pars=c("tau"))

Assess convergence with tracelots (1)

plot(choc.stan, plotfun="trace", pars=c("Omega[1,2]"))

Assess convergence with tracelots (1)

plot(choc.stan, plotfun="trace", pars=paste("Beta[", 1:9, ",1]", sep="")) # resp 1

Assess convergence with `Rhat` and `n_eff`

summary(choc.stan)$summary[,c("Rhat", "n_eff")]

##                 Rhat     n_eff
## Beta[1,1]  1.0180027  340.4515
## Beta[1,2]  1.0035856 1437.3292
## Beta[1,3]  1.0036588  848.9170
## Beta[1,4]  1.0012271  644.1746
## Beta[1,5]  1.0109615  546.0170
## Beta[1,6]  1.0013741 4000.0000
## Beta[1,7]  1.0044119  904.9123
## Beta[1,8]  1.0036033 1151.8533
## Beta[1,9]  1.0122041  512.1820
## Beta[1,10] 1.0112701  657.7029
## Beta[1,11] 1.0029220  353.5995
## Beta[1,12] 1.0010631 2691.1345
## Beta[1,13] 1.0002786 4000.0000
## Beta[1,14] 1.0125499  489.9039
## Beta[2,1]  1.0049358  782.8245
## Beta[2,2]  1.0103333  594.6199
## Beta[2,3]  1.0069460  426.2741
## Beta[2,4]  1.0056037  757.3925
## Beta[2,5]  1.0043620  945.6663
## Beta[2,6]  1.0151676  240.4628
## Beta[2,7]  1.0025762  860.3090
## Beta[2,8]  1.0033993 1301.6805
## Beta[2,9]  1.0044339  523.5439
## Beta[2,10] 1.0061387  710.4411
## Beta[2,11] 1.0174213  195.2890
## Beta[2,12] 1.0058873  461.5969
## Beta[2,13] 1.0116327  703.6454
## Beta[2,14] 1.0041029 1165.6833
## Beta[3,1]  1.0021335 1490.4266
## Beta[3,2]  1.0061187  884.1742
## Beta[3,3]  1.0023571 1071.7164
## Beta[3,4]  1.0052796  508.6472
## Beta[3,5]  1.0035415 1174.9756
## Beta[3,6]  1.0203427  283.0091
## Beta[3,7]  1.0025668 1334.7600
## Beta[3,8]  1.0058139  709.7439
## Beta[3,9]  1.0019619  690.0386
## Beta[3,10] 1.0171816  404.8059
## Beta[3,11] 1.0011884 1349.5514
## Beta[3,12] 1.0032888  731.0606
## Beta[3,13] 1.0025069 1975.5535
## Beta[3,14] 1.0047454 1160.1927
## Beta[4,1]  1.0158063  304.6942
## Beta[4,2]  1.0107297  667.5327
## Beta[4,3]  1.0017727 1937.8706
## Beta[4,4]  1.0016271 2633.7309
## Beta[4,5]  1.0065082  449.2244
## Beta[4,6]  1.0044752 1785.2911
## Beta[4,7]  1.0047472 1494.3903
## Beta[4,8]  1.0010326  790.6136
## Beta[4,9]  1.0131298  585.9340
## Beta[4,10] 1.0070649  410.9575
## Beta[4,11] 1.0028083  804.6341
## Beta[4,12] 1.0164423  378.0546
## Beta[4,13] 1.0131024  330.0931
## Beta[4,14] 1.0022841 1308.2744
## Beta[5,1]  1.0033101 1585.8486
## Beta[5,2]  1.0061125  624.4482
## Beta[5,3]  1.0032506 1804.5008
## Beta[5,4]  1.0044805  761.4265
## Beta[5,5]  1.0028868  938.3792
## Beta[5,6]  1.0101626  456.9870
## Beta[5,7]  1.0158835  333.6554
## Beta[5,8]  1.0010972 1097.1930
## Beta[5,9]  1.0001976 1620.2600
## Beta[5,10] 1.0141778  805.7094
## Beta[5,11] 1.0065720  615.8393
## Beta[5,12] 1.0135749  536.1551
## Beta[5,13] 1.0072512  782.7990
## Beta[5,14] 1.0032639 1235.7393
## Beta[6,1]  1.0077925 1124.2004
## Beta[6,2]  1.0020983 1509.4346
## Beta[6,3]  1.0052612 1766.4544
## Beta[6,4]  1.0044736 1431.8930
## Beta[6,5]  1.0113078  726.6343
## Beta[6,6]  1.0010567 1641.2374
## Beta[6,7]  1.0019441  837.4156
## Beta[6,8]  1.0071274  481.6240
## Beta[6,9]  1.0005410 2128.7788
## Beta[6,10] 1.0075147  676.6293
## Beta[6,11] 1.0045498  445.2819
## Beta[6,12] 1.0007481 1491.9317
## Beta[6,13] 1.0029605  958.7516
## Beta[6,14] 1.0003966 4000.0000
## Beta[7,1]  1.0008051 2183.4671
## Beta[7,2]  1.0033444 1410.1431
## Beta[7,3]  1.0016241 4000.0000
## Beta[7,4]  1.0063050  920.3628
## Beta[7,5]  1.0004301 2153.2075
## Beta[7,6]  1.0133428  286.7124
## Beta[7,7]  1.0046660  840.5705
## Beta[7,8]  1.0051145  956.4836
## Beta[7,9]  1.0074961  756.7495
## Beta[7,10] 1.0043291  798.0901
## Beta[7,11] 1.0105840  405.3669
## Beta[7,12] 1.0059636  948.7875
## Beta[7,13] 1.0017515 1049.8647
## Beta[7,14] 1.0062459  631.2027
## Beta[8,1]  1.0167282  363.5856
## Beta[8,2]  1.0055145 1117.8747
## Beta[8,3]  1.0041883  491.7873
## Beta[8,4]  1.0055475  983.7813
## Beta[8,5]  1.0015098 1554.1905
## Beta[8,6]  1.0021618  987.7929
## Beta[8,7]  1.0059533  935.1030
## Beta[8,8]  1.0048981 1118.3021
## Beta[8,9]  1.0029990  998.5835
## Beta[8,10] 1.0083178  554.2532
## Beta[8,11] 1.0036849 1583.6015
## Beta[8,12] 1.0067867  876.9910
## Beta[8,13] 1.0064475  653.5814
## Beta[8,14] 1.0035807 1083.9708
## Beta[9,1]  1.0088868  962.0939
## Beta[9,2]  1.0099335  673.8428
## Beta[9,3]  1.0019676 2037.0774
## Beta[9,4]  1.0012478  975.9475
## Beta[9,5]  1.0083499  857.9608
## Beta[9,6]  1.0148836  610.3351
## Beta[9,7]  1.0069363 1078.6496
## Beta[9,8]  1.0027502 4000.0000
## Beta[9,9]  1.0035084  831.3470
## Beta[9,10] 1.0025444 1719.2735
## Beta[9,11] 1.0027295  878.8618
## Beta[9,12] 1.0085528  796.2931
## Beta[9,13] 1.0061991  670.7641
## Beta[9,14] 1.0052061  791.4334
## Theta[1,1] 1.0011451  471.1384
## Theta[2,1] 1.0021987  962.1277
## Theta[3,1] 1.0077890  712.3733
## Theta[4,1] 1.0029579 1742.0184
## Theta[5,1] 1.0047466  955.7515
## Theta[6,1] 1.0005747 1721.8434
## Theta[7,1] 1.0012750  869.6398
## Theta[8,1] 1.0010420 1600.0472
## Theta[9,1] 1.0005807 1730.0040
## Omega[1,1]       NaN 4000.0000
## Omega[1,2] 1.0011829 1179.7386
## Omega[1,3] 0.9998514 1989.1959
## Omega[1,4] 1.0003198  564.4148
## Omega[1,5] 1.0031879  466.4396
## Omega[1,6] 1.0023970  845.6668
## Omega[1,7] 1.0072107  399.7610
## Omega[1,8] 1.0095541 1531.0718
## Omega[1,9] 1.0048362 1530.3522
## Omega[2,1] 1.0011829 1179.7386
## Omega[2,2] 0.9989995 2847.1639
## Omega[2,3] 1.0046790  475.2309
## Omega[2,4] 1.0024078  801.9186
## Omega[2,5] 1.0004331  924.2753
## Omega[2,6] 1.0019529  700.1241
## Omega[2,7] 1.0015978  396.1103
## Omega[2,8] 1.0026074  943.5431
## Omega[2,9] 1.0046692  503.5494
## Omega[3,1] 0.9998514 1989.1959
## Omega[3,2] 1.0046790  475.2309
## Omega[3,3] 0.9989995 3242.8914
## Omega[3,4] 1.0016020 1096.2002
## Omega[3,5] 1.0030137 1203.7040
## Omega[3,6] 0.9992707 2339.2548
## Omega[3,7] 1.0001942 1766.8478
## Omega[3,8] 1.0011678  972.6156
## Omega[3,9] 1.0019435 1666.1425
## Omega[4,1] 1.0003198  564.4148
## Omega[4,2] 1.0024078  801.9186
## Omega[4,3] 1.0016020 1096.2002
## Omega[4,4] 0.9989995 3563.0485
## Omega[4,5] 1.0040009 1134.7141
## Omega[4,6] 0.9999094 1333.2139
## Omega[4,7] 1.0040652 1241.5152
## Omega[4,8] 1.0008644 1629.5733
## Omega[4,9] 1.0082580  385.9536
## Omega[5,1] 1.0031879  466.4396
## Omega[5,2] 1.0004331  924.2753
## Omega[5,3] 1.0030137 1203.7040
## Omega[5,4] 1.0040009 1134.7141
## Omega[5,5] 0.9989995 1731.9724
## Omega[5,6] 1.0002329 1411.1406
## Omega[5,7] 1.0098318  865.1011
## Omega[5,8] 1.0032843 1702.7851
## Omega[5,9] 1.0154759  487.1010
## Omega[6,1] 1.0023970  845.6668
## Omega[6,2] 1.0019529  700.1241
## Omega[6,3] 0.9992707 2339.2548
## Omega[6,4] 0.9999094 1333.2139
## Omega[6,5] 1.0002329 1411.1406
## Omega[6,6] 0.9989995 2315.9215
## Omega[6,7] 1.0022905 1374.2751
## Omega[6,8] 1.0003105 2245.2638
## Omega[6,9] 1.0019029 1745.9138
## Omega[7,1] 1.0072107  399.7610
## Omega[7,2] 1.0015978  396.1103
## Omega[7,3] 1.0001942 1766.8478
## Omega[7,4] 1.0040652 1241.5152
## Omega[7,5] 1.0098318  865.1011
## Omega[7,6] 1.0022905 1374.2751
## Omega[7,7] 0.9989995 2372.6254
## Omega[7,8] 1.0000462 2102.3003
## Omega[7,9] 1.0026384 1182.5633
## Omega[8,1] 1.0095541 1531.0718
## Omega[8,2] 1.0026074  943.5431
## Omega[8,3] 1.0011678  972.6156
## Omega[8,4] 1.0008644 1629.5733
## Omega[8,5] 1.0032843 1702.7851
## Omega[8,6] 1.0003105 2245.2638
## Omega[8,7] 1.0000462 2102.3003
## Omega[8,8] 0.9989995 3011.1776
## Omega[8,9] 1.0053005  950.9492
## Omega[9,1] 1.0048362 1530.3522
## Omega[9,2] 1.0046692  503.5494
## Omega[9,3] 1.0019435 1666.1425
## Omega[9,4] 1.0082580  385.9536
## Omega[9,5] 1.0154759  487.1010
## Omega[9,6] 1.0019029 1745.9138
## Omega[9,7] 1.0026384 1182.5633
## Omega[9,8] 1.0053005  950.9492
## Omega[9,9] 0.9989995 1771.3553
## tau[1]     1.0013541  664.6025
## tau[2]     1.0071464  307.2207
## tau[3]     1.0109629  382.4852
## tau[4]     1.0306643  159.4508
## tau[5]     1.0062021  513.0946
## tau[6]     1.0043754  608.8666
## tau[7]     1.0079876  656.8926
## tau[8]     1.0030341  969.9705
## tau[9]     1.0051231  824.0610
## Sigma[1,1] 1.0007235  734.4629
## Sigma[1,2] 1.0004606 2458.3194
## Sigma[1,3] 0.9997132 1696.4911
## Sigma[1,4] 1.0065194  550.6392
## Sigma[1,5] 1.0041844  904.2019
## Sigma[1,6] 1.0007593 1483.7961
## Sigma[1,7] 1.0019730  377.3423
## Sigma[1,8] 1.0089500 1033.2369
## Sigma[1,9] 1.0022818 1201.6203
## Sigma[2,1] 1.0004606 2458.3194
## Sigma[2,2] 1.0047737  410.2667
## Sigma[2,3] 1.0044316  570.0026
## Sigma[2,4] 0.9998893 1982.8026
## Sigma[2,5] 1.0024428 1019.3557
## Sigma[2,6] 1.0082884  548.8212
## Sigma[2,7] 1.0054626  301.1228
## Sigma[2,8] 1.0011669  901.3660
## Sigma[2,9] 1.0007613 1130.2706
## Sigma[3,1] 0.9997132 1696.4911
## Sigma[3,2] 1.0044316  570.0026
## Sigma[3,3] 1.0087689  839.5521
## Sigma[3,4] 1.0035291 1461.0268
## Sigma[3,5] 1.0019209 1758.8501
## Sigma[3,6] 1.0007057 1817.6524
## Sigma[3,7] 1.0007671 1981.6982
## Sigma[3,8] 1.0010345 1304.6095
## Sigma[3,9] 1.0005843 1580.7640
## Sigma[4,1] 1.0065194  550.6392
## Sigma[4,2] 0.9998893 1982.8026
## Sigma[4,3] 1.0035291 1461.0268
## Sigma[4,4] 1.0198292  260.1514
## Sigma[4,5] 1.0009275 1741.8597
## Sigma[4,6] 1.0003244 1293.3671
## Sigma[4,7] 1.0033913 2365.5348
## Sigma[4,8] 1.0007399 2683.4368
## Sigma[4,9] 1.0012163 1436.2984
## Sigma[5,1] 1.0041844  904.2019
## Sigma[5,2] 1.0024428 1019.3557
## Sigma[5,3] 1.0019209 1758.8501
## Sigma[5,4] 1.0009275 1741.8597
## Sigma[5,5] 1.0049985  745.3676
## Sigma[5,6] 1.0019678 1166.3449
## Sigma[5,7] 1.0030940 1830.5922
## Sigma[5,8] 1.0010770 1928.6775
## Sigma[5,9] 1.0051577 1112.0139
## Sigma[6,1] 1.0007593 1483.7961
## Sigma[6,2] 1.0082884  548.8212
## Sigma[6,3] 1.0007057 1817.6524
## Sigma[6,4] 1.0003244 1293.3671
## Sigma[6,5] 1.0019678 1166.3449
## Sigma[6,6] 1.0039011  655.5242
## Sigma[6,7] 1.0045672  841.8437
## Sigma[6,8] 0.9999274 2210.3736
## Sigma[6,9] 1.0009354 1782.8962
## Sigma[7,1] 1.0019730  377.3423
## Sigma[7,2] 1.0054626  301.1228
## Sigma[7,3] 1.0007671 1981.6982
## Sigma[7,4] 1.0033913 2365.5348
## Sigma[7,5] 1.0030940 1830.5922
## Sigma[7,6] 1.0045672  841.8437
## Sigma[7,7] 1.0062761  691.1112
## Sigma[7,8] 1.0010960 1039.7224
## Sigma[7,9] 1.0011739 1342.4502
## Sigma[8,1] 1.0089500 1033.2369
## Sigma[8,2] 1.0011669  901.3660
## Sigma[8,3] 1.0010345 1304.6095
## Sigma[8,4] 1.0007399 2683.4368
## Sigma[8,5] 1.0010770 1928.6775
## Sigma[8,6] 0.9999274 2210.3736
## Sigma[8,7] 1.0010960 1039.7224
## Sigma[8,8] 1.0021606 1113.2923
## Sigma[8,9] 1.0032240 1319.7995
## Sigma[9,1] 1.0022818 1201.6203
## Sigma[9,2] 1.0007613 1130.2706
## Sigma[9,3] 1.0005843 1580.7640
## Sigma[9,4] 1.0012163 1436.2984
## Sigma[9,5] 1.0051577 1112.0139
## Sigma[9,6] 1.0009354 1782.8962
## Sigma[9,7] 1.0011739 1342.4502
## Sigma[9,8] 1.0032240 1319.7995
## Sigma[9,9] 1.0042405  907.3221
## lp__       1.0168413  200.4778

Function for convergence checking

In a hierarchical model with many parameters, manually checking for convergence is pretty tedious. So, we can write a function to automate that for us.

check_fit <- function(fit) {
  summ <- summary(fit)$summary
  range_rhat <- range(summ[ , 'Rhat'])
  rhat_ok <- 0.99 <= range_rhat[1] && range_rhat[2] <= 1.1
  range_neff <- range(summ[ , 'n_eff'])
  neff_ok <- range_neff[1] >= 400
  sp <- rstan::get_sampler_params(fit, inc_warmup=FALSE)
  max_divergent <- max(sapply(sp, function(p){ sum(p[ , 'divergent__']) }))
  no_divergent <- max_divergent == 0
  
  list(ok = rhat_ok && neff_ok && no_divergent,
       range_rhat = range_rhat,
       range_neff = range_neff,
       max_divergent = max_divergent)
}

Checking convergence

Our model isn't converging well :(
This is likely due to data limitations; we have just 14 respondents and 9 attributes.

check_fit(choc.stan)

## $ok
## [1] FALSE
## 
## $range_rhat
## [1] NaN NaN
## 
## $range_neff
## [1]  159.4508 4000.0000
## 
## $max_divergent
## [1] 139

Options for addressing convergence problems

Consider whether there are any pathologies in the model specification
- unlikely in this case
Run the algorithm for longer
Get more data
Tighten up the priors

We are going to move on to use these posterior draws although we should probably not.

Summarizing the posterior distribution of the parameters

Attributes

Remember, the attributes are:

c("Price", choc.names)

## [1] "Price"            "BrandDove"        "BrandGhirardelli"
## [4] "BrandGodiva"      "BrandHersheys"    "TypeDark"        
## [7] "TypeDarkNuts"     "TypeMilk"         "TypeMilkNuts"

Visualize the parameter estimates (1)

plot(choc.stan, pars=c("Theta", "tau"))

It is a shame that you can't give descriptive names to the attributes in Stan.

Visualize the parameter estimates (2)

plot(choc.stan, pars=c("Omega"))

Visualize the parameter estimates (3)

plot(choc.stan, pars=paste("Beta[", 1:9, ",1]", sep="")) + 
  ggtitle("Respondent 1: Likes Milk Chocolate")

Visualize the parameter estimates (4)

plot(choc.stan, pars=paste("Beta[", 1:9, ",2]", sep="")) + 
  ggtitle("Respondent 2: Likes Dark Chocolate")

Simulating shares

Simulating shares from the hmnl

One way to simulate shares from a hierarchical model is to use the draws of the respondent-level parameters (beta).

shares.hmnl.post <- function(beta.draws, X) # X is attribute matrix
{
  R <- dim(beta.draws)[3]  # respondents
  D <- dim(beta.draws)[1]  # draws
  shares <- array(NA, dim=c(nrow(X), R, D))
  for (d in 1:D) {
    for (r in 1:R) {
      beta <- beta.draws[d,,r] 
      V <- exp(X %*% beta)  
      shares[,r,d] <- V/sum(V)
    }
  }
  shares
}

New choice set

choc.standata$X[1,1,,]

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]  0.6    1    0    0    0    0    0    1    0
## [2,]  0.7    0    0    1    0    1    0    0    0
## [3,]  3.6    1    0    0    0   -1   -1   -1   -1

rbind(c("Price", choc.names), choc.standata$X[1,1,,])

##      [,1]    [,2]        [,3]               [,4]          [,5]           
## [1,] "Price" "BrandDove" "BrandGhirardelli" "BrandGodiva" "BrandHersheys"
## [2,] "0.6"   "1"         "0"                "0"           "0"            
## [3,] "0.7"   "0"         "0"                "1"           "0"            
## [4,] "3.6"   "1"         "0"                "0"           "0"            
##      [,6]       [,7]           [,8]       [,9]          
## [1,] "TypeDark" "TypeDarkNuts" "TypeMilk" "TypeMilkNuts"
## [2,] "0"        "0"            "1"        "0"           
## [3,] "1"        "0"            "0"        "0"           
## [4,] "-1"       "-1"           "-1"       "-1"

Simulated shares for each respondent

shares <- shares.hmnl.post(extract(choc.stan, pars=c("Beta"))$Beta, 
                          choc.standata$X[1,1,,])
str(shares)

##  num [1:3, 1:14, 1:4000] 9.35e-01 6.51e-02 1.39e-05 4.59e-03 9.92e-01 ...

Summary of simulated shares

Our uncertainty (with only 14 respondents) is reflected in our share predictions

apply(shares, 1, quantile, probs=c(0.5, 0.025, 0.975))

##               [,1]         [,2]         [,3]
## 50%   3.412695e-01 0.6495468590 5.721589e-05
## 2.5%  3.396228e-05 0.0006379816 1.076268e-11
## 97.5% 9.988754e-01 0.9998997247 7.333006e-02

Share predictions for indivdiuals

apply(shares, 1:2, quantile, probs=c(0.5, 0.025, 0.975))

## , , 1
## 
##             [,1]       [,2]         [,3]
## 50%   0.52921322 0.46997694 2.479857e-05
## 2.5%  0.07293555 0.05634113 5.860249e-11
## 97.5% 0.94364683 0.92706443 1.047600e-02
## 
## , , 2
## 
##               [,1]      [,2]         [,3]
## 50%   2.419729e-03 0.9970639 2.242051e-06
## 2.5%  1.008049e-05 0.9099301 5.235008e-12
## 97.5% 8.116639e-02 0.9999879 9.369602e-03
## 
## , , 3
## 
##            [,1]         [,2]         [,3]
## 50%   0.9473477 0.0524937301 3.419907e-06
## 2.5%  0.5280445 0.0008431497 1.090029e-10
## 97.5% 0.9991560 0.4719477186 8.461009e-04
## 
## , , 4
## 
##             [,1]        [,2]         [,3]
## 50%   0.71299471 0.285669693 2.534933e-06
## 2.5%  0.05874182 0.008394166 1.832109e-12
## 97.5% 0.99144790 0.941255420 7.523425e-03
## 
## , , 5
## 
##              [,1]      [,2]         [,3]
## 50%   0.078929457 0.9203780 1.215267e-04
## 2.5%  0.001945131 0.3393470 2.754301e-07
## 97.5% 0.660575255 0.9980223 7.426643e-03
## 
## , , 6
## 
##             [,1]        [,2]         [,3]
## 50%   0.78924182 0.204666871 6.060383e-04
## 2.5%  0.06484919 0.001010318 6.396070e-07
## 97.5% 0.99894612 0.933725644 3.506047e-02
## 
## , , 7
## 
##               [,1]      [,2]         [,3]
## 50%   3.383483e-03 0.9926767 1.693234e-03
## 2.5%  8.554377e-06 0.7546423 1.211685e-06
## 97.5% 1.384058e-01 0.9999809 1.642851e-01
## 
## , , 8
## 
##            [,1]         [,2]         [,3]
## 50%   0.9827591 0.0117115098 1.406999e-03
## 2.5%  0.7762205 0.0001263224 6.760477e-06
## 97.5% 0.9997307 0.1962198517 5.357776e-02
## 
## , , 9
## 
##            [,1]         [,2]         [,3]
## 50%   0.9881849 1.180817e-02 4.377850e-09
## 2.5%  0.4788697 2.887136e-05 1.737211e-15
## 97.5% 0.9999711 5.211000e-01 4.663053e-05
## 
## , , 10
## 
##               [,1]      [,2]         [,3]
## 50%   3.924797e-03 0.9959666 1.309655e-06
## 2.5%  6.689007e-06 0.7909290 1.352694e-13
## 97.5% 2.080483e-01 0.9999897 3.113427e-03
## 
## , , 11
## 
##            [,1]         [,2]         [,3]
## 50%   0.9346297 2.311662e-02 0.0205003051
## 2.5%  0.3492565 9.827132e-05 0.0006578379
## 97.5% 0.9978548 4.252427e-01 0.4302716098
## 
## , , 12
## 
##               [,1]      [,2]         [,3]
## 50%   4.077937e-04 0.9965565 1.657983e-03
## 2.5%  4.747438e-07 0.8483409 2.693285e-06
## 97.5% 3.244748e-02 0.9999852 1.281857e-01
## 
## , , 13
## 
##              [,1]      [,2]         [,3]
## 50%   0.061001626 0.9377999 2.410696e-05
## 2.5%  0.002433946 0.3770467 9.347843e-11
## 97.5% 0.621924855 0.9974844 1.425409e-02
## 
## , , 14
## 
##             [,1]      [,2]         [,3]
## 50%   0.27137038 0.7285966 3.360958e-06
## 2.5%  0.02405493 0.1928012 2.512177e-09
## 97.5% 0.80719106 0.9759450 5.909860e-04

Summary of Module 3

In this module we have:

Built a hierarchical multnomial model in Stan.
- multivariate normal distribution
- correlation and covariance matrices
- LKJ prior
- transformed parameters
Tested it with synthetic data
Estimated it using the chocolate cbc data
Learned how to automate the process of checking convergence
Simulated shares from a hierarchical model

Next in Module 4

The next module will illustrate the real power of Stan, which is to flexibly create new models.

Module objective

More Stan syntax

Hierarchical models

Multivariate normal distribution

Covariance matricies

Other parameterizations

Other multivariate distributions

Priors for covariances

LKJ Prior for correlations

Visualizing the LKJ prior

Using LKJ to put a prior on covariance

Truncated distributions

Assignment

Hierarchical multnomial logit model in Stan

Data

Choice sets of varying size

Parameters

Transformed parameters

Model

Bayesian inference

Get setup in R

Testing the hierarchical multinomial logit model with synthetic data

Synthetic data generation function (1)

Synthetic data generation function (2)

Synthetic data

Running the hmnl on synthetic data

Traceplots: Theta

Traceplots: tau

Traceplots: Omega

Summary: Theta (Population means of part-worths)

Summary: tau (Population variances of part-worths)

Summary: Omega (Populaltion correlations of part-worths)

Visualize parameters (1)

Visualize parameters (2)

Prepare the chocolate cbc data

Choice-based conjoint data for chocolate bars

Tidy up and read in the chocolate data

Coding the chocolate data

Coded chocolate data

Munge the data into Stan list format

Check the chocolate data

Tidy up

Run the hmnl model with the chocolate data

Bayesian inference

Call the stan() function

Assess convergence with tracelots (1)

Assess convergence with tracelots (1)

Assess convergence with tracelots (1)

Assess convergence with tracelots (1)

Assess convergence with Rhat and n_eff

Function for convergence checking

Checking convergence

Options for addressing convergence problems

Summarizing the posterior distribution of the parameters

Attributes

Visualize the parameter estimates (1)

Visualize the parameter estimates (2)

Visualize the parameter estimates (3)

Visualize the parameter estimates (4)

More posterior checking in ShinyStan

Simulating shares

Simulating shares from the hmnl

New choice set

Simulated shares for each respondent

Summary of simulated shares

Share predictions for indivdiuals

Summary of Module 3

In this module we have:

Next in Module 4

Call the `stan()` function

Assess convergence with `Rhat` and `n_eff`