Introduction to the Partial Observability Generalized Bilinear Mixed Effects (P-GBME) Model

Arturas Rozenas, Shahryar Minhas, and John Ahlquist

2018-06-17

In this document, we provide a brief a tutorial on how to use the P-GBME model detailed in “Modeling Asymmetric Relationships from Symmetric Networks”. The P-GBME package can be installed via devtools – in the future, we may wrap P-GBME into the AMEN framework:

library(devtools)
devtools::install_github('s7minhas/pgbmeRepl', subdir='pgbme')
library(pgbme)

Structuring data for pgbme

Before discussing how to run the pgbme function, we first focus on how best to structure the inputted data (the dependent variable, dyadic variables, and nodal variables):

We construct a small example to highlight the structure of these objects below:

set.seed(6886)
n <- 30 # number of actors in network
pd <- 2 # number of dyadic covariates
pn <- 1 # number of nodal covariates

# simulate dependent variable
yMatrix <- matrix(rbinom(n^2, 1, .3), nrow=n, ncol=n)
diag(yMatrix) <- NA

# make y symmetric (in your application, y should already be symmetric)
yMatrix <- yMatrix + t(yMatrix)
yMatrix[yMatrix>1] <- 1

# simulate dyadic variables
xd1 <- rmnorm(n, varcov = diag(n)) # dyad-level predictor
diag(xd1) <- NA
xd2 <- rmnorm(n, varcov = diag(n)) # dyad-level predictor
diag(xd2) <- NA
xDyadArray <- array(0, dim = c(n, n, pd))
xDyadArray[,,1] <- xd1
xDyadArray[,,2] <- xd2

# simulate nodal variables
xNodeMatrix <- matrix(rnorm(n), ncol=pn)
xNodeSenderMatrix <- xNodeMatrix
xNodeReceiverMatrix <- xNodeMatrix

The dependent variable is contained in the yMatrix object and it is a n x n matrix where a 1 would indicate that an event took place between the unit on the row and the unit on the column. Diagonals are set to NA to indicate that a unit cannot send event to itself:

yMatrix[1:5,1:5]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA    0    1    1    1
#> [2,]    0   NA    0    1    0
#> [3,]    1    0   NA    0    1
#> [4,]    1    1    0   NA    0
#> [5,]    1    0    1    0   NA

The dyadic variables are contained in the xDyadArray object. xDyadArray is a n x n x p array that contains information about how dyads relate to each other. The diagonals in each slice of this array are also set to NA.

xDyadArray[1:5,1:5,]
#> , , 1
#> 
#>            [,1]       [,2]       [,3]       [,4]        [,5]
#> [1,]         NA  0.1523116 -1.1123836 -1.2016740 -0.70200441
#> [2,] -0.5178603         NA  1.0348699  0.3590856 -0.02558061
#> [3,]  0.8092562  0.1512673         NA -0.1561219  0.90557260
#> [4,]  0.4196337 -0.3664024  1.9186996         NA -0.54714094
#> [5,] -1.7362684 -0.2285999 -0.5518109 -1.4665681          NA
#> 
#> , , 2
#> 
#>              [,1]      [,2]       [,3]        [,4]       [,5]
#> [1,]           NA 0.9715940 -0.3245590 -0.04816048  0.3319521
#> [2,] -0.857865735        NA  0.4179843  0.62588293 -1.3913672
#> [3,]  0.005965592 0.1061243         NA  0.52389307  0.6509639
#> [4,] -1.750942307 0.3632845  1.2526212          NA  1.7379925
#> [5,] -0.721488497 0.2393857  0.7743089 -0.96662596         NA

Last, xNodeMatrix contains a matrix with one nodal covariate:

xNodeMatrix[1:5,,drop=FALSE]
#>            [,1]
#> [1,] -0.5667008
#> [2,]  0.3769853
#> [3,] -0.0151702
#> [4,] -1.1180839
#> [5,] -0.4770270

For this simple example we will use the same nodal covariates to model the likelihood that an actor sends and receives a tie. Thus we create both xNodeSenderMatrix and xNodeReceiverMatrix from xNodeMatrix. A different set of covariates, however, could be used to distinguish between what makes an actor more likely to send and receive an event.

Running pgbme

After having formatted the dependent variable and covariates in the format specified above there are only a few other parameters that need to be set to use pgbme:

k is used to set the dimension of the multiplicative effects (U’V) in the P-GBME. In the application presented in the paper we set k equal to two. This means that the third order dependencies captured by U and V will each be of dimensions n x 2; thereby placing each unit in a two dimensional space, where actors that are more proximate to each other in that space are more likely to have an event with one another. The rest of the parameters specified above are common to Bayesian models, thus we move next to using these parameters to running the pgbme function:

# running pgbme
est <- pgbme(
    y=apply(mat.vect(yMatrix), 1, prod),
    Xd=xDyadArray,
    Xs=xNodeSenderMatrix,
    Xr=xNodeReceiverMatrix,
    k=2,
    burn=500,
    NS=1000, 
    odens=10
    )
#> Using MLE to calculate starting values 
#> MCMC sampling. Estimated time 00:00:11 
#> Progress:  
#> 10%  20%  30%  40%  50%  60%  70%  80%  90%  100%  
#> Time elapsed:  00:00:09

Evaluating results from pgbme

In the above, we output the estimates from the pgbme function to the est object, which is structured as a list. We can access the parameter estimates from the model in the following way:

head(est$est)
#>             ll        bd-1       bd-2 Intercept        bs1         br1
#> [1,] -1965.925  0.17712412 -0.1909078 0.9028630 -0.2655823  0.11523752
#> [2,] -1938.788 -0.04752898 -0.1779059 0.6393114 -0.3302694  0.16199713
#> [3,] -1841.471  0.04250619 -0.1268805 0.7197045 -0.2602862  0.18484560
#> [4,] -1961.932 -0.02777611 -0.2792294 0.8652372 -0.1650051 -0.21412647
#> [5,] -1902.955  0.07311691 -0.1419628 0.6723770 -0.1367686 -0.04753896
#> [6,] -1898.196 -0.12799857 -0.2357254 0.6472906 -0.2303312  0.20949745
#>             s2a         sab        s2b se2 rho     s2e1      s2e2
#> [1,] 0.15846509 -0.05741316 0.12793458   1   0 1.129588 0.3383517
#> [2,] 0.10481131 -0.07271955 0.20468589   1   0 0.860502 0.3336797
#> [3,] 0.09764466 -0.01783466 0.12623327   1   0 1.210503 0.6541223
#> [4,] 0.36178706 -0.17897298 0.18620081   1   0 1.359549 0.4724748
#> [5,] 0.14940635 -0.05526356 0.08221001   1   0 2.417479 0.3804811
#> [6,] 0.10648566 -0.03091257 0.10907086   1   0 2.377103 0.4731863
#>           s2f1      s2f2
#> [1,] 0.6132000 1.4809758
#> [2,] 0.5699664 1.3013999
#> [3,] 0.3392568 0.5650447
#> [4,] 0.2702974 0.4441487
#> [5,] 0.2920651 0.6540620
#> [6,] 0.3295430 1.0930032

For applied scholars, the main parameters of interest here are:

Below we subset est$est to these parameters of interest:

# extract beta parameters for exogenous covariates
beta <- est$est[,grepl('bd|Intercept|bs|br',colnames(est$est))]
head(beta)
#>             bd-1       bd-2 Intercept        bs1         br1
#> [1,]  0.17712412 -0.1909078 0.9028630 -0.2655823  0.11523752
#> [2,] -0.04752898 -0.1779059 0.6393114 -0.3302694  0.16199713
#> [3,]  0.04250619 -0.1268805 0.7197045 -0.2602862  0.18484560
#> [4,] -0.02777611 -0.2792294 0.8652372 -0.1650051 -0.21412647
#> [5,]  0.07311691 -0.1419628 0.6723770 -0.1367686 -0.04753896
#> [6,] -0.12799857 -0.2357254 0.6472906 -0.2303312  0.20949745
dim(beta)
#> [1] 50  5

The dimensions of the beta object is 50 x 5. Fifty because we ran the model for 1,000 iterations, burned the first 500 observations, and saved only every tenth iteration. The number of columns simply corresponds to the number of dyadic, sender, and receiver parameters plus an intercept term.

Conducting posterior analysis on this object proceeds similarly to any other Bayesian model. We can extract average parameter effects and credible intervals as follows (as with any other Bayesian model, users should also evaluate convergence using simple trace plots or other tools provided in packages such as coda):

# create function to summarize MCMC chain
summChain <- function(x){
    qLo <- quantile(x, probs=.025)
    mu <- mean(x)
    qHi <- quantile(x, probs=.975)
    return( c(qLo=qLo, mu=mu, qHi=qHi) )
}

# apply function to beta object
t( apply(beta, 2, summChain) ) 
#>             qLo.2.5%           mu  qHi.97.5%
#> bd-1      -0.1137771  0.052744605 0.20207397
#> bd-2      -0.2543206 -0.107067011 0.05899708
#> Intercept  0.5426361  0.730152126 0.92849191
#> bs1       -0.2836164 -0.137992121 0.10243131
#> br1       -0.3017096  0.008515962 0.26096417

Last, we also are often interested in extracting predicted values from our estimated model. To do this we can use the calc_yhat function. The main input this function takes is the output from running pgbme:

# calculate predicted values
yhat <- calc_yhat(est)

# estimate predicted values across iterations and account for probit link
yhat <- apply(pnorm(yhat), 1, mean)

# reorganize into matrix
yhat <- matrix(yhat, sqrt(length(yhat)), sqrt(length(yhat)))
# set diagonals to NA
diag(yhat) = NA

The result from the set of procedures above is a n x n matrix that contains the predicted probabilities for an event occurring between each dyad. After having calculated yhat we can proceed to conducting performance analysis as we would for any other analysis.

# organize data into dataframe
predDF <- cbind(actual=c(yMatrix), predicted=c(yhat))

# remove diagonal NA entries
predDF <- na.omit(predDF)

# convert to dataframe
predDF <- data.frame(predDF)

library(PRROC)
aucROC=roc.curve(
    predDF$predicted[predDF$actual==1], 
    predDF$predicted[predDF$actual==0])$auc
aucPR=pr.curve(
    predDF$predicted[predDF$actual==1], 
    predDF$predicted[predDF$actual==0])$auc.integral

print( c(aucROC=aucROC, aucPR=aucPR) )
#>    aucROC     aucPR 
#> 0.8878118 0.8735875

Further discussion

For further details on how to use the pgbme package please also view our application of this model to studying the bilateral investment treaty network. Files for this analysis can be found at github/s7minhas/pgbmeRepl. In these application files, we also show how to use pgbme to estimate a model in which there is missingness among the covariates and how to assess model performance using a cross-validation technique.