Introduction to the Partial Observability Generalized Bilinear Mixed Effects (P-GBME) Model

Structuring data for `pgbme`

Before discussing how to run the pgbme function, we first focus on how best to structure the inputted data (the dependent variable, dyadic variables, and nodal variables):

y: n x n unweighted, symmetric matrix (where n denotes number of actors)
Xd: n x n x p array (where p denotes the number of dyadic covariates)
Xs: n x p matrix (where p denotes the number of sender covariates)
Xr: n x p matrix (where p denotes the number of receiver covariates)

We construct a small example to highlight the structure of these objects below:

set.seed(6886)
n <- 30 # number of actors in network
pd <- 2 # number of dyadic covariates
pn <- 1 # number of nodal covariates

# simulate dependent variable
yMatrix <- matrix(rbinom(n^2, 1, .3), nrow=n, ncol=n)
diag(yMatrix) <- NA

# make y symmetric (in your application, y should already be symmetric)
yMatrix <- yMatrix + t(yMatrix)
yMatrix[yMatrix>1] <- 1

# simulate dyadic variables
xd1 <- rmnorm(n, varcov = diag(n)) # dyad-level predictor
diag(xd1) <- NA
xd2 <- rmnorm(n, varcov = diag(n)) # dyad-level predictor
diag(xd2) <- NA
xDyadArray <- array(0, dim = c(n, n, pd))
xDyadArray[,,1] <- xd1
xDyadArray[,,2] <- xd2

# simulate nodal variables
xNodeMatrix <- matrix(rnorm(n), ncol=pn)
xNodeSenderMatrix <- xNodeMatrix
xNodeReceiverMatrix <- xNodeMatrix

The dependent variable is contained in the yMatrix object and it is a n x n matrix where a 1 would indicate that an event took place between the unit on the row and the unit on the column. Diagonals are set to NA to indicate that a unit cannot send event to itself:

yMatrix[1:5,1:5]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA    0    1    1    1
#> [2,]    0   NA    0    1    0
#> [3,]    1    0   NA    0    1
#> [4,]    1    1    0   NA    0
#> [5,]    1    0    1    0   NA

The dyadic variables are contained in the xDyadArray object. xDyadArray is a n x n x p array that contains information about how dyads relate to each other. The diagonals in each slice of this array are also set to NA.

xDyadArray[1:5,1:5,]
#> , , 1
#> 
#>            [,1]       [,2]       [,3]       [,4]        [,5]
#> [1,]         NA  0.1523116 -1.1123836 -1.2016740 -0.70200441
#> [2,] -0.5178603         NA  1.0348699  0.3590856 -0.02558061
#> [3,]  0.8092562  0.1512673         NA -0.1561219  0.90557260
#> [4,]  0.4196337 -0.3664024  1.9186996         NA -0.54714094
#> [5,] -1.7362684 -0.2285999 -0.5518109 -1.4665681          NA
#> 
#> , , 2
#> 
#>              [,1]      [,2]       [,3]        [,4]       [,5]
#> [1,]           NA 0.9715940 -0.3245590 -0.04816048  0.3319521
#> [2,] -0.857865735        NA  0.4179843  0.62588293 -1.3913672
#> [3,]  0.005965592 0.1061243         NA  0.52389307  0.6509639
#> [4,] -1.750942307 0.3632845  1.2526212          NA  1.7379925
#> [5,] -0.721488497 0.2393857  0.7743089 -0.96662596         NA

Last, xNodeMatrix contains a matrix with one nodal covariate:

xNodeMatrix[1:5,,drop=FALSE]
#>            [,1]
#> [1,] -0.5667008
#> [2,]  0.3769853
#> [3,] -0.0151702
#> [4,] -1.1180839
#> [5,] -0.4770270

For this simple example we will use the same nodal covariates to model the likelihood that an actor sends and receives a tie. Thus we create both xNodeSenderMatrix and xNodeReceiverMatrix from xNodeMatrix. A different set of covariates, however, could be used to distinguish between what makes an actor more likely to send and receive an event.

Running `pgbme`

After having formatted the dependent variable and covariates in the format specified above there are only a few other parameters that need to be set to use pgbme:

k: Dimension of multiplicative effect
burn: Burn-in period for MCMC
NS: Number of iterations to run MCMC after burn-in period
odens: Number of iterations between saved samples

k is used to set the dimension of the multiplicative effects (U’V) in the P-GBME. In the application presented in the paper we set k equal to two. This means that the third order dependencies captured by U and V will each be of dimensions n x 2; thereby placing each unit in a two dimensional space, where actors that are more proximate to each other in that space are more likely to have an event with one another. The rest of the parameters specified above are common to Bayesian models, thus we move next to using these parameters to running the pgbme function:

# running pgbme
est <- pgbme(
    y=apply(mat.vect(yMatrix), 1, prod),
    Xd=xDyadArray,
    Xs=xNodeSenderMatrix,
    Xr=xNodeReceiverMatrix,
    k=2,
    burn=500,
    NS=1000, 
    odens=10
    )
#> Using MLE to calculate starting values 
#> MCMC sampling. Estimated time 00:00:11 
#> Progress:  
#> 10%  20%  30%  40%  50%  60%  70%  80%  90%  100%  
#> Time elapsed:  00:00:09

Evaluating results from `pgbme`

In the above, we output the estimates from the pgbme function to the est object, which is structured as a list. We can access the parameter estimates from the model in the following way:

head(est$est)
#>             ll        bd-1       bd-2 Intercept        bs1         br1
#> [1,] -1965.925  0.17712412 -0.1909078 0.9028630 -0.2655823  0.11523752
#> [2,] -1938.788 -0.04752898 -0.1779059 0.6393114 -0.3302694  0.16199713
#> [3,] -1841.471  0.04250619 -0.1268805 0.7197045 -0.2602862  0.18484560
#> [4,] -1961.932 -0.02777611 -0.2792294 0.8652372 -0.1650051 -0.21412647
#> [5,] -1902.955  0.07311691 -0.1419628 0.6723770 -0.1367686 -0.04753896
#> [6,] -1898.196 -0.12799857 -0.2357254 0.6472906 -0.2303312  0.20949745
#>             s2a         sab        s2b se2 rho     s2e1      s2e2
#> [1,] 0.15846509 -0.05741316 0.12793458   1   0 1.129588 0.3383517
#> [2,] 0.10481131 -0.07271955 0.20468589   1   0 0.860502 0.3336797
#> [3,] 0.09764466 -0.01783466 0.12623327   1   0 1.210503 0.6541223
#> [4,] 0.36178706 -0.17897298 0.18620081   1   0 1.359549 0.4724748
#> [5,] 0.14940635 -0.05526356 0.08221001   1   0 2.417479 0.3804811
#> [6,] 0.10648566 -0.03091257 0.10907086   1   0 2.377103 0.4731863
#>           s2f1      s2f2
#> [1,] 0.6132000 1.4809758
#> [2,] 0.5699664 1.3013999
#> [3,] 0.3392568 0.5650447
#> [4,] 0.2702974 0.4441487
#> [5,] 0.2920651 0.6540620
#> [6,] 0.3295430 1.0930032

For applied scholars, the main parameters of interest here are:

bd-1: Parameter estimate of the effect of first dyadic variable in xDyadArray
bd-2: Parameter estimate of the effect of second variable in xDyadArray
bs1: Parameter estimate of the effect of nodal variable in xNodeSenderMatrix on the likelihood of an actor sending an event
br1: Parameter estimate of the effect of nodal variable in xNodeReceiverMatrix on the likelihood of an actor receiving an event

Below we subset est$est to these parameters of interest:

# extract beta parameters for exogenous covariates
beta <- est$est[,grepl('bd|Intercept|bs|br',colnames(est$est))]
head(beta)
#>             bd-1       bd-2 Intercept        bs1         br1
#> [1,]  0.17712412 -0.1909078 0.9028630 -0.2655823  0.11523752
#> [2,] -0.04752898 -0.1779059 0.6393114 -0.3302694  0.16199713
#> [3,]  0.04250619 -0.1268805 0.7197045 -0.2602862  0.18484560
#> [4,] -0.02777611 -0.2792294 0.8652372 -0.1650051 -0.21412647
#> [5,]  0.07311691 -0.1419628 0.6723770 -0.1367686 -0.04753896
#> [6,] -0.12799857 -0.2357254 0.6472906 -0.2303312  0.20949745
dim(beta)
#> [1] 50  5

The dimensions of the beta object is 50 x 5. Fifty because we ran the model for 1,000 iterations, burned the first 500 observations, and saved only every tenth iteration. The number of columns simply corresponds to the number of dyadic, sender, and receiver parameters plus an intercept term.

Conducting posterior analysis on this object proceeds similarly to any other Bayesian model. We can extract average parameter effects and credible intervals as follows (as with any other Bayesian model, users should also evaluate convergence using simple trace plots or other tools provided in packages such as coda):

# create function to summarize MCMC chain
summChain <- function(x){
    qLo <- quantile(x, probs=.025)
    mu <- mean(x)
    qHi <- quantile(x, probs=.975)
    return( c(qLo=qLo, mu=mu, qHi=qHi) )
}

# apply function to beta object
t( apply(beta, 2, summChain) ) 
#>             qLo.2.5%           mu  qHi.97.5%
#> bd-1      -0.1137771  0.052744605 0.20207397
#> bd-2      -0.2543206 -0.107067011 0.05899708
#> Intercept  0.5426361  0.730152126 0.92849191
#> bs1       -0.2836164 -0.137992121 0.10243131
#> br1       -0.3017096  0.008515962 0.26096417

Last, we also are often interested in extracting predicted values from our estimated model. To do this we can use the calc_yhat function. The main input this function takes is the output from running pgbme:

# calculate predicted values
yhat <- calc_yhat(est)

# estimate predicted values across iterations and account for probit link
yhat <- apply(pnorm(yhat), 1, mean)

# reorganize into matrix
yhat <- matrix(yhat, sqrt(length(yhat)), sqrt(length(yhat)))
# set diagonals to NA
diag(yhat) = NA

The result from the set of procedures above is a n x n matrix that contains the predicted probabilities for an event occurring between each dyad. After having calculated yhat we can proceed to conducting performance analysis as we would for any other analysis.

# organize data into dataframe
predDF <- cbind(actual=c(yMatrix), predicted=c(yhat))

# remove diagonal NA entries
predDF <- na.omit(predDF)

# convert to dataframe
predDF <- data.frame(predDF)

library(PRROC)
aucROC=roc.curve(
    predDF$predicted[predDF$actual==1], 
    predDF$predicted[predDF$actual==0])$auc
aucPR=pr.curve(
    predDF$predicted[predDF$actual==1], 
    predDF$predicted[predDF$actual==0])$auc.integral

print( c(aucROC=aucROC, aucPR=aucPR) )
#>    aucROC     aucPR 
#> 0.8878118 0.8735875

Introduction to the Partial Observability Generalized Bilinear Mixed Effects (P-GBME) Model

Arturas Rozenas, Shahryar Minhas, and John Ahlquist

2018-06-17

Structuring data for `pgbme`

Running `pgbme`

Evaluating results from `pgbme`

Further discussion

Introduction to the Partial Observability Generalized Bilinear Mixed Effects (P-GBME) Model

Arturas Rozenas, Shahryar Minhas, and John Ahlquist

2018-06-17

Structuring data for pgbme

Running pgbme

Evaluating results from pgbme

Further discussion

Structuring data for `pgbme`

Running `pgbme`

Evaluating results from `pgbme`