In this document, we provide a brief a tutorial on how to use the P-GBME model detailed in “Modeling Asymmetric Relationships from Symmetric Networks”. The P-GBME package can be installed via devtools – in the future, we may wrap P-GBME into the AMEN framework:
library(devtools)
devtools::install_github('s7minhas/pgbmeRepl', subdir='pgbme')
library(pgbme)
pgbme
Before discussing how to run the pgbme
function, we first focus on how best to structure the inputted data (the dependent variable, dyadic variables, and nodal variables):
y
: n x n unweighted, symmetric matrix (where n denotes number of actors)Xd
: n x n x p array (where p denotes the number of dyadic covariates)Xs
: n x p matrix (where p denotes the number of sender covariates)Xr
: n x p matrix (where p denotes the number of receiver covariates)We construct a small example to highlight the structure of these objects below:
set.seed(6886)
n <- 30 # number of actors in network
pd <- 2 # number of dyadic covariates
pn <- 1 # number of nodal covariates
# simulate dependent variable
yMatrix <- matrix(rbinom(n^2, 1, .3), nrow=n, ncol=n)
diag(yMatrix) <- NA
# make y symmetric (in your application, y should already be symmetric)
yMatrix <- yMatrix + t(yMatrix)
yMatrix[yMatrix>1] <- 1
# simulate dyadic variables
xd1 <- rmnorm(n, varcov = diag(n)) # dyad-level predictor
diag(xd1) <- NA
xd2 <- rmnorm(n, varcov = diag(n)) # dyad-level predictor
diag(xd2) <- NA
xDyadArray <- array(0, dim = c(n, n, pd))
xDyadArray[,,1] <- xd1
xDyadArray[,,2] <- xd2
# simulate nodal variables
xNodeMatrix <- matrix(rnorm(n), ncol=pn)
xNodeSenderMatrix <- xNodeMatrix
xNodeReceiverMatrix <- xNodeMatrix
The dependent variable is contained in the yMatrix
object and it is a n x n matrix where a 1 would indicate that an event took place between the unit on the row and the unit on the column. Diagonals are set to NA
to indicate that a unit cannot send event to itself:
yMatrix[1:5,1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NA 0 1 1 1
#> [2,] 0 NA 0 1 0
#> [3,] 1 0 NA 0 1
#> [4,] 1 1 0 NA 0
#> [5,] 1 0 1 0 NA
The dyadic variables are contained in the xDyadArray
object. xDyadArray
is a n x n x p array that contains information about how dyads relate to each other. The diagonals in each slice of this array are also set to NA
.
xDyadArray[1:5,1:5,]
#> , , 1
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NA 0.1523116 -1.1123836 -1.2016740 -0.70200441
#> [2,] -0.5178603 NA 1.0348699 0.3590856 -0.02558061
#> [3,] 0.8092562 0.1512673 NA -0.1561219 0.90557260
#> [4,] 0.4196337 -0.3664024 1.9186996 NA -0.54714094
#> [5,] -1.7362684 -0.2285999 -0.5518109 -1.4665681 NA
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NA 0.9715940 -0.3245590 -0.04816048 0.3319521
#> [2,] -0.857865735 NA 0.4179843 0.62588293 -1.3913672
#> [3,] 0.005965592 0.1061243 NA 0.52389307 0.6509639
#> [4,] -1.750942307 0.3632845 1.2526212 NA 1.7379925
#> [5,] -0.721488497 0.2393857 0.7743089 -0.96662596 NA
Last, xNodeMatrix
contains a matrix with one nodal covariate:
xNodeMatrix[1:5,,drop=FALSE]
#> [,1]
#> [1,] -0.5667008
#> [2,] 0.3769853
#> [3,] -0.0151702
#> [4,] -1.1180839
#> [5,] -0.4770270
For this simple example we will use the same nodal covariates to model the likelihood that an actor sends and receives a tie. Thus we create both xNodeSenderMatrix
and xNodeReceiverMatrix
from xNodeMatrix
. A different set of covariates, however, could be used to distinguish between what makes an actor more likely to send and receive an event.
pgbme
After having formatted the dependent variable and covariates in the format specified above there are only a few other parameters that need to be set to use pgbme
:
k
: Dimension of multiplicative effectburn
: Burn-in period for MCMCNS
: Number of iterations to run MCMC after burn-in periododens
: Number of iterations between saved samplesk
is used to set the dimension of the multiplicative effects (U’V) in the P-GBME. In the application presented in the paper we set k
equal to two. This means that the third order dependencies captured by U and V will each be of dimensions n x 2; thereby placing each unit in a two dimensional space, where actors that are more proximate to each other in that space are more likely to have an event with one another. The rest of the parameters specified above are common to Bayesian models, thus we move next to using these parameters to running the pgbme
function:
# running pgbme
est <- pgbme(
y=apply(mat.vect(yMatrix), 1, prod),
Xd=xDyadArray,
Xs=xNodeSenderMatrix,
Xr=xNodeReceiverMatrix,
k=2,
burn=500,
NS=1000,
odens=10
)
#> Using MLE to calculate starting values
#> MCMC sampling. Estimated time 00:00:11
#> Progress:
#> 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#> Time elapsed: 00:00:09
pgbme
In the above, we output the estimates from the pgbme
function to the est
object, which is structured as a list. We can access the parameter estimates from the model in the following way:
head(est$est)
#> ll bd-1 bd-2 Intercept bs1 br1
#> [1,] -1965.925 0.17712412 -0.1909078 0.9028630 -0.2655823 0.11523752
#> [2,] -1938.788 -0.04752898 -0.1779059 0.6393114 -0.3302694 0.16199713
#> [3,] -1841.471 0.04250619 -0.1268805 0.7197045 -0.2602862 0.18484560
#> [4,] -1961.932 -0.02777611 -0.2792294 0.8652372 -0.1650051 -0.21412647
#> [5,] -1902.955 0.07311691 -0.1419628 0.6723770 -0.1367686 -0.04753896
#> [6,] -1898.196 -0.12799857 -0.2357254 0.6472906 -0.2303312 0.20949745
#> s2a sab s2b se2 rho s2e1 s2e2
#> [1,] 0.15846509 -0.05741316 0.12793458 1 0 1.129588 0.3383517
#> [2,] 0.10481131 -0.07271955 0.20468589 1 0 0.860502 0.3336797
#> [3,] 0.09764466 -0.01783466 0.12623327 1 0 1.210503 0.6541223
#> [4,] 0.36178706 -0.17897298 0.18620081 1 0 1.359549 0.4724748
#> [5,] 0.14940635 -0.05526356 0.08221001 1 0 2.417479 0.3804811
#> [6,] 0.10648566 -0.03091257 0.10907086 1 0 2.377103 0.4731863
#> s2f1 s2f2
#> [1,] 0.6132000 1.4809758
#> [2,] 0.5699664 1.3013999
#> [3,] 0.3392568 0.5650447
#> [4,] 0.2702974 0.4441487
#> [5,] 0.2920651 0.6540620
#> [6,] 0.3295430 1.0930032
For applied scholars, the main parameters of interest here are:
bd-1
: Parameter estimate of the effect of first dyadic variable in xDyadArray
bd-2
: Parameter estimate of the effect of second variable in xDyadArray
bs1
: Parameter estimate of the effect of nodal variable in xNodeSenderMatrix
on the likelihood of an actor sending an eventbr1
: Parameter estimate of the effect of nodal variable in xNodeReceiverMatrix
on the likelihood of an actor receiving an eventBelow we subset est$est
to these parameters of interest:
# extract beta parameters for exogenous covariates
beta <- est$est[,grepl('bd|Intercept|bs|br',colnames(est$est))]
head(beta)
#> bd-1 bd-2 Intercept bs1 br1
#> [1,] 0.17712412 -0.1909078 0.9028630 -0.2655823 0.11523752
#> [2,] -0.04752898 -0.1779059 0.6393114 -0.3302694 0.16199713
#> [3,] 0.04250619 -0.1268805 0.7197045 -0.2602862 0.18484560
#> [4,] -0.02777611 -0.2792294 0.8652372 -0.1650051 -0.21412647
#> [5,] 0.07311691 -0.1419628 0.6723770 -0.1367686 -0.04753896
#> [6,] -0.12799857 -0.2357254 0.6472906 -0.2303312 0.20949745
dim(beta)
#> [1] 50 5
The dimensions of the beta
object is 50 x 5. Fifty because we ran the model for 1,000 iterations, burned the first 500 observations, and saved only every tenth iteration. The number of columns simply corresponds to the number of dyadic, sender, and receiver parameters plus an intercept term.
Conducting posterior analysis on this object proceeds similarly to any other Bayesian model. We can extract average parameter effects and credible intervals as follows (as with any other Bayesian model, users should also evaluate convergence using simple trace plots or other tools provided in packages such as coda
):
# create function to summarize MCMC chain
summChain <- function(x){
qLo <- quantile(x, probs=.025)
mu <- mean(x)
qHi <- quantile(x, probs=.975)
return( c(qLo=qLo, mu=mu, qHi=qHi) )
}
# apply function to beta object
t( apply(beta, 2, summChain) )
#> qLo.2.5% mu qHi.97.5%
#> bd-1 -0.1137771 0.052744605 0.20207397
#> bd-2 -0.2543206 -0.107067011 0.05899708
#> Intercept 0.5426361 0.730152126 0.92849191
#> bs1 -0.2836164 -0.137992121 0.10243131
#> br1 -0.3017096 0.008515962 0.26096417
Last, we also are often interested in extracting predicted values from our estimated model. To do this we can use the calc_yhat
function. The main input this function takes is the output from running pgbme
:
# calculate predicted values
yhat <- calc_yhat(est)
# estimate predicted values across iterations and account for probit link
yhat <- apply(pnorm(yhat), 1, mean)
# reorganize into matrix
yhat <- matrix(yhat, sqrt(length(yhat)), sqrt(length(yhat)))
# set diagonals to NA
diag(yhat) = NA
The result from the set of procedures above is a n x n matrix that contains the predicted probabilities for an event occurring between each dyad. After having calculated yhat
we can proceed to conducting performance analysis as we would for any other analysis.
# organize data into dataframe
predDF <- cbind(actual=c(yMatrix), predicted=c(yhat))
# remove diagonal NA entries
predDF <- na.omit(predDF)
# convert to dataframe
predDF <- data.frame(predDF)
library(PRROC)
aucROC=roc.curve(
predDF$predicted[predDF$actual==1],
predDF$predicted[predDF$actual==0])$auc
aucPR=pr.curve(
predDF$predicted[predDF$actual==1],
predDF$predicted[predDF$actual==0])$auc.integral
print( c(aucROC=aucROC, aucPR=aucPR) )
#> aucROC aucPR
#> 0.8878118 0.8735875
For further details on how to use the pgbme
package please also view our application of this model to studying the bilateral investment treaty network. Files for this analysis can be found at github/s7minhas/pgbmeRepl. In these application files, we also show how to use pgbme
to estimate a model in which there is missingness among the covariates and how to assess model performance using a cross-validation technique.