Preamble

Package

require(RSiena)
## Loading required package: RSiena
require(multiSiena)
## Loading required package: multiSiena

Load mock dataset

Use a routine for creating synthetic dataset with 6 networks

source("https://raw.githubusercontent.com/johankoskinen/Sunbelt2024/main/multiSiena/data.set.enzo.setup.R")

This dataset, enzo,

enzo <- create.enzo()

is a sienaGroup object.

What data

Plot the 6 networks for the two waves. For each group \(j=1,\ldots,6\), we have adjacency matrices \[ \boldsymbol{x}^{(j)}(t_0), \text{ and } \boldsymbol{x}^{(j)}(t_1) \] For example, for group \(j=1\), \(\boldsymbol{x}^{(j)}(t_0)\)

enzo$Data1$depvars[[1]][,,1]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    1    0    0    0
## [2,]    1    0    0    0    0
## [3,]    0    0    0    1    0
## [4,]    0    0    0    0    1
## [5,]    0    0    0    0    0

and \(\boldsymbol{x}^{(j)}(t_1)\)

enzo$Data1$depvars[[1]][,,2]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    1    0    0    0
## [2,]    1    0    0    0    0
## [3,]    0    0    0    0    1
## [4,]    0    0    1    0    1
## [5,]    0    0    0    1    0

For

require(sna)
coord <- matrix(c(1,1,
1,2,
2,2,
2,1,
.5,.5),5,2,byrow=TRUE)
# === plot
par( mfrow = c(6,2) ,oma = c(0,1,0,0) + 0.1,
          mar = c(1,0,1,1) + 0.1)
# grp1
gplot( enzo$Data1$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data1$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data1$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data1$depvars[[2]][,1,2], label=c(1:5))
# grp2
gplot( enzo$Data2$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data2$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data2$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data2$depvars[[2]][,1,2], label=c(1:5))
# grp3
gplot( enzo$Data3$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data3$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data3$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data3$depvars[[2]][,1,2], label=c(1:5))
# grp4
gplot( enzo$Data4$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data4$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data4$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data4$depvars[[2]][,1,2], label=c(1:5))
# grp5
gplot( enzo$Data5$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data5$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data5$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data5$depvars[[2]][,1,2], label=c(1:5))
# grp6
gplot( enzo$Data6$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data6$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data6$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data6$depvars[[2]][,1,2], label=c(1:5))

Basic Model

For eaxh group \(j=1,\ldots,6\) we define a stochastic actor-oriented model \[ \boldsymbol{x}^{(j)}(t_1) \mid \boldsymbol{x}^{(j)}(t_0) \thicksim SAOM(\theta_j) \] The \(SAOM(\theta)\) is determined by its objective and rate functions \[ f_i(\boldsymbol{x},\theta)\text{, and } \lambda_i(\theta) \] that are assumed to be the same for all groups \(j=1,\ldots,N\).

Define model

You define the effects of the objective function of the model in the standard way. The effects structure is obtained from getEffects()

seed <- 1234
GroupEffects <- getEffects(enzo)
GroupsModel <- sienaAlgorithmCreate(projname = 'Enzo',seed=seed)
## If you use this algorithm object, siena07 will create/use an output file Enzo.txt .

Hierarchial model

All groups have the same effects and model, \(SAOM(\theta_j)\), but some of the parameters, \(\gamma_j\), vary across groups, and some, \(\eta\), are the same for all groups.

\[ \theta_j=\begin{bmatrix}\gamma_j\\\eta\end{bmatrix} \]

Group-varying and fixed

By default

summary(GroupEffects)$random
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
summary(GroupEffects)$effectName[summary(GroupEffects)$random]
## [1] "outdegree (density)"
summary(GroupEffects)$effectName[summary(GroupEffects)$random==FALSE]
##  [1] "constant net rate (period 1)"  "constant net rate (period 3)" 
##  [3] "constant net rate (period 5)"  "constant net rate (period 7)" 
##  [5] "constant net rate (period 9)"  "constant net rate (period 11)"
##  [7] "reciprocity"                   "rate beh (period 1)"          
##  [9] "rate beh (period 3)"           "rate beh (period 5)"          
## [11] "rate beh (period 7)"           "rate beh (period 9)"          
## [13] "rate beh (period 11)"          "beh linear shape"

Model for group-varying

The model assumes that the group-varying parameters follow a multivariate normal distribution \[ \gamma_j \thicksim N(\mu,\Sigma) \]

Target of inference

We want to estimate the population mean \(\mu\) and the non-varying parameter \(\eta\).

Running default

To estimate the model with default settings

groupModel.e <- sienaBayes(GroupsModel, data = enzo,
                initgainGlobal=0.1, initgainGroupwise = 0.001,
                effects = GroupEffects,
             nrunMHBatches=40, silentstart=FALSE)

Results

What did we get?

Check objects returned

names(groupModel.e)

Non-varying

The nmain: 300 posterior draws of \(\eta\) are in ThinPosteriorEta

dim(groupModel.e$ThinPosteriorEta)
## [1] 300   2

This corresponds to reciprocity and the linear shape for behaviour

par(mfrow=c(2,3))
for (k in c(1:2)){
  post <- groupModel.e$ThinPosteriorEta[,k]
plot(ts(post))# draws in each iteration
acf(post,main='')# the autocorrelation function - how much dependence
hist(post,main='')# (empirical) posterior distribution
}

Note that these are draws from a bivariate distribution

plot(groupModel.e$ThinPosteriorEta)

We can calculate probabilites for the parameters using the values simulated from the posterior

For example, we can calculate the probability that the reciprocity parameter is positive given data as

mean(groupModel.e$ThinPosteriorEta[,1]>0)
## [1] 1

Given these 6 (synthetic) networks, the reciprocity parameter is positive with a posterior probability of 1

Population parameters

The posteriors for \(\mu\) are found in ThinPosteriorMu

dim(groupModel.e$ThinPosteriorMu)
## [1] 300   3

Corresponding to the rate for the network, density, and rate for behaviour

par(mfrow=c(3,3))
for (k in c(1:3)){
  post <- groupModel.e$ThinPosteriorMu[,k]
plot(ts(post))
acf(post,main='')
hist(post,main='')
}

Summary of results

To get a table to numerical summaries of the posteriors

summary(groupModel.e)
## Bayesian estimation.
## Prior distribution:
## 
## Mu      basic rate parameter net    2.5161 
##         outdegree (density)         0.0000 
##         rate beh period 1           0.7323 
##         
## Sigma     0.5812   0.0000  -0.1585 
##           0.0000   1.0000   0.0000 
##          -0.1585   0.0000   0.1310 
##         
## Prior Df  5 
## 
## Kappa     0.0000 
## 
## Eta   
## For the fixed parameters, constant prior.
## 
## Algorithm specifications were  nprewarm = 50 , nwarm = 50 , nmain = 250 , nrunMHBatches = 40 , nImproveMH = 100 ,
##  nSampVarying = 1 , nSampConst = 1 , mult = 5 .
## Posterior means and standard deviations are averages over the last 250 runs.
## 
## Proportion of acceptances in MCMC proposals after warming up:
## 0.19 0.20 0.23 0.24 0.26 0.31 0.25 0.25 
## 
## This should ideally be between 0.15 and 0.50.
## Note: this summary does not contain a convergence check.
## Note: the print function for sienaBayesFit objects can also use a parameter nfirst,
##       indicating the first run from which convergence is assumed.
## 
## Groups:
##  Data1   Data2   Data3   Data4   Data5   Data6   
## 
## Posterior means and standard deviations for global mean parameters
## 
## Total number of runs in the results is  300 .
## Posterior means and standard deviations are averages over 250 MCMC runs (excluding warming, after thinning).
## 
##                                          Post.      Post.     cred.    cred.   p      varying   Post.    cred.   cred.   
##                                          mean       s.d.m.    from     to                       s.d.     from    to      
## Network Dynamics 
##    1. rate constant net rate (period 1)   2.1459  ( 0.7326 )  0.8441  3.4185                                             
##    2. rate constant net rate (period 3)   2.1719  ( 0.7819 )  0.9366  3.9436                                             
##    3. rate constant net rate (period 5)   2.3410  ( 0.7160 )  1.2694  3.7795                                             
##    4. rate constant net rate (period 7)   2.3381  ( 0.7119 )  1.1655  3.7801                                             
##    5. rate constant net rate (period 9)   2.5540  ( 0.7121 )  1.2057  3.9846                                             
##    6. rate constant net rate (period 11)  2.2758  ( 0.6683 )  1.1175  3.7375                                             
##    7. eval outdegree (density)           -0.5602  ( 0.4986 ) -1.6841  0.4112  0.09       +      0.9793   0.5702  1.7998  
##    8. eval reciprocity                    1.7251  ( 0.5568 )  0.6394  2.7458  1.00       -                               
## 
## Behavior Dynamics
##    9. rate rate beh (period 1)            1.2257  ( 0.5262 )  0.4060  2.4141                                             
##   10. rate rate beh (period 3)            1.1665  ( 0.5480 )  0.2743  2.6184                                             
##   11. rate rate beh (period 5)            1.1340  ( 0.5262 )  0.3071  2.4207                                             
##   12. rate rate beh (period 7)            1.1159  ( 0.5089 )  0.3590  2.4234                                             
##   13. rate rate beh (period 9)            1.0906  ( 0.5191 )  0.2869  2.3592                                             
##   14. rate rate beh (period 11)           1.2065  ( 0.5260 )  0.3976  2.5006                                             
##   15. eval beh linear shape              -0.1760  ( 0.5260 ) -1.2722  0.8481  0.36       -                               
## 
## Posterior mean of global covariance matrix (varying parameters)
##   0.6371  -0.0077  -0.1424
##  -0.0077   0.9590   0.0054
##  -0.1424   0.0054   0.1591
## 
## Posterior standard deviations of elements of global covariance matrix
##   0.3928   0.3808   0.1483
##   0.3808   0.8117   0.1823
##   0.1483   0.1823   0.0921
## 
## For the rate parameters across all groups:
##                          Post.mean Post.sd
## basic rate parameter net   2.30241 0.55871
## rate beh period 1          1.16753 0.44662

Changing hierarchial model

In the previous model, reciprocity was assumed to have the same parameter \(\theta_{j,rec}=\eta_{rec}\), for all \(j=1,\ldots,6\). To allow the reciprocity parameter to vary across groups \(j\), set

GroupEffects <- setEffect( GroupEffects, recip,random=TRUE)
##   effectName  shortName include fix   test  initialValue parm randomEffects
## 1 reciprocity recip     TRUE    FALSE FALSE          0   0    TRUE

so that now \[ \theta_j=\begin{bmatrix} \gamma_{j,rate_x}\\ \gamma_{j,dens}\\ \gamma_{j,rec}\\ \gamma_{j,rate_z}\\ \eta_{shape,z} \end{bmatrix} \text{, and } \gamma_{j} \thicksim \mathcal{N}_{4}(\mu,\Sigma) \]

Running new model

To estimate the model with one more random effect

groupModel.e2 <- sienaBayes(GroupsModel, data = enzo,
                initgainGlobal=0.1, initgainGroupwise = 0.001,
                effects = GroupEffects,
             nrunMHBatches=40, silentstart=FALSE)

Output

Now we only have one common parameter, one \(\eta\)

plot(groupModel.e2$ThinPosteriorEta)

In the plot we see the 300 simulated values of \(\eta\) that are drawn using MCMC from the posterior distribution of \(\eta\) given data.

Given these 6 (synthetic) networks, the reciprocity parameter is positive with a posterior probability of 1

But we have \(\mu\) has four dimentions (parameters)

pairs(groupModel.e2$ThinPosteriorMu)

We can calculate the probability that the reciprocity parameter is positive given data as

mean(groupModel.e2$ThinPosteriorMu[,3]>0)
## [1] 0.9966667

Group-level analysis

For reciprocity, we saw in the previous example that \[ \Pr(\mu_{rec}>0 \mid Data) \approx 1 \] but what about the group-level parameters \(\gamma_{1,rec},\gamma_{2,rec},\ldots,\gamma_{6,rec}\) - are they always positive for all of the groups?

Group and population-level

The posterior (predictive) distributions of the \(\gamma_{j}\)’s are stored in ThinParameters as iteration by group by parameter

dim(groupModel.e2$ThinParameters)
## [1] 300   6   5

and within each group

head(groupModel.e2$ThinParameters[,1,])
##      constant net rate outdegree (density) reciprocity rate beh
## [1,]          3.084515          -1.8143328  2.38747125 1.188040
## [2,]          2.534277          -0.8095603  2.10497307 1.392641
## [3,]          3.319430          -0.5882144  0.66770252 1.857220
## [4,]          2.482389          -1.6900750  1.84137145 2.622582
## [5,]          3.122526          -1.6681452  1.25374655 1.728405
## [6,]          3.727048          -0.9129924  0.06870421 1.269398
##      beh linear shape
## [1,]        0.2569260
## [2,]       -0.2283378
## [3,]       -0.3325189
## [4,]       -0.5843580
## [5,]       -0.1605018
## [6,]        0.3287635

To illustrate how we have inference for \(\mu\) for density and reciprocity, but also for \(\gamma_j\) for density and reciprocity for each group, let us plot density against reciprocity for both levels

plot(groupModel.e2$ThinPosteriorMu[,2],
     groupModel.e2$ThinPosteriorMu[,3],
     xlim=range(groupModel.e2$ThinParameters[,,2]),
     ylim=range(groupModel.e2$ThinParameters[,,3]),
     pch=4, xlab='density',ylab='reciprocity')
grp.cols <- c('darkred','red','darkorange','orange','gold','darkgreen')
for (j in c(1:6))
{
  lines( groupModel.e2$ThinParameters[,j,2], groupModel.e2$ThinParameters[,j,3],type='p',pch=1,col=grp.cols[j])
}
abline(v=mean(groupModel.e2$ThinPosteriorMu[,2]))
abline(h=mean(groupModel.e2$ThinPosteriorMu[,3]))
abline(v=0,col='grey',lty=4)
abline(h=0,col='grey',lty=4)

For an individual parameter it is common to look at the catepilar plot

require(HDInterval)
## Loading required package: HDInterval
grp.means <- colMeans(groupModel.e2$ThinParameters[,,3])
boxplot(groupModel.e2$ThinParameters[,order(grp.means),3])
CI <- hdi(groupModel.e2$ThinPosteriorMu[,3],1-.05)
abline(h=CI[1],col='grey',lty=4)
abline(h=CI[2],col='grey',lty=4)

Prior and posterior

To get a posterior distribution for \(\mu\) and \(\eta\) (and \(\Sigma\)), we need to have prior distributions for these parameters

A prior distribution for a parameter quantifies the uncertainty that we have about a parameter before we collect and observe DATA

Example: Bernoulli graph

For a cross-section \(n=5\) network, a Bernoulli graph says that each of the \(5(5-1)/2=10\) possible ties all have a probability of \(p\) of being present, independently of each other. The parameter \(0 \leq p \leq 1\) and hence a prior for \(p\) might be \(p \thicksim Beta(\alpha,\beta)\)

alpha.p <- 2
beta.p <- 2
p <- seq(from=.0001,to =.9999, length.out = 1000)
plot( p , dbeta(p, alpha.p, beta.p) , type = 'l', yaxt='n',ylab=expression(pi(p)))

Assume that we observe the network

\[ \mathbf{X}= \begin{bmatrix} - & 1 & 0 & 0 & 0 \\ - & - & 0 & 1 & 0 \\ - & - & - & 0 & 0 \\ - & - & - & - & 0 \\ - & - & - & - & - \\ \end{bmatrix} \]

This has \(L=\sum_{i<j}x_{ij}=2\), which gives the likelihood function for \(p\)

plot(p,p^2*(1-p)^8, type = 'l', yaxt='n',ylab='L(p ; X ) ' )

The posterior distribution will be \(p \mid \mathbf{X} \thicksim Beta(2+\alpha,8+\beta)\)

plot( p , dbeta(p, alpha.p+2, beta.p+8) , type = 'l', yaxt='n',ylab=expression(pi(p ~"|"~ X)))

Try with different values of the hyperparameters \(\alpha\) and \(\beta\) in the prior for \(p\)

Even better, try Mattias Villani’s widget that illustates Bayesian inference https://observablehq.com/@mattiasvillani/bayesian-inference-for-bernoulli-iid-data?collection=@mattiasvillani/bayesian-learning[Beta-Binomial]

More and bigger

Load routies

Create two waves for \(M\) networks, all of size \(n\)

n <- 10 # you could make this larger or smaller
M <- 5 # this is very few groups and you could try to produce many

The Karen dataset has as many groups as you like

Karen <- data.set.karen.set.up(n=10,# number of nodes
                      M=5,# number of networks
                      seed=123,nbrNodes=1,nmain=20,nwarm=20)

The networks are simulated from a model with effects, that in addition to the defaults have

  • transTrip
  • simX with interaction1 = "beh"

Now, figure out what inputs you need. The sienaGroup object is Karen$my.Karen

Karen$my.Karen
## Dependent variables: 
## net : oneMode 
##  beh : behavior 
## Total number of groups: 5 
## Total number of periods: 5

For different model specifications, think of what priors you need to specify. Which ones should be random and which fixed?

With the same prior, how does increasing \(M\) affect inference?

Increasing the prior variance (\(\Sigma\)), how does that affect inference?

groupModel.e <- sienaBayes(Karen$GroupsModel, data = Karen$my.Karen,
                            initgainGlobal=0.1, initgainGroupwise = 0.001,
                            effects = Karen$effects, priorMu = Karen$priorMu, priorSigma = Karen$priorSigma,
                            priorKappa = 0.01,
                            nwarm=Karen$nwarm, nmain=Karen$nmain, nrunMHBatches=40,
                            nbrNodes=Karen$nbrNodes, silentstart=FALSE)