require(RSiena)
## Loading required package: RSiena
require(multiSiena)
## Loading required package: multiSiena
Use a routine for creating a synthetic dataset with 6 networks:
source("https://raw.githubusercontent.com/johankoskinen/Sunbelt2024/main/multiSiena/data.set.enzo.setup.R")
This dataset, enzo,
enzo <- create.enzo()
is a sienaGroup object.
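To get a quick overview of what the object contains (a minimal sketch; the per-group component names Data1, ..., Data6 are the ones used below), you could inspect it with
class(enzo)   # the class of the combined data object
names(enzo)   # the per-group components Data1, ..., Data6
length(enzo)  # the number of groups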
Plot the 6 networks for the two waves. For each group \(j=1,\ldots,6\), we have adjacency matrices \[ \boldsymbol{x}^{(j)}(t_0), \text{ and } \boldsymbol{x}^{(j)}(t_1) \] For example, for group \(j=1\), \(\boldsymbol{x}^{(j)}(t_0)\)
enzo$Data1$depvars[[1]][,,1]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 1 0 0 0
## [2,] 1 0 0 0 0
## [3,] 0 0 0 1 0
## [4,] 0 0 0 0 1
## [5,] 0 0 0 0 0
and \(\boldsymbol{x}^{(j)}(t_1)\)
enzo$Data1$depvars[[1]][,,2]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 1 0 0 0
## [2,] 1 0 0 0 0
## [3,] 0 0 0 0 1
## [4,] 0 0 1 0 1
## [5,] 0 0 0 1 0
For plotting, we use the sna package and fixed node coordinates:
require(sna)
coord <- matrix(c(1,1,
1,2,
2,2,
2,1,
.5,.5),5,2,byrow=TRUE)
# === plot
par( mfrow = c(6,2) ,oma = c(0,1,0,0) + 0.1,
mar = c(1,0,1,1) + 0.1)
# grp1
gplot( enzo$Data1$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data1$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data1$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data1$depvars[[2]][,1,2], label=c(1:5))
# grp2
gplot( enzo$Data2$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data2$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data2$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data2$depvars[[2]][,1,2], label=c(1:5))
# grp3
gplot( enzo$Data3$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data3$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data3$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data3$depvars[[2]][,1,2], label=c(1:5))
# grp4
gplot( enzo$Data4$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data4$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data4$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data4$depvars[[2]][,1,2], label=c(1:5))
# grp5
gplot( enzo$Data5$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data5$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data5$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data5$depvars[[2]][,1,2], label=c(1:5))
# grp6
gplot( enzo$Data6$depvars[[1]][,,1] ,coord = coord, vertex.col = enzo$Data6$depvars[[2]][,1,1], label=c(1:5))
gplot( enzo$Data6$depvars[[1]][,,2] , coord = coord, vertex.col = enzo$Data6$depvars[[2]][,1,2], label=c(1:5))
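The same figure can be produced more compactly with a loop over groups and waves; this is a sketch assuming the group components can also be indexed positionally as enzo[[j]]:
# Sketch: the same 6 x 2 panel of plots via a loop (assumes enzo[[j]] is group j)
par(mfrow = c(6, 2), oma = c(0, 1, 0, 0) + 0.1, mar = c(1, 0, 1, 1) + 0.1)
for (j in 1:6){
  for (w in 1:2){
    gplot(enzo[[j]]$depvars[[1]][, , w], coord = coord,
          vertex.col = enzo[[j]]$depvars[[2]][, 1, w], label = 1:5)
  }
}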
For each group \(j=1,\ldots,6\) we define a stochastic actor-oriented model \[ \boldsymbol{x}^{(j)}(t_1) \mid \boldsymbol{x}^{(j)}(t_0) \thicksim SAOM(\theta_j) \] The \(SAOM(\theta)\) is determined by its objective and rate functions \[ f_i(\boldsymbol{x},\theta)\text{, and } \lambda_i(\theta) \] that are assumed to be the same for all groups \(j=1,\ldots,N\).
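Recall that the objective function is the usual weighted sum of effect statistics, \[ f_i(\boldsymbol{x},\theta) = \sum_k \theta_k s_{ik}(\boldsymbol{x}) \] so the effects chosen below determine which statistics \(s_{ik}\) enter the model.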
You define the effects of the objective function of the
model in the standard way. The effects structure is obtained from
getEffects()
seed <- 1234
GroupEffects <- getEffects(enzo)
GroupsModel <- sienaAlgorithmCreate(projname = 'Enzo',seed=seed)
## If you use this algorithm object, siena07 will create/use an output file Enzo.txt .
All groups have the same effects and model, \(SAOM(\theta_j)\), but some of the parameters, \(\gamma_j\), vary across groups, and some, \(\eta\), are the same for all groups.
\[ \theta_j=\begin{bmatrix}\gamma_j\\\eta\end{bmatrix} \]
By default, only the outdegree (density) effect is specified to vary randomly across groups:
summary(GroupEffects)$random
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
summary(GroupEffects)$effectName[summary(GroupEffects)$random]
## [1] "outdegree (density)"
summary(GroupEffects)$effectName[summary(GroupEffects)$random==FALSE]
## [1] "constant net rate (period 1)" "constant net rate (period 3)"
## [3] "constant net rate (period 5)" "constant net rate (period 7)"
## [5] "constant net rate (period 9)" "constant net rate (period 11)"
## [7] "reciprocity" "rate beh (period 1)"
## [9] "rate beh (period 3)" "rate beh (period 5)"
## [11] "rate beh (period 7)" "rate beh (period 9)"
## [13] "rate beh (period 11)" "beh linear shape"
The model assumes that the group-varying parameters follow a multivariate normal distribution \[ \gamma_j \thicksim N(\mu,\Sigma) \]
We want to estimate the population mean \(\mu\) and the non-varying parameter \(\eta\).
To estimate the model with default settings
groupModel.e <- sienaBayes(GroupsModel, data = enzo,
initgainGlobal=0.1, initgainGroupwise = 0.001,
effects = GroupEffects,
nrunMHBatches=40, silentstart=FALSE)
What did we get?
Check the objects returned:
names(groupModel.e)
The 300 stored posterior draws of \(\eta\) are in
ThinPosteriorEta
dim(groupModel.e$ThinPosteriorEta)
## [1] 300 2
These two columns correspond to reciprocity and the linear shape effect for behaviour.
par(mfrow=c(2,3))
for (k in c(1:2)){
post <- groupModel.e$ThinPosteriorEta[,k]
plot(ts(post))# draws in each iteration
acf(post,main='')# the autocorrelation function - how much dependence
hist(post,main='')# (empirical) posterior distribution
}
Note that these are draws from a bivariate distribution
plot(groupModel.e$ThinPosteriorEta)
We can calculate probabilities for the parameters using the values simulated from the posterior.
For example, we can calculate the probability that the reciprocity parameter is positive given data as
mean(groupModel.e$ThinPosteriorEta[,1]>0)
## [1] 1
Given these 6 (synthetic) networks, the reciprocity parameter is positive with a posterior probability of 1
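In the same way we can read off a 95% (equal-tailed) credible interval for reciprocity directly from the draws, for example
quantile(groupModel.e$ThinPosteriorEta[,1], probs = c(0.025, 0.975))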
The posteriors for \(\mu\) are found
in ThinPosteriorMu
dim(groupModel.e$ThinPosteriorMu)
## [1] 300 3
These correspond to the basic network rate, outdegree (density), and the behaviour rate:
par(mfrow=c(3,3))
for (k in c(1:3)){
post <- groupModel.e$ThinPosteriorMu[,k]
plot(ts(post))
acf(post,main='')
hist(post,main='')
}
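Posterior means and standard deviations of \(\mu\) can also be computed directly from the stored draws (a quick check that should broadly agree with the summary table below):
colMeans(groupModel.e$ThinPosteriorMu)      # posterior means of mu
apply(groupModel.e$ThinPosteriorMu, 2, sd)  # posterior standard deviations of mu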
To get a table of numerical summaries of the posteriors:
summary(groupModel.e)
## Bayesian estimation.
## Prior distribution:
##
## Mu basic rate parameter net 2.5161
## outdegree (density) 0.0000
## rate beh period 1 0.7323
##
## Sigma 0.5812 0.0000 -0.1585
## 0.0000 1.0000 0.0000
## -0.1585 0.0000 0.1310
##
## Prior Df 5
##
## Kappa 0.0000
##
## Eta
## For the fixed parameters, constant prior.
##
## Algorithm specifications were nprewarm = 50 , nwarm = 50 , nmain = 250 , nrunMHBatches = 40 , nImproveMH = 100 ,
## nSampVarying = 1 , nSampConst = 1 , mult = 5 .
## Posterior means and standard deviations are averages over the last 250 runs.
##
## Proportion of acceptances in MCMC proposals after warming up:
## 0.19 0.20 0.23 0.24 0.26 0.31 0.25 0.25
##
## This should ideally be between 0.15 and 0.50.
## Note: this summary does not contain a convergence check.
## Note: the print function for sienaBayesFit objects can also use a parameter nfirst,
## indicating the first run from which convergence is assumed.
##
## Groups:
## Data1 Data2 Data3 Data4 Data5 Data6
##
## Posterior means and standard deviations for global mean parameters
##
## Total number of runs in the results is 300 .
## Posterior means and standard deviations are averages over 250 MCMC runs (excluding warming, after thinning).
##
## Post. Post. cred. cred. p varying Post. cred. cred.
## mean s.d.m. from to s.d. from to
## Network Dynamics
## 1. rate constant net rate (period 1) 2.1459 ( 0.7326 ) 0.8441 3.4185
## 2. rate constant net rate (period 3) 2.1719 ( 0.7819 ) 0.9366 3.9436
## 3. rate constant net rate (period 5) 2.3410 ( 0.7160 ) 1.2694 3.7795
## 4. rate constant net rate (period 7) 2.3381 ( 0.7119 ) 1.1655 3.7801
## 5. rate constant net rate (period 9) 2.5540 ( 0.7121 ) 1.2057 3.9846
## 6. rate constant net rate (period 11) 2.2758 ( 0.6683 ) 1.1175 3.7375
## 7. eval outdegree (density) -0.5602 ( 0.4986 ) -1.6841 0.4112 0.09 + 0.9793 0.5702 1.7998
## 8. eval reciprocity 1.7251 ( 0.5568 ) 0.6394 2.7458 1.00 -
##
## Behavior Dynamics
## 9. rate rate beh (period 1) 1.2257 ( 0.5262 ) 0.4060 2.4141
## 10. rate rate beh (period 3) 1.1665 ( 0.5480 ) 0.2743 2.6184
## 11. rate rate beh (period 5) 1.1340 ( 0.5262 ) 0.3071 2.4207
## 12. rate rate beh (period 7) 1.1159 ( 0.5089 ) 0.3590 2.4234
## 13. rate rate beh (period 9) 1.0906 ( 0.5191 ) 0.2869 2.3592
## 14. rate rate beh (period 11) 1.2065 ( 0.5260 ) 0.3976 2.5006
## 15. eval beh linear shape -0.1760 ( 0.5260 ) -1.2722 0.8481 0.36 -
##
## Posterior mean of global covariance matrix (varying parameters)
## 0.6371 -0.0077 -0.1424
## -0.0077 0.9590 0.0054
## -0.1424 0.0054 0.1591
##
## Posterior standard deviations of elements of global covariance matrix
## 0.3928 0.3808 0.1483
## 0.3808 0.8117 0.1823
## 0.1483 0.1823 0.0921
##
## For the rate parameters across all groups:
## Post.mean Post.sd
## basic rate parameter net 2.30241 0.55871
## rate beh period 1 1.16753 0.44662
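As the notes in the output suggest, the summaries can be restricted to the runs after an assumed convergence point with the nfirst argument; for example, discarding the first 50 (warming) runs:
print(groupModel.e, nfirst = 51)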
In the previous model, reciprocity was assumed to have
the same parameter \(\theta_{j,rec}=\eta_{rec}\), for all \(j=1,\ldots,6\). To allow the
reciprocity parameter to vary across groups \(j\), set
GroupEffects <- setEffect( GroupEffects, recip,random=TRUE)
## effectName shortName include fix test initialValue parm randomEffects
## 1 reciprocity recip TRUE FALSE FALSE 0 0 TRUE
so that now \[ \theta_j=\begin{bmatrix} \gamma_{j,rate_x}\\ \gamma_{j,dens}\\ \gamma_{j,rec}\\ \gamma_{j,rate_z}\\ \eta_{shape,z} \end{bmatrix} \text{, and } \gamma_{j} \thicksim \mathcal{N}_{4}(\mu,\Sigma) \]
To estimate the model with one more random effect
groupModel.e2 <- sienaBayes(GroupsModel, data = enzo,
initgainGlobal=0.1, initgainGroupwise = 0.001,
effects = GroupEffects,
nrunMHBatches=40, silentstart=FALSE)
Now we only have one common parameter, one \(\eta\)
plot(groupModel.e2$ThinPosteriorEta)
In the plot we see the 300 simulated values of \(\eta\) that are drawn using MCMC from the posterior distribution of \(\eta\) given data.
But now \(\mu\) has four dimensions (parameters)
pairs(groupModel.e2$ThinPosteriorMu)
We can calculate the probability that the reciprocity parameter is positive given data as
mean(groupModel.e2$ThinPosteriorMu[,3]>0)
## [1] 0.9966667
For reciprocity, we saw in the previous example that \[ \Pr(\mu_{rec}>0 \mid Data) \approx 1 \] but what about the group-level parameters \(\gamma_{1,rec},\gamma_{2,rec},\ldots,\gamma_{6,rec}\) - are they always positive for all of the groups?
The posterior (predictive) distributions of the \(\gamma_{j}\)’s are stored in
ThinParameters as iteration by
group by parameter
dim(groupModel.e2$ThinParameters)
## [1] 300 6 5
and within each group
head(groupModel.e2$ThinParameters[,1,])
## constant net rate outdegree (density) reciprocity rate beh
## [1,] 3.084515 -1.8143328 2.38747125 1.188040
## [2,] 2.534277 -0.8095603 2.10497307 1.392641
## [3,] 3.319430 -0.5882144 0.66770252 1.857220
## [4,] 2.482389 -1.6900750 1.84137145 2.622582
## [5,] 3.122526 -1.6681452 1.25374655 1.728405
## [6,] 3.727048 -0.9129924 0.06870421 1.269398
## beh linear shape
## [1,] 0.2569260
## [2,] -0.2283378
## [3,] -0.3325189
## [4,] -0.5843580
## [5,] -0.1605018
## [6,] 0.3287635
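A quick numerical answer to the question posed above: the proportion of posterior draws in which each group's reciprocity parameter is positive (reciprocity being the third parameter slice, as in the output above):
colMeans(groupModel.e2$ThinParameters[, , 3] > 0)  # one value per group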
To illustrate that we obtain inference for \(\mu\) (density and reciprocity) as well as for each group's \(\gamma_j\) (density and reciprocity), let us plot density against reciprocity at both levels
plot(groupModel.e2$ThinPosteriorMu[,2],
groupModel.e2$ThinPosteriorMu[,3],
xlim=range(groupModel.e2$ThinParameters[,,2]),
ylim=range(groupModel.e2$ThinParameters[,,3]),
pch=4, xlab='density',ylab='reciprocity')
grp.cols <- c('darkred','red','darkorange','orange','gold','darkgreen')
for (j in c(1:6))
{
lines( groupModel.e2$ThinParameters[,j,2], groupModel.e2$ThinParameters[,j,3],type='p',pch=1,col=grp.cols[j])
}
abline(v=mean(groupModel.e2$ThinPosteriorMu[,2]))
abline(h=mean(groupModel.e2$ThinPosteriorMu[,3]))
abline(v=0,col='grey',lty=4)
abline(h=0,col='grey',lty=4)
For an individual parameter it is common to look at a caterpillar plot
require(HDInterval)
## Loading required package: HDInterval
grp.means <- colMeans(groupModel.e2$ThinParameters[,,3])
boxplot(groupModel.e2$ThinParameters[,order(grp.means),3])
CI <- hdi(groupModel.e2$ThinPosteriorMu[,3],1-.05)
abline(h=CI[1],col='grey',lty=4)
abline(h=CI[2],col='grey',lty=4)
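The corresponding per-group 95% highest-density intervals can be computed with the same package, for example
apply(groupModel.e2$ThinParameters[, , 3], 2, hdi, credMass = 0.95)  # columns are groups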
To get a posterior distribution for \(\mu\) and \(\eta\) (and \(\Sigma\)), we need to have prior distributions for these parameters
A prior distribution for a parameter quantifies the uncertainty that we have about the parameter before we collect and observe the data
For a cross-sectional \(n=5\) network, a Bernoulli graph says that each of the \(5(5-1)/2=10\) possible ties has probability \(p\) of being present, independently of the others. Since \(0 \leq p \leq 1\), a prior for \(p\) might be \(p \thicksim Beta(\alpha,\beta)\)
alpha.p <- 2
beta.p <- 2
p <- seq(from=.0001,to =.9999, length.out = 1000)
plot( p , dbeta(p, alpha.p, beta.p) , type = 'l', yaxt='n',ylab=expression(pi(p)))
Assume that we observe the network
\[ \mathbf{X}= \begin{bmatrix} - & 1 & 0 & 0 & 0 \\ - & - & 0 & 1 & 0 \\ - & - & - & 0 & 0 \\ - & - & - & - & 0 \\ - & - & - & - & - \\ \end{bmatrix} \]
This has \(L=\sum_{i<j}x_{ij}=2\), which gives the likelihood function for \(p\)
plot(p,p^2*(1-p)^8, type = 'l', yaxt='n',ylab='L(p ; X ) ' )
The posterior distribution will be \(p \mid \mathbf{X} \thicksim Beta(2+\alpha,8+\beta)\)
plot( p , dbeta(p, alpha.p+2, beta.p+8) , type = 'l', yaxt='n',ylab=expression(pi(p ~"|"~ X)))
Try with different values of the hyperparameters \(\alpha\) and \(\beta\) in the prior for \(p\)
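A minimal sketch for comparing the resulting posteriors under a few illustrative prior choices (the hyperparameter values are arbitrary):
# Posterior is Beta(alpha + L, beta + 10 - L) with L = 2 ties out of 10 dyads
L.obs <- 2
priors <- list(c(1, 1), c(2, 2), c(5, 1))
plot(p, dbeta(p, priors[[1]][1] + L.obs, priors[[1]][2] + 10 - L.obs),
     type = 'l', yaxt = 'n', ylab = expression(pi(p ~"|"~ X)))
for (k in 2:3){
  lines(p, dbeta(p, priors[[k]][1] + L.obs, priors[[k]][2] + 10 - L.obs), lty = k)
}
legend('topright', legend = c('Beta(1,1)', 'Beta(2,2)', 'Beta(5,1)'), lty = 1:3)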
Even better, try Mattias Villani's Beta-Binomial widget, which illustrates Bayesian inference: https://observablehq.com/@mattiasvillani/bayesian-inference-for-bernoulli-iid-data?collection=@mattiasvillani/bayesian-learning
Load the routines
Create two waves for \(M\) networks, all of size \(n\)
n <- 10 # you could make this larger or smaller
M <- 5 # this is very few groups and you could try to produce many
The Karen dataset has as many groups as you like
Karen <- data.set.karen.set.up(n=n,# number of nodes
M=M,# number of networks
seed=123,nbrNodes=1,nmain=20,nwarm=20)
The networks are simulated from a model whose effects, in addition to the defaults, include transTrip and simX with interaction1 = "beh". Now, figure out what inputs you need. The sienaGroup
object is Karen$my.Karen
Karen$my.Karen
## Dependent variables:
## net : oneMode
## beh : behavior
## Total number of groups: 5
## Total number of periods: 5
For different model specifications, think of what priors you need to specify. Which ones should be random and which fixed?
With the same prior, how does increasing \(M\) affect inference?
How does increasing the prior variance (\(\Sigma\)) affect inference?
groupModel.e <- sienaBayes(Karen$GroupsModel, data = Karen$my.Karen,
initgainGlobal=0.1, initgainGroupwise = 0.001,
effects = Karen$effects, priorMu = Karen$priorMu, priorSigma = Karen$priorSigma,
priorKappa = 0.01,
nwarm=Karen$nwarm, nmain=Karen$nmain, nrunMHBatches=40,
nbrNodes=Karen$nbrNodes, silentstart=FALSE)
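To explore the effect of a larger prior variance, one option is to scale the supplied prior covariance matrix before passing it to sienaBayes; a sketch (the inflation factor and object names are only illustrative):
# Refit with the prior covariance inflated by a factor of 4
priorSigma.wide <- 4 * Karen$priorSigma
groupModel.wide <- sienaBayes(Karen$GroupsModel, data = Karen$my.Karen,
initgainGlobal=0.1, initgainGroupwise = 0.001,
effects = Karen$effects, priorMu = Karen$priorMu,
priorSigma = priorSigma.wide, priorKappa = 0.01,
nwarm=Karen$nwarm, nmain=Karen$nmain, nrunMHBatches=40,
nbrNodes=Karen$nbrNodes, silentstart=FALSE)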