# Lavaan bootstrap confidence intervals

There are several freely available packages for structural equation modeling (SEM), both in and outside of R. In the R world, the three most popular are lavaan, OpenMx, and sem. I have tended to prefer lavaan because of its user-friendly syntax, which mimics key aspects of Mplus. Although OpenMx provides a broader set of functions, its learning curve is steeper. SEM is largely a multivariate extension of regression in which we can examine many predictors and outcomes at once. SEM also provides the innovation of examining latent structure. The housing dataset used here is a nice one for regression because there are many interdependent variables: crime, pollutants, age of properties, etc.

The lavaan syntax even has many similarities with `lm`. The regression coefficient is identical (good!). This highlights an important difference: basic SEM often focuses on the covariance structure of the data. Mean structure can matter too: for example, do males and females differ on the mean level of a depression latent factor? Note that we can get standardized estimates in lavaan as well.

This is a more complicated topic in SEM because we can standardize with respect to the latent variables alone (`std.lv`) or with respect to both the latent and observed variables (`std.all`). The latter is usually what is reported as standardized estimates in SEM papers. What if we believe that the level of nitric oxides (`nox`) also predicts home prices alongside crime? We can add this as a predictor, as in standard multiple regression. Furthermore, suppose we hypothesize that the proximity of a home to large highways (`rad`) predicts the concentration of nitric oxides, which in turn predicts lower home prices.
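As a rough sketch of the steps above, the following fits a single-predictor regression with both `lm` and lavaan and then the path model in which `rad` predicts `nox`, which predicts price. The data are simulated stand-ins (the variable `price` and all effect sizes are assumptions for illustration, not the real housing data).

```r
library(lavaan)

set.seed(1)
n <- 500
# Simulated stand-ins for the housing variables (NOT the real dataset)
rad   <- rnorm(n)                                 # proximity to highways
crim  <- rnorm(n)                                 # crime
nox   <- 0.5 * rad + rnorm(n)                     # nitric oxides
price <- -0.4 * nox - 0.3 * crim + rnorm(n)       # home price (hypothetical name)
dat <- data.frame(rad, crim, nox, price)

# Single-predictor regression: lm and lavaan should agree on the slope
lm_fit  <- lm(price ~ crim, data = dat)
sem_fit <- sem("price ~ crim", data = dat)
coef(lm_fit)["crim"]
coef(sem_fit)["price~crim"]

# Path model: rad predicts nox; nox and crim predict price
path_model <- "
  price ~ nox + crim
  nox   ~ rad
"
path_fit <- sem(path_model, data = dat)

# Standardized estimates (the est.std column corresponds to std.all)
std <- standardizedSolution(path_fit)
std
```

Calling `summary(path_fit, standardized = TRUE)` instead prints both the `std.lv` and `std.all` columns next to the raw estimates.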

The model can be drawn using the handy `semPaths` function from semPlot. Parameter estimation can be hampered when the variances of variables in the model differ substantially (by orders of magnitude). We can rescale variables in this case by multiplying by a constant. This has no effect on the fit or interpretation of the model; we just have to recall what the new units represent.

Also, you can always divide out the constant from the parameter estimate to recover the original units, if important. You can request more detailed global fit indices from lavaan in the model summary output using `fit.measures = TRUE`. You can also get just the fit measures (including additional statistics) using `fitmeasures`. When these indices point to poor fit, we need to examine the fit in more detail. First, we can look at the mismatch between the model-implied and observed covariance matrices.

Conceptually, the goal of structural equation modeling (SEM) is to test whether a theoretically motivated model of the covariance among variables provides a good approximation of the data. Formally, we are seeking to develop a model whose model-implied covariance matrix approaches the sample (observed) covariance matrix. We might be able to interpret this more easily in correlational (standardized) units. The `inspect` function in lavaan gives access to a number of model details, including these matrices.

In particular, getting the misfit of the bivariate associations is very helpful. Here, we ask for residuals in correlational units, which can be more intuitive than dealing with covariances that are unstandardized.
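A minimal sketch of these diagnostics, assuming simulated data in which a direct `rad` effect on price is deliberately left out of the fitted model so that some misfit appears. Variable names and effect sizes are illustrative assumptions.

```r
library(lavaan)

set.seed(2)
n <- 500
# Simulated stand-ins (NOT the real data); rad has a direct effect on price
rad   <- rnorm(n)
crim  <- 0.3 * rad + rnorm(n)
nox   <- 0.5 * rad + rnorm(n)
price <- -0.4 * nox - 0.3 * crim + 0.3 * rad + rnorm(n)
dat <- data.frame(rad, crim, nox, price)

# The fitted model omits the direct rad -> price path
fit <- sem("price ~ nox + crim
            nox ~ rad", data = dat)

# Selected global fit indices
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))

observed <- lavInspect(fit, "sampstat")$cov   # sample covariance matrix
implied  <- fitted(fit)$cov                   # model-implied covariance matrix

# Residuals in correlational units: large entries flag poorly reproduced pairs
res_cor <- resid(fit, type = "cor")$cov
res_cor
```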

Note that this is the subtraction of the matrices above (observed minus model-implied).

Bootstrap, first introduced by Bradley Efron, is a method of inference about a population using sample data. Bootstrap relies on sampling with replacement from the sample data. This technique can be used to estimate the standard error of any statistic and to obtain a confidence interval (CI) for it.

Bootstrap is especially useful when the CI doesn't have a closed form, or has a very complicated one. The bootstrap framework is straightforward. Suppose we are interested in a statistic T. We repeat the following scheme R times: for the i-th repetition, sample with replacement n elements from the available sample (some of them will be picked more than once) and compute T on this resample. We call the resulting R values bootstrap realizations of T, or a bootstrap distribution of T.
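The scheme can be sketched in a few lines of base R. The sample `x`, the choice of the median as T, and the 95% percentile interval at the end are assumptions made for illustration.

```r
set.seed(42)
x <- rexp(50)          # a skewed sample, simulated for illustration
n <- length(x)
R <- 2000              # number of bootstrap repetitions

boot_T <- numeric(R)
for (i in seq_len(R)) {
  # Sample n elements with replacement from the available sample
  resample <- sample(x, size = n, replace = TRUE)
  # Compute the statistic T (here: the median) on the resample
  boot_T[i] <- median(resample)
}

boot_se <- sd(boot_T)                        # bootstrap standard error of T
ci <- quantile(boot_T, c(0.025, 0.975))      # percentile 95% CI
boot_se
ci
```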

Based on it, we can calculate a CI for T. There are several ways of doing this. Taking percentiles seems to be the easiest one. Suppose we want to find CIs for the median `Sepal.Length`, the median `Sepal.Width`, and Spearman's rank correlation coefficient between these two. We'll use R's boot package and a function called `boot`. To use its power we have to create a function that calculates our statistic(s) out of resampled data.

It should have at least two arguments: a dataset and a vector containing indices of elements from the dataset that were picked to create a bootstrap sample. If we wish to calculate CIs for more than one statistic at once, our function has to return them as a single vector. Now, we can use the `boot` function. We have to supply it with the name of the dataset, the function that we've just created, the number of repetitions `R`, and any additional arguments of our function (for example, one selecting the correlation method). Below, I use `set.seed` so that the results are reproducible.
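A sketch of what such a call might look like on the built-in `iris` data. The extra argument name `cor.type` and the object name `my_boot` are assumptions for illustration, as is the seed.

```r
library(boot)

# Statistic function: takes the data and the indices of the resampled rows,
# and returns all three statistics as a single vector
foo <- function(data, indices, cor.type) {
  dt <- data[indices, ]
  c(
    cor(dt$Sepal.Length, dt$Sepal.Width, method = cor.type),
    median(dt$Sepal.Length),
    median(dt$Sepal.Width)
  )
}

set.seed(12345)
# cor.type is passed through boot() to foo()
my_boot <- boot(iris, foo, R = 1000, cor.type = "spearman")

my_boot$t0        # statistics computed on the original data
dim(my_boot$t)    # R rows, one column per statistic
```

Printing `my_boot` also shows the bootstrap bias and standard error for each of the three statistics.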

The resulting object has two interesting elements: `t0`, which holds the statistics computed on the original dataset, and `t`, which holds the bootstrap realizations. Before we start with CIs, it's always worth taking a look at the distribution of bootstrap realizations. We can use the `plot` function, with `index` telling which of the statistics computed in `foo` we wish to look at. The distribution of bootstrap correlation coefficients seems quite normal-like.

Let's find a CI for it. We can use `boot.ci`. It defaults to 95 percent CIs, but this can be changed with the `conf` parameter. By default it tries to compute five types of intervals. One of them, the studentized interval, is unique.

It needs an estimate of bootstrap variance. We didn't provide it, so R prints a warning: `bootstrap variances needed for studentized intervals`. Variance estimates can be obtained with a second-level bootstrap or, more easily, with the jackknife technique. This is somewhat beyond the scope of this tutorial, so let's focus on the remaining four types of bootstrap CIs.
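A sketch of requesting only the four remaining interval types for the correlation coefficient, and of changing the confidence level. The statistic function and object names repeat the illustrative ones assumed earlier so that the block runs on its own.

```r
library(boot)

# Same illustrative statistic function as before (cor.type is a made-up name)
foo <- function(data, indices, cor.type) {
  dt <- data[indices, ]
  c(
    cor(dt$Sepal.Length, dt$Sepal.Width, method = cor.type),
    median(dt$Sepal.Length),
    median(dt$Sepal.Width)
  )
}

set.seed(12345)
my_boot <- boot(iris, foo, R = 1000, cor.type = "spearman")

# index = 1 selects the first statistic returned by foo (the correlation);
# leaving out "stud" avoids the studentized-interval warning
ci95 <- boot.ci(my_boot, index = 1,
                type = c("norm", "basic", "perc", "bca"))
ci95

# A 90% percentile interval only
ci90 <- boot.ci(my_boot, index = 1, conf = 0.90, type = "perc")
ci90$percent[4:5]   # lower and upper endpoints
```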

If we don't want to see them all, we can pick the relevant ones with the `type` argument. Possible values are `norm`, `basic`, `stud`, `perc`, `bca`, or a vector of these.

I wrote this brief introductory post for my friend Simon. In the specific case of mediation analysis the transition to R can be very smooth because, thanks to lavaan, the R knowledge required to use the package is minimal.

Analysis of mediator effects in lavaan requires only the specification of the model; all the other processes are automated by the package.

So, after reading in the data, running the test is trivial. This time, to keep the focus on the mediation analysis, I will skip reading in the data and generate a synthetic dataset instead. Otherwise I would have to spend the next paragraph explaining the dataset and the variables it contains, and I really want to focus only on the analysis. As shown on the lavaan website, performing a mediation analysis takes only a few lines of code.
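A minimal sketch along the lines of the mediation example in the lavaan tutorial, using a synthetic dataset. The effect sizes, sample size, and seed are arbitrary assumptions.

```r
library(lavaan)

set.seed(1234)
X <- rnorm(100)
M <- 0.5 * X + rnorm(100)
Y <- 0.7 * M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)

model <- "
  # direct effect
  Y ~ c * X
  # mediator
  M ~ a * X
  Y ~ b * M
  # indirect effect (a*b)
  ab := a * b
  # total effect
  total := c + (a * b)
"

fit <- sem(model, data = Data)
summary(fit)

# Estimates, including the defined parameters ab and total
pe <- parameterEstimates(fit)
pe[pe$op == ":=", c("label", "est", "se", "pvalue")]
```

The labels `a`, `b`, and `c` attach names to the paths so that the `:=` operator can define the indirect and total effects from them.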

For multiple mediators one simply needs to extend the model, recycling the code of the first mediator variable.


Note that with multiple mediators we must add the covariance of the two mediators to the model. Covariances are added using the `~~` operator. There are two ways to test the null hypothesis that the indirect effects are equal to each other.

The first is to specify a contrast for the two indirect effects. In the definition of the contrast the two indirect effects are subtracted. If the contrast is significant, the two indirect effects differ. The second option to determine whether the indirect effects differ is to set a constraint in the model specifying the two indirect effects to be equal. Then, with the `anova` function one can compare the models and determine which one is better. Including the constraint and comparing the models is simple. In my case the test is not significant, so there is no evidence that the indirect effects are different. For these toy models there is no further need to customize the calls to `sem`. However, when performing a proper analysis one might prefer to have bootstrapped confidence intervals.
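The two-mediator model, the contrast, and the equality constraint described above might look roughly like this. The data are synthetic and the labels (`a1`, `b1`, `ind1`, and so on) are arbitrary choices; the constraint on the two defined indirect effects is written with lavaan's `==` operator.

```r
library(lavaan)

set.seed(1234)
n  <- 300
X  <- rnorm(n)
M1 <- 0.5 * X + rnorm(n)
M2 <- 0.4 * X + rnorm(n)
Y  <- 0.6 * M1 + 0.5 * M2 + rnorm(n)
dat <- data.frame(X, M1, M2, Y)

model <- "
  Y  ~ b1 * M1 + b2 * M2 + c * X
  M1 ~ a1 * X
  M2 ~ a2 * X
  # covariance between the two mediators
  M1 ~~ M2
  # indirect effects, their contrast, and the total effect
  ind1     := a1 * b1
  ind2     := a2 * b2
  contrast := ind1 - ind2
  total    := c + ind1 + ind2
"
fit <- sem(model, data = dat)

# Option 1: inspect the contrast between the two indirect effects
pe <- parameterEstimates(fit)
pe[pe$label == "contrast", c("label", "est", "se", "pvalue")]

# Option 2: constrain the indirect effects to be equal and compare the models
constrained <- paste(model, "ind1 == ind2", sep = "\n")
fit_c <- sem(constrained, data = dat)
anova(fit, fit_c)
```

A non-significant chi-square difference in the `anova` output means the constrained model does not fit noticeably worse, so there is no evidence that the two indirect effects differ.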

Bootstrap confidence intervals can be extracted with the function calls (1) `summary`, (2) `parameterEstimates`, or (3) `bootstrapLavaan`.
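A sketch of the three routes on a small synthetic mediation model. The number of bootstrap draws is kept deliberately low here so the example runs quickly; a real analysis would use many more.

```r
library(lavaan)

set.seed(1234)
X <- rnorm(100)
M <- 0.5 * X + rnorm(100)
Y <- 0.7 * M + rnorm(100)
dat <- data.frame(X, M, Y)

model <- "
  Y ~ c * X
  M ~ a * X
  Y ~ b * M
  ab := a * b
"

# Ask sem() for bootstrap standard errors (200 draws, for speed only)
fit <- sem(model, data = dat, se = "bootstrap", bootstrap = 200)

# (1) summary with confidence intervals
summary(fit, ci = TRUE)

# (2) parameterEstimates with percentile bootstrap intervals
pe <- parameterEstimates(fit, boot.ci.type = "perc", level = 0.95)
pe[pe$label == "ab", c("label", "est", "ci.lower", "ci.upper")]

# (3) bootstrapLavaan draws a fresh set of bootstrap samples
boot_coefs <- bootstrapLavaan(fit, R = 50, FUN = "coef")
head(boot_coefs)
```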

### Lavaan Bootstrap Confidence Intervals

Note that `bootstrapLavaan` will re-compute the bootstrap samples, so expect to wait as long as it took the `sem` function to run when called with the bootstrap option. Since this post is longer than I wanted it to be, I will leave it as a brief introduction to mediation with lavaan. In this follow-up post I describe multiple mediation with lavaan using an actual dataset.

The whole code is on GitHub in one .R file. Here is the first post. For customization of the plots that can be created using semPlot, see this post too.

Consider a bivariate relationship between X and Y.

If X is a predictor and Y is the outcome, we can fit a regression model.


The model can be expressed as a path diagram. In a path diagram, there are three types of shapes (a rectangle, a circle, and a triangle) and two types of arrows: one-headed (single-headed) arrows and two-headed (double-headed) arrows. A rectangle represents an observed variable, which is a variable in the dataset with known information from the subjects.

A circle or ellipse represents an unobserved variable, which can be a residual, an error, or a factor. A triangle, typically with 1 in it, represents either an intercept or a mean.

A one-headed arrow means that the variable on the side without the arrow predicts the variable on the side with the arrow. If the two-headed arrow is on a single variable, it represents a variance. If the two-headed arrow is between two variables, it represents the covariance between the two variables. Many different software packages can be used to draw path diagrams, such as PowerPoint, Word, OmniGraffle, etc.

Following the classic paper on mediation analysis by Baron and Kenny, suppose the effect of X on Y may be mediated by a mediating variable M. Then, we can write a mediation model as two regression equations. This is the simplest but most popular mediation model. This simple mediation model can also be portrayed as a path diagram. Note that a mediation model is a directional model. For example, the mediator is presumed to cause the outcome and not vice versa.
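The two regression equations mentioned above are commonly written as follows. The symbols are the conventional ones and are assumptions here, since the original notation is not shown: $a$ is the effect of X on M, $b$ the effect of M on Y, $c'$ the direct effect of X on Y, $i_M$ and $i_Y$ are intercepts, and $e_M$ and $e_Y$ are residuals.

$$
\begin{aligned}
M &= i_M + aX + e_M \\
Y &= i_Y + c'X + bM + e_Y
\end{aligned}
$$

Substituting the first equation into the second gives $Y = (i_Y + b\,i_M) + (c' + ab)X + (b\,e_M + e_Y)$, so under this notation the indirect (mediated) effect is $ab$ and the total effect of X on Y is $c' + ab$.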

If the presumed model is not correct, the results from the mediation analysis are of little value. Mediation is not defined statistically; rather, statistics can be used to evaluate a presumed mediation model.

The `boot.ci` function generates 5 different types of equi-tailed two-sided nonparametric confidence intervals. These are the first order normal approximation, the basic bootstrap interval, the studentized bootstrap interval, the bootstrap percentile interval, and the adjusted bootstrap percentile (BCa) interval.

All or a subset of these intervals can be generated. The main arguments are:

- `type`: A vector of character strings representing the type of intervals required. The value should be any subset of the values `c("norm", "basic", "stud", "perc", "bca")` or simply `"all"`, which will compute all five types of intervals.
- `index`: This should be a vector of length 1 or 2. The first element of `index` indicates the position of the variable of interest in `boot.out$t0`. The second element indicates the position of the variance of the variable of interest. If both `var.t0` and `var.t` are supplied, the second element of `index` (if present) is ignored. The default is that the variable of interest is in position 1 and its variance is in position 2 (as long as there are 2 positions in `boot.out$t0`).
- `var.t0`: If supplied, a value to be used as an estimate of the variance of the statistic for the normal approximation and studentized intervals. If it is not supplied and `length(index)` is 2, then `var.t0` defaults to `boot.out$t0[index[2]]`. For studentized intervals `var.t0` must be defined. For the normal approximation, if `var.t0` is undefined it defaults to `var(t)`. If a transformation is supplied through the argument `h`, then `var.t0` should be the variance of the untransformed statistic.
- `var.t`: This is a vector of length `boot.out$R` of variances of the bootstrap replicates of the variable of interest. It is used only for studentized intervals.
- `t0`: The observed value of the statistic of interest. The default value is `boot.out$t0[index[1]]`. Specification of `t0` and `t` allows the user to get intervals for a transformed statistic which may not be in the bootstrap output object. An alternative way of achieving this would be to supply the functions `h`, `hdot`, and `hinv`.
- `t`: The bootstrap replicates of the statistic of interest. It must be a vector of length `boot.out$R`. It is an error to supply one of `t0` or `t` but not the other.

We are intentionally creating a moderated mediation effect here, and we do so by setting the relationships (the paths) between our causal chain variables and setting the relationships for our interaction terms.

Because we have interaction terms in our regression analyses, we need to mean center our IV and moderator (Z). We will first create two regression models: one looking at the effect of our IVs (time spent in grad school, time spent with Alex, and their interaction) on our mediator (number of publications), and one looking at the effect of our IVs and mediator on our DV (number of job offers). Next, we will examine the influence of our moderating variable (time spent with Alex) on the mediation effect of time spent in grad school on number of job offers, through number of publications.

To do this, we will examine the mediation effect for those who spend a lot of time with Alex versus those who spend little time with Alex. One model specifies the effect of our IV (time spent in grad school) on our mediator (number of publications) and, in our case, our moderator (time spent with Alex) and the interaction. The other model specifies the effect of the IV (time spent in grad school) and mediator (number of publications), and possibly the moderator as well, on our DV (number of job offers).

In the mediation package we list the moderator as a covariate and set the levels to what we want. For a review of bootstrapping techniques, see Efron. We can then test whether the difference between the indirect effects at each level of the moderator is significantly different from zero.
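As an illustration of the idea only, the sketch below swaps in plain `lm` fits and a hand-rolled percentile bootstrap rather than the mediation package. The data are simulated, and the variable names (`grad`, `alex`, `pubs`, `jobs`) and effect sizes are hypothetical.

```r
set.seed(7)
n <- 400
# Simulated, hypothetical variables standing in for the example in the text
grad <- rnorm(n)                                              # IV
alex <- rnorm(n)                                              # moderator
pubs <- 0.4 * grad + 0.2 * alex + 0.3 * grad * alex + rnorm(n)  # mediator
jobs <- 0.5 * pubs + 0.1 * grad + rnorm(n)                    # DV

# Mean center the IV and the moderator
dat <- data.frame(grad = grad - mean(grad), alex = alex - mean(alex), pubs, jobs)

# Moderator levels: one SD below and one SD above the mean
w_levels <- c(low = -sd(dat$alex), high = sd(dat$alex))

cond_indirect <- function(d) {
  m_fit <- lm(pubs ~ grad * alex, data = d)          # mediator model
  y_fit <- lm(jobs ~ pubs + grad * alex, data = d)   # outcome model
  a1 <- coef(m_fit)[["grad"]]
  a3 <- coef(m_fit)[["grad:alex"]]
  b  <- coef(y_fit)[["pubs"]]
  ind <- (a1 + a3 * w_levels) * b                    # conditional indirect effects
  c(ind, diff = ind[["high"]] - ind[["low"]], index = a3 * b)
}

est <- cond_indirect(dat)

# Percentile bootstrap: refit both models on rows sampled with replacement
R <- 500
boot_est <- t(replicate(R, cond_indirect(dat[sample(n, replace = TRUE), ])))
ci <- apply(boot_est, 2, quantile, probs = c(0.025, 0.975))

est
ci
```

If the interval in the `diff` column excludes zero, the indirect effect differs between the two moderator levels; `index` is the corresponding index of moderated mediation, the product of the interaction coefficient and the mediator's effect.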

Now we take the specified models and all of the effects we want to estimate and run them through the `sem` function.

The accompanying code starts by examining the structure of the dataset and putting a descriptive statistics summary into a table with only the columns of information that we care about (n, mean, sd, median, min, max, se).

The code then makes sure that the correct `mediate` function from the mediation package is used. The first model predicts number of publications from time spent in grad school, time spent with Alex, and the interaction between the two. The second model predicts number of job offers from time spent in grad school, time spent with Alex, number of publications, and the interaction between time spent in grad school and time spent with Alex.

We can also compute means and standard deviations for use in simple slopes analyses. After specifying all the necessary components, we fit the model using an SEM function. The quantities of interest include the total effect and the proportion mediated, conditional on the moderator at one standard deviation below and above its mean (to match the output of the mediation package), and the index of moderated mediation, which is an alternative way of testing whether the conditional indirect effects are significantly different from each other.

Randomly shuffling the treatments between the observations is like randomly sampling the treatments without replacement.

In other words, we randomly sample one observation at a time from the treatments until we have n observations. This provides us with a technique for testing hypotheses because it provides a new ordering of the observations that is valid if the null hypothesis is assumed true.
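A small base R sketch of this shuffling idea, assuming a two-group comparison of means on simulated data. The group labels, sample sizes, and number of permutations are arbitrary.

```r
set.seed(123)
treatment <- rep(c("A", "B"), each = 15)
response  <- c(rnorm(15, mean = 10, sd = 2), rnorm(15, mean = 12, sd = 2))

# Observed difference in group means (B minus A)
obs_diff <- unname(diff(tapply(response, treatment, mean)))

B <- 2000
perm_diff <- replicate(B, {
  # sample() without replacement returns the treatment labels in a new order
  shuffled <- sample(treatment)
  unname(diff(tapply(response, shuffled, mean)))
})

# Two-sided permutation p-value: how often a shuffled difference is
# at least as extreme as the observed one
p_value <- mean(abs(perm_diff) >= abs(obs_diff))
obs_diff
p_value
```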

As before, there are two options we will consider - a parametric and a nonparametric approach. The nonparametric approach will be using what is called bootstrapping and draws its name from "pull yourself up by your bootstraps" where you improve your situation based on your own efforts.

In statistics, we make our situation or inferences better by re-using the observations we have by assuming that the sample represents the population.

Since each observation represents other similar observations in the population, if we sample with replacement from our data set it mimics the process of taking repeated random samples from our population of interest.

This process ends up giving us good distributions of statistics even when our standard normality assumption is violated, similar to what we encountered in the permutation tests. Bootstrapping is especially useful in situations where we are interested in statistics other than the mean (say we want a confidence interval for a median or a standard deviation) or when we consider functions of more than one parameter and don't want to derive the distribution of the statistic (say the difference in two medians).

We will typically use bootstrapping when some of the assumptions of our regular procedure (especially normality) might be violated, in order to provide more trustworthy inferences. To perform bootstrapping, we will use the `resample` function from the mosaic package.

We can apply this function to a data set and get a new version of the data set by sampling new observations with replacement from the original one.


The new version of the data set contains a new variable called `orig.id` that identifies each observation in the original data set. By summarizing how often each of these ids occurred in a bootstrapped data set, we can see how the re-sampling works. The code is complicated for unimportant reasons, but the end result is the `table` function providing counts of the number of times each original observation occurred, with the first row containing the observation number and the second row the count.

In the first bootstrap sample shown, the 1st, 2nd, and 4th observations were sampled one time each and the 3rd observation was not sampled at all. The 5th observation was sampled two times. Observation 42 was sampled four times. This helps you understand what types of samples sampling with replacement can generate.

A second bootstrap sample is also provided. It did not re-sample observations 1, 2, or 4 but does sample observation 5 three times.

You can see other variations in the resulting re-sampling of subjects. Each run of the resample function provides a new version of the data set.
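To see this kind of count table without the mosaic package, the sketch below uses base R's `sample` as a stand-in for `resample`. The tiny data set and the manually added `orig.id` column are assumptions for illustration.

```r
set.seed(406)
dat <- data.frame(id = 1:10, y = round(rnorm(10), 2))

# Draw row indices with replacement and build the bootstrapped data set
idx <- sample(nrow(dat), replace = TRUE)
boot_dat <- dat[idx, ]
boot_dat$orig.id <- idx     # which original observation each row came from

# How many times each original observation appears (0 = not sampled)
counts <- table(factor(boot_dat$orig.id, levels = dat$id))
counts
```

Running the last three statements again without resetting the seed gives a different bootstrapped data set and a different table of counts.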
