Meta-Psychology, 2022, vol 6, MP.2020.2595, https://doi.org/10.15626/MP.2020.2595
Article type: Tutorial
Published under the CC-BY4.0 license
Open data: Not Applicable
Open materials: Yes
Open and reproducible analysis: Yes
Open reviews and editorial process: Yes
Preregistration: No
Edited by: Rickard Carlsson
Reviewed by: Sacha Epskamp, Ulrich Schimmack
Analysis reproduced by: Marco Lauriola
All supplementary files can be accessed at the OSF project page: https://doi.org/10.17605/OSF.IO/H9DV5

A Tutorial in Longitudinal Measurement Invariance and Cross-lagged Panel Models Using Lavaan

Sean P. Mackinnon, Dalhousie University
Robin Curtis, Dalhousie University
Roisin M. O'Connor, Concordia University

In longitudinal studies involving multiple latent variables, researchers often seek to test how latent variables measured at earlier time points predict the same latent variables measured at later time points. Cross-lagged panel modeling, a form of structural equation modeling, is a useful way to conceptualize and test these relationships. However, prior to making causal claims, researchers must first ensure that the measured constructs are equivalent between time points. To do this, they test for measurement invariance, constructing and comparing a series of increasingly strict and parsimonious models, each imposing more constraints across time than the last. This comparison process, though challenging, is an important prerequisite to interpretation of results. Fortunately, testing for measurement invariance in cross-lagged panel models has become easier, thanks to the wide availability of R and its packages. This paper serves as a tutorial in testing for measurement invariance and cross-lagged panel models using the lavaan package. Using real data from an openly available study on perfectionism and drinking problems, we provide a step-by-step guide showing how to test for longitudinal measurement invariance, conduct cross-lagged panel models, and interpret the results. Original data source with materials: https://osf.io/gduy4/. Project website with data/syntax for the tutorial: https://osf.io/hwkem/.

Keywords: cross-lagged panel; lavaan; measurement invariance; R; tutorial; perfectionism; social anxiety

The proliferation of R as a free and versatile programming language and analytic tool, coupled with the increasing power of modern computers, has made possible a great range of new statistical tests for students and professionals across varied disciplines. However, the learning curve for R is steep, and many statistical topics are so specialized that they lack coherent step-by-step guides with accompanying syntax. This paper and its accompanying OSF page (https://osf.io/hwkem/) demonstrate the process of selecting, running, and evaluating cross-lagged panel models with the lavaan package in R, using real data from an open-access source. While our target audience for this paper is graduate students with a basic understanding of R, confirmatory factor analysis, and structural equation modelling, we seek to present the material in such a way that parts of it may be useful to researchers at a range of levels. Readers unfamiliar with structural equation modelling might start with Ullman (2006), which is a relatively accessible introduction.

Measurement Invariance

Much of the time in psychology, we do not measure the construct of interest directly, but rather infer it through a series of items associated with it.
Such constructs are called latent variables, such as drinking motives (Mackinnon et al., 2017) and perfectionism (Rice, Loscalzo, Giannini, & Rice, 2020).1 Confirmatory factor analysis (CFA) is a statistical technique that allows us to test whether clusters of items in our measure are indeed reflective of the latent construct to which we have assigned them.

1 These two papers also serve as good templates for reporting measurement invariance results for a beginning learner.

When studying constructs over time, we administer the same measurement instruments repeatedly. To make logical claims about how latent variables change across time, we must first establish that our instruments are measuring the construct consistently over time. Measurement invariance (MI) is upheld in a study when "participants across all [time periods] interpret the individual questions, as well as the underlying latent factor, in the same way" (Van de Schoot et al., 2012, pp. 1-2). If MI is not upheld, then the nature of the latent construct changes over time, making comparisons across measurement occasions difficult. Since the proposal of MI as a concept thirty years ago (Byrne, Shavelson, & Muthen, 1989), researchers have considered MI an important quality to check for in longitudinal studies incorporating latent variables. In simpler terms, a fundamental problem in longitudinal measurement is that the mere passage of time (or the act of observing one's own thoughts through repeated measurement) can sometimes change how people interpret questionnaire items. To make comparisons over time, we want MI to ensure that the nature of the construct has not changed substantially over time.

To use CFA to test for MI, researchers first set up a set of nested model comparisons; essentially, a series of CFA models with increasingly strict constraints on equality over time. Stricter, more parsimonious models allow fewer parameters to vary over time for the same latent construct. Broadly speaking, the parameters we are concerned with are: 1) factor loadings, which show how representative each item is of its latent factor; 2) intercepts, which relate to the mean levels of each item; and 3) residual variances, which represent the other unexplained influences predicting item responses besides latent variables. The most parsimonious model that maintains adequate CFA fit indices determines the level of invariance.

We can further subdivide MI according to four levels (Widaman & Reise, 1997; Widaman et al., 2010). The least stringent level of invariance, referred to as configural invariance, allows factor loadings, item intercepts, and residual variances to vary across waves. This establishes that the same factor structure applies across waves (i.e., the same number of latent variables with the same items loading on each factor). The next level is metric – sometimes called weak2 – invariance, which constrains factor loadings to equality across waves. This establishes that items do not become more (or less) representative of the latent construct at different measurement occasions. That is, as factor loadings get larger, they are stronger indicators of the latent variable; metric invariance proposes that items do not vary in how representative they are of the construct over time.
The following level is scalar invariance, which constrains not only factor loadings but also item intercepts to equality across waves. Constraining item intercepts to equality establishes that the mean levels of the underlying items themselves do not vary significantly between time periods. That is, if scalar invariance is violated, it means that the interpretation of the absolute value of a score changes as time goes on. It is analogous to how $100 today does not have the same value as it did 100 years ago. Using a psychological example, a lack of scalar invariance might make it appear that means are increasing over time when it is really that participants are changing how they interpret the response scale over time. The final level is residual invariance, which constrains factor loadings, item intercepts, and residual variances to equality across waves. Residual variance represents the degree to which the model deviates from the actual data due to external factors. Fixing residual item variation to equality across waves therefore establishes that any external factors (e.g., unmeasured variables that are changing over time and predicting variation in the measured latent construct) also display minimal change over time (Van de Schoot et al., 2012).

2 Terminology for model strictness can vary. In this paper, we follow Van de Schoot et al. (2012) in using the terms "metric," "scalar," and "residual." On the other hand, Widaman et al. (2012) and Wu et al. (2007) opt for the terms "weak," "strong," and "strict" in referring to the same concepts.

Why does this matter? Without accounting for MI, researchers may misinterpret the causes behind observed effects. For instance, in an investigation of student narcissism levels between the 1990s and the 2010s, Wetzel et al. (2017) examined MI to check the validity of prior findings suggesting that narcissism is increasing among today's youth (Twenge & Campbell, 2009). In a three-wave model incorporating responses from more than 50,000 students, Wetzel et al. found nonequivalence in several aspects of the measurement of narcissism on the Narcissistic Personality Inventory. Specifically, facets of leadership and vanity were not invariant, suggesting that students' interpretations of questions pertaining to these aspects were changing over time, a feature of the data not previously acknowledged in Twenge and Campbell's (2009) study. When accounting for this partial nonequivalence, their model actually suggested a decrease in narcissism over time. Checking for MI is thus an important practice prior to interpreting longitudinal results.

Cross-lagged Panel Models

Another important aspect of longitudinal studies is directional effects between variables over time. Researchers can use cross-lagged panel models (CLPM) to investigate how well different variables predict future iterations of each other, helping to make stronger causal claims by establishing temporal precedence (Cole & Maxwell, 2003). Results can sometimes help clarify the direction of relationships in a way cross-sectional correlations cannot. For example, Mackinnon (2012) found a small positive correlation between perceived social support and school grades cross-sectionally; however, a cross-lagged panel model suggested that higher grades led to more social support, rather than the reverse, contrary to common belief. This study used a three-wave design over several years, but diary studies using more waves over shorter time periods are also common (e.g., Sherry & Hall, 2009).
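Before turning to criticisms of this approach, a minimal sketch may help make the structure concrete. The snippet below specifies a two-wave cross-lagged panel model for two observed scores using lavaan; the variable names (x.t1, y.t1, x.t2, y.t2) and the data object are hypothetical placeholders, not part of the tutorial's dataset, which uses latent variables and is developed in detail later.

library(lavaan)

# Toy two-wave cross-lagged panel model with observed (not latent) variables.
# All variable names are hypothetical placeholders for illustration only.
clpm.toy <- '
  # autoregressive paths (x.t1 -> x.t2; y.t1 -> y.t2) and
  # cross-lagged paths (y.t1 -> x.t2; x.t1 -> y.t2)
  x.t2 ~ x.t1 + y.t1
  y.t2 ~ y.t1 + x.t1

  # wave-1 variables covary; wave-2 residuals covary
  x.t1 ~~ y.t1
  x.t2 ~~ y.t2
'
# clpm.fit <- sem(clpm.toy, data = your.data)   # "your.data" is a placeholder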
Cross-lagged panel models are one attempt to make stronger causal claims with longitudinal data. However, it is important to note that cross-lagged panel models were criticized quite early on (e.g., Rogosa, 1980) and more recently by Hamaker, Kuiper, and Grasman (2015). The crux of the criticism is that the traditional cross-lagged panel model does not properly disentangle within-person processes (e.g., state-like, day-to-day change) from between-person processes (e.g., trait-like, stability from day to day). As a result, traditional cross-lagged panel models can produce incorrect results for statistical significance, which relationship is larger, and even the sign/direction of the relationship (Hamaker et al., 2015)! As a potential solution to these issues, Hamaker et al. (2015; see also Mulder & Hamaker, 2020, for the extension to multiple indicators, as used in this paper) introduced the random intercepts cross-lagged panel model, which properly accounts for the stable, trait-like nature of many constructs. Thus, though the bulk of the paper will focus on the traditional cross-lagged panel model, we also present code and interpretation for a random intercepts cross-lagged panel model.3

3 Readers should also note that the random intercepts cross-lagged panel model requires a minimum of three measurement occasions, unlike the traditional cross-lagged panel model, which is identifiable with only two.

MI in Cross-lagged Panel Models

When testing for MI in CLPMs, model complexity and number of participants can also have an impact on interpretations. The greater the number of waves and items, the more complex the model will be, and the greater the number of participants needed to facilitate reliable testing. This is true of structural equation models in general (Kyriazos, 2018). The configural model estimates the greatest number of parameters and is thus the least parsimonious model. To simplify the problem slightly: in each wave, actual scores of participants will differ somewhat from the model's prediction, but the same participants are responding to the measure each time, creating non-independence of observations across waves. We model this non-independence as a covariance between the residuals of the same items among waves (see the annotated syntax file that accompanies this paper). It is a reasonable a-priori assumption to expect the magnitude of covariances between residuals across waves to be similar, and it is common for researchers to fix residual covariances to equality across waves.4 Fixing these values to equality is thus often theoretically justified and helps reduce the total number of parameters estimated by the configural model (Cole & Maxwell, 2003).5 Prior to setting up and testing the final structural equation model, it is useful to map out the predicted relationships between variables as figures.

4 Another common set of constraints used in longitudinal data is a first-order autoregressive or AR(1) correlated error structure; that is, a set of constraints that predicts the covariances will get smaller as the time lags increase (i.e., constructs measured more closely in time should be more strongly related).

5 Note that measurement invariance between independent groups using multi-group modelling (e.g., comparing men, women, and nonbinary groups) is comparatively simpler than measurement invariance in the longitudinal context because of this correlated error structure. Readers interested in measurement invariance for independent groups can take advantage of the measurementInvariance() function in lavaan for convenience functions that are much shorter than the code in this tutorial (https://lavaan.ugent.be/tutorial/groups.html), even though the core principles are the same.

The dataset used in this study derives from a 21-day diary study investigating perfectionism, motivations for drinking, and alcohol-related problems. It is open-access, and all data are free to download at https://osf.io/hwkem/.
For the purpose of this paper, our key latent variables of interest are: 1) perfectionistic self-presentation (PSP), which (as operationalized in these data) measures an individual's desire to hide their imperfections; and 2) state social anxiety (SSA), which measures transitory feelings of anxiety associated with social situations. PSP was first proposed as an aspect of perfectionism by Hewitt et al. (2003), whereas SSA was proposed as a measure of social anxiety by Kashdan and Steger (2006). For a recent study using these data that discusses the relation between PSP and SSA in more detail, please see Kehayes and Mackinnon (2019). To make the example easier to follow, we focus on only 5 of the 20 days on which perfectionistic self-presentation and social anxiety were measured (arbitrarily, days 7-11). Thus, in the present example we first (a) establish longitudinal measurement invariance over 5 days and then (b) test a cross-lagged panel model. In general, think of the measurement invariance portion as a necessary first step to proceed with hypothesis testing in the cross-lagged panel model. Though theory suggests that perfectionism would cause social anxiety rather than the reverse, we do not concern ourselves with formally testing confirmatory hypotheses in this paper – even though the cross-lagged panel model would be the spot where hypotheses about directionality of relationships are formally tested in a traditional paper. Instead, we focus on the technical, analytical aspects with the goal of teaching readers how to conduct the analysis.

Method

Dataset

Our study uses a simplified version of the dataset published by Mackinnon et al. (2021). We first trimmed and reformatted the dataset to contain only the variables of interest (see Appendix A). The abridged dataset contains responses given by 251 participants for two latent variables (PSP, composed of three items; and SSA, composed of seven items) across five days (days seven through eleven of the study). PSP items were measured using a 7-point scale from 1 to 7. SSA items were measured using a 5-point scale from 0 to 4.

Beyond trimming the dataset to only the variables of interest, we also converted the data to wide format (in which every participant receives a unique row, and each variable receives a unique column) from long format (in which every data point receives a unique row, with participants and categorical variables recurring across rows). For a simple illustration of this pertaining to SSA and PSP values across days, see Figure 1. This format conversion aided in setting up the code for our later models.

Figure 1. Examples of Wide Vs. Long Format in PSP and SSA Values Across Two Days

Data Analysis Strategy

This section describes our strategy for comparing and selecting models. Our goal was to compare nested versions of our CLPM, using CFA, to determine the most appropriate parameters for our final structural model.
We sought to use the simplest model (i.e., the model estimating the fewest parameters) that maintained a good fit for our data, while also making good theoretical sense. Because lavaan allows undefined parameters to vary freely by default, simpler models are those with more constraints defined; thus, somewhat counterintuitively, more parsimonious models appear more complicated in the code.

In constructing our models through lavaan, we relied upon four key operators: 1) =~ , which is used for factor loadings, and can be thought of as "is measured by;" 2) ~ , which is used for regression formulas, and can be thought of as "is regressed on;" 3) ~~ , which is used for defining variance and residual covariance, and can be thought of as "varies with;" and 4) ~ 1, which is a special notation for defining intercepts. Within any given formula, labels may be assigned to terms using the asterisk (*). Any parameters to which the same label is applied are fixed to equality in the model's calculations (a brief toy example illustrating these operators appears below, just before the configural model). To better understand the full definitions of each model below, we recommend referring to Figures 2-5, which show simplified versions of each model with their respective constraints. It may also be useful to simultaneously follow along with our annotated code on our OSF page.

Our approach involves five steps total: (a) configural model; (b) metric model; (c) scalar model; (d) residual model; and (e) structural model. Notably, in the first four steps we use covariances (~~) rather than regressions (~) for relationships between variables. In the fifth step, we do a true cross-lagged panel model where temporal directionality of relationships is assumed (e.g., day 7 predicting day 8). This choice has no impact on the overall fit indices, but it is worth noting that the relationships between variables in the first four steps are more akin to bivariate correlations (albeit, corrected for measurement unreliability), while the last step is the true test of hypotheses with paths allowing for stronger causal inferences than correlations by adjusting for past-day levels of each variable.
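To make this syntax concrete before building the real models, the toy specification below (our own illustration; the factors F1 and F2 and the item names x1.t1, x2.t1, etc., are hypothetical and not part of the tutorial's dataset) uses each operator once and shows how shared labels impose equality constraints.

# Toy lavaan specification illustrating the four operators and equality labels.
# All factor and variable names here are hypothetical placeholders.
toy.model <- '
  # =~ : each factor "is measured by" its items; the shared labels a and b
  #      constrain the corresponding loadings to equality across the two factors
  F1 =~ a*x1.t1 + b*x2.t1
  F2 =~ a*x1.t2 + b*x2.t2

  # ~  : regression; F2 "is regressed on" F1
  F2 ~ F1

  # ~~ : (co)variance; here a residual covariance between the same item
  #      measured at the two occasions
  x1.t1 ~~ x1.t2

  # ~ 1 : intercepts; the shared label i1 constrains this intercept to equality
  x1.t1 ~ i1*1
  x1.t2 ~ i1*1
'
# toy.fit <- cfa(toy.model, data = your.data)   # "your.data" is a placeholder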
Configural Model. We began by defining our configural model. Excluding the correlated error structure, the full configural model is as follows:

configural.v1 <- '
# PSP factor loadings defined
PSP.7 =~ NA*psp1.7 + psp2.7 + psp3.7
PSP.8 =~ NA*psp1.8 + psp2.8 + psp3.8
PSP.9 =~ NA*psp1.9 + psp2.9 + psp3.9
PSP.10 =~ NA*psp1.10 + psp2.10 + psp3.10
PSP.11 =~ NA*psp1.11 + psp2.11 + psp3.11

# PSP variance constrained to 1
PSP.7 ~~ 1*PSP.7
PSP.8 ~~ 1*PSP.8
PSP.9 ~~ 1*PSP.9
PSP.10 ~~ 1*PSP.10
PSP.11 ~~ 1*PSP.11

# SSA factor loadings defined
SSA.7 =~ NA*ssa1.7 + ssa2.7 + ssa3.7 + ssa4.7 + ssa5.7 + ssa6.7 + ssa7.7
SSA.8 =~ NA*ssa1.8 + ssa2.8 + ssa3.8 + ssa4.8 + ssa5.8 + ssa6.8 + ssa7.8
SSA.9 =~ NA*ssa1.9 + ssa2.9 + ssa3.9 + ssa4.9 + ssa5.9 + ssa6.9 + ssa7.9
SSA.10 =~ NA*ssa1.10 + ssa2.10 + ssa3.10 + ssa4.10 + ssa5.10 + ssa6.10 + ssa7.10
SSA.11 =~ NA*ssa1.11 + ssa2.11 + ssa3.11 + ssa4.11 + ssa5.11 + ssa6.11 + ssa7.11

# SSA variance constrained to 1
SSA.7 ~~ 1*SSA.7
SSA.8 ~~ 1*SSA.8
SSA.9 ~~ 1*SSA.9
SSA.10 ~~ 1*SSA.10
SSA.11 ~~ 1*SSA.11
'

configural.model <- paste(configural.v1, errorstructure, sep = ' ', collapse = NULL)

Figure 2
Configural Model (2 days only).
Note. Ovals represent the latent variables. Rectangles represent the measured items, with their corresponding factor loadings shown as arrows to the latent variables. Triangles represent the item intercepts, and circles represent the residuals. Double-headed arrows represent covariances. In the configural model, we allowed factor loadings, item intercepts, and residual variances to vary freely across waves. The model tested in the tutorial uses 5 days of data; to reduce visual clutter, this diagram shows only the first two days of data.

For this and all subsequent models, we first defined the factor loadings of our latent variables. By default in lavaan, the first factor loading for each latent variable would normally be constrained to 1, whereas the variance of each latent variable would be unconstrained. However, as a matter of preference, we invert these settings, allowing each latent variable's first factor loading to vary freely, achieved through the "NA*" in the code above, while constraining the variance of each latent variable to 1, achieved through the variance lines (e.g., PSP.7 ~~ 1*PSP.7) in the code above. By overriding this default setting, we can more easily constrain the factor loadings to equality in later models. Note that we use the ~~ operator to correlate PSP.7 with itself, which effectively defines its variance6, which we set to 1 with the operator 1*.

6 This seems unintuitive but makes sense when you consider that lavaan works with variance-covariance matrices, which places variances on the diagonal. Thus, "covarying something with itself" refers to the cell on the diagonal, or the variance for that variable.

Though not shown in the code snippet above, we also constrained our residual covariances to equality across waves, as justified in the previous section. We achieved this by assigning unique labels (e.g., "psp1cov") to the defined covariance parameters of each residual across all days in the study, for example:

# Here "psp1cov" is a label, while "psp1.7" and "psp1.8" are variables.
psp1.7 ~~ psp1cov*psp1.8

The final part of the code using "paste" merely combines two vectors of characters together (i.e., the error structure and the configural model). Because the correlated error structure is 108 lines of code, it would be tedious to re-write this every time. Thus, we instead save the error structure as an object named "errorstructure" and append it to each model to make the code shorter.
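The full errorstructure object is provided in the annotated syntax on the OSF page. As a rough sketch of how such a block could be generated programmatically rather than typed by hand (the loop below is our own illustration, not the authors' code; it simply produces lines of the form psp1.7 ~~ psp1cov*psp1.8 for every item and every pair of days, using one label per item as described above):

# Sketch: build the correlated-error structure programmatically. Item names
# and the one-label-per-item convention (e.g., "psp1cov") follow the tutorial;
# the loop itself is our own illustration, not the authors' code.
items <- c(paste0("psp", 1:3), paste0("ssa", 1:7))
days  <- 7:11

error.lines <- character(0)
for (it in items) {
  day.pairs <- combn(days, 2)                    # all pairs of days
  for (k in seq_len(ncol(day.pairs))) {
    error.lines <- c(error.lines,
                     sprintf("%s.%d ~~ %scov*%s.%d",
                             it, day.pairs[1, k], it, it, day.pairs[2, k]))
  }
}
errorstructure <- paste(error.lines, collapse = "\n")
cat(head(error.lines, 3), sep = "\n")            # preview the first few lines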
Having defined our model, we used it to run our first CFA using the code below:

configural.fit <- cfa(configural.model, data = model.test.dat, estimator = "MLR", se = "robust", missing = "ML", std.lv = TRUE)

Note that for this and all future analyses, we added the argument "std.lv = TRUE". This automatically fixes the variance of factors to 1, rather than fixing the first factor loading of each factor to 1, which is the default setting in lavaan. Because we will be imposing equality constraints on the factor loadings in our testing of measurement invariance, it is better to fix factor variances to 1 in this manner. As for the other code statements: "data = model.test.dat" calls our dataset for use in the model; "estimator = 'MLR'" sets our model's estimation method to maximum likelihood estimation with robust standard errors; "se = 'robust'" similarly implements robust standard errors in the estimation; and "missing = 'ML'" implements full information maximum likelihood estimation for missing data.

Standard practice for comparing nested models dictates that investigators establish increasingly strict levels of invariance, stopping once a model fails to display adequate fit criteria. In our Results section, we discuss our model fit indices in more detail. Within this section, however, we will simply acknowledge whether fit criteria were met and move on to test all 4 levels of invariance for pedagogical purposes.

Metric Model. Having found evidence for configural invariance in our latent variables, we moved on to check for metric invariance. In this model, in addition to the constraints applied to our configural model, we constrained factor loadings to equality across waves. We achieved this by assigning unique labels (e.g., "psp1f") to all five iterations of each item (e.g., "psp1.7", "psp1.8", etc.) within our factor loading definitions, for example:

# Here "psp1f", "psp2f", and "psp3f" are labels, while the terms to the right of these labels are variables.
PSP.7 =~ psp1f*psp1.7 + psp2f*psp2.7 + psp3f*psp3.7
PSP.8 =~ psp1f*psp1.8 + psp2f*psp2.8 + psp3f*psp3.8

You might read the code snippet above as: "PSP items psp1.7, psp2.7 and psp3.7 (i.e., items from day 7) all load on the latent factor called PSP.7. PSP items psp1.8, psp2.8 and psp3.8 (i.e., items from day 8) all load on the latent factor called PSP.8. The factor loadings for PSP item 1 at day 7 and at day 8 are both given the label 'psp1f'. Since they have the same label, they are constrained to be equal to each other."

Figure 3
Metric Model (2 days only).
Note. In our metric model, we fixed factor loadings to equality across waves, as shown by letters A – J (and matching colors) in this simplified diagram. Factor loadings that share the same letter and color are constrained to equality. The model tested in the tutorial uses 5 days of data; to reduce visual clutter, this diagram shows only the first two days of data.
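The metric, scalar, and residual models are each fit with the same call pattern used for the configural model. For instance, assuming the metric constraints have been pasted together with the errorstructure object into an object called metric.model (an object name we use here for illustration), the metric model can be fit and selected fit indices inspected as follows:

# Fit the metric model with the same settings used for the configural model.
# "metric.model" is assumed to be built analogously to "configural.model"
# (metric constraints pasted together with the errorstructure object).
metric.fit <- cfa(metric.model, data = model.test.dat,
                  estimator = "MLR", se = "robust",
                  missing = "ML", std.lv = TRUE)

# Quick look at selected fit indices before moving to the next level
fitMeasures(metric.fit, c("cfi.robust", "tli.robust",
                          "rmsea.robust", "srmr", "aic", "bic"))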
Scalar Model. We then moved on to check for scalar invariance. In this model, in addition to the constraints applied to our metric model, we fixed item intercepts to equivalence across waves. We achieved this by applying a single label (e.g., psp1i) to all five iterations of an item's intercept. For example:

# Here psp1i is a label, while the 1 refers to the intercept of the variable on the left side of the ~.
psp1.7 ~ psp1i*1
psp1.8 ~ psp1i*1

Figure 4
Scalar Model (2 days only).
Note. In our scalar model, we additionally fixed item intercepts to equality across waves, as shown by letters K-T (and matching colors) in this diagram. Intercepts that share a letter and color are constrained to equality. The model tested in the tutorial uses 5 days of data; to reduce visual clutter, this diagram shows only the first two days of data.

Residual Model. Finally, we applied our most rigorous set of constraints in our residual model. In this model, in addition to the constraints applied to our scalar model, we constrained the residual error of each item to equality across waves. Once again, we did this by applying the same label (e.g., "psp1u") to each iteration of a given residual across waves. For example:

# Here "psp1u" is a label, while "psp1.8" is a variable.
psp1.7 ~~ psp1u*psp1.7
psp1.8 ~~ psp1u*psp1.8

Model Selection. Having thus defined our nested models, we next compare all of their fit indices at once and select the best candidate for our subsequent structural equation model (SEM). We grouped the models and compared their fit indices using the code below:

round(cbind(configural.error = inspect(configural.fit, 'fit.measures'),
            metric = inspect(metric.fit, 'fit.measures'),
            scalar = inspect(scalar.fit, 'fit.measures'),
            residual = inspect(residual.fit, 'fit.measures')), 3)

anova(configural.fit, metric.fit)
anova(metric.fit, scalar.fit)
anova(scalar.fit, residual.fit)

Here, the inspect() function extracts the fit indices from the specified models, the cbind() function places each set of extracted indices side by side, and the round(…, 3) function rounds all values to three decimal places. The anova() calls are a shorter way to get key statistics: AIC, BIC, and a chi-squared difference test. We discuss the fit indices and selection process in the next section. After selecting our residual model as our preferred choice, we set up and ran our SEM. In addition to the constraints of our residual model, our SEM defined regression paths between our latent variables, using simple labels to fix them to equality across days (see Figure 6). For example:

SSA.8 ~ A*SSA.7 + C*PSP.7
PSP.8 ~ D*SSA.7 + B*PSP.7

Figure 5
Residual Model (2 days only).
Note. In our residual model, we additionally fixed residual variances to equality across waves, as shown by letters U-DD (and matching colors) in this diagram. Residual variances that share a letter and color are constrained to equality. The model tested in the tutorial uses 5 days of data; to reduce visual clutter, this diagram shows only the first two days of data.
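Before turning to the results, the structural model itself can be fit and summarized with the same settings as the CFAs above. The object name structural.model below is an assumption for illustration; it would contain the residual-model constraints plus the labelled regression paths, combined with the errorstructure object as before.

# Sketch: fit the structural (cross-lagged) model and extract results.
# "structural.model" is an assumed object name containing the residual-model
# constraints plus the labelled regression paths shown above.
structural.fit <- sem(structural.model, data = model.test.dat,
                      estimator = "MLR", se = "robust",
                      missing = "ML", std.lv = TRUE)

# Overall fit, R-squared for the endogenous latent variables, and
# unstandardized estimates with 95% confidence intervals
fitMeasures(structural.fit, c("chisq", "df", "cfi.robust",
                              "tli.robust", "rmsea.robust", "srmr"))
inspect(structural.fit, "rsquare")
parameterEstimates(structural.fit, ci = TRUE, level = 0.95)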
Results

Fit Indices

The fit indices7 for each of our four models (configural, metric, scalar, and residual) are shown in Table 1. Number of estimated parameters is self-explanatory. For the sake of model parsimony, we sought to estimate the fewest possible number of parameters, while still using a model that maintains a good fit for our data.

7 This paper does not seek to fully explain the mathematics behind each fit index, but rather to give a brief overview of the features prioritized by each. For a more substantive description of fit indices, and for a list of further reading, please see Kenny (2015).

The Comparative Fit Index (CFI) and Tucker Lewis Index (TLI) compare the given proposed model to the null model. Both prioritize the average correlation of variables within the proposed model, and their results are highly correlated. However, TLI places an additional emphasis on model parsimony, implementing a modest penalty for each additional estimated parameter. These fit indices are negatively biased, meaning that the resultant value for poor-fitting models tends to be lower. For both CFI and TLI, values between .90 and .95 are considered marginally acceptable, whereas values above .95 are considered good. Cheung and Rensvold (2002) propose that a ΔCFI of -.01 or more suggests that the model with the largest CFI should be chosen; otherwise, prefer the more parsimonious model. Using this criterion, the residual model would be preferred.

On the other hand, two popular positively biased (meaning the lower the value, the better the fit) measures of fit are the Root Mean Square Error of Approximation (RMSEA) and the Standardized Root Mean Square Residual (SRMR). The RMSEA applies a penalty for model complexity according to the ratio between the chi-square and degrees of freedom. Models with fewer degrees of freedom are penalized more strongly, and values of .06 to .08 or lower generally indicate an acceptable level of fit. SRMR, meanwhile, calculates the difference between the observed correlation and predicted correlation, with no penalty for model complexity. As with RMSEA, SRMR values should be no greater than .08. To our knowledge, these are not commonly used for nested model comparisons, though it is important that any final selected model still fits well by this criterion.

The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are comparative fit measures. They provide an independent value for assessing each model's fit, which makes most sense when compared between models. The lower the value, the better the fit, with differences of 6 or greater typically constituting strong evidence of model difference (Raftery, 1995). Both use a likelihood function in conjunction with the number of estimated parameters and the sample size to assess model fit, with BIC implementing a slightly higher penalty for more complex models. Due to this difference, AIC values preferred our scalar model, whereas BIC values preferred our residual model in Table 1.

Log-likelihood ratio tests (sometimes called "chi-squared difference tests") are also a popular way to compare models, though they tend to be overly sensitive in the same ways the chi-squared test for model fit is. The log-likelihood ratio test compares the loglikelihood of two models and produces a test statistic that has a χ2 distribution when the null hypothesis is true. A statistically significant p-value indicates that the less parsimonious model should be chosen. A non-significant p-value suggests that we should default to the most parsimonious model. In Table 1, the loglikelihood ratio test prefers the metric model.

Model Comparison

As shown in Table 1, the fit indices for each of our models were generally favorable, but due to the differences in index calculations just described, different indices preferred different models. In case of such potential conflicts, we therefore recommend the researcher decide which model aspect they deem most important (e.g., goodness of fit, number of estimated parameters, sample size), and decide a priori which fit index to use to determine their final model. For a more in-depth (but challenging) review of these issues, see Lin, Huang, and Weng (2017).
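For concreteness, the ΔCFI and ΔBIC values reported in Table 1 can be computed directly from the fitted model objects; the list construction below is our own illustration (object names follow the earlier code).

# Sketch: compute the change in robust CFI and in BIC between each pair of
# adjacent models. Object names follow the earlier cfa() calls; the list
# construction itself is our own illustration.
fits <- list(configural = configural.fit, metric = metric.fit,
             scalar = scalar.fit, residual = residual.fit)

cfi <- sapply(fits, fitMeasures, fit.measures = "cfi.robust")
bic <- sapply(fits, fitMeasures, fit.measures = "bic")

round(diff(cfi), 3)   # ΔCFI at each step toward stricter invariance
round(diff(bic), 1)   # ΔBIC (negative values favour the stricter model)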
We tend to prefer ΔBIC as an a-priori criterion, but there is a great deal of variability among different analysts on this point. Failing this, you might choose the model recommended most often out of the 4 choices (loglikelihood ratio, AIC, ΔBIC, ΔCFI). While we are not formally hypothesis testing in this paper, we chose to continue with the model with the best BIC score (the residual model). Given that it was not significantly worse than our scalar model by most common metrics, our residual model represented the best trade-off between model fit and model complexity. Factor loadings, unstandardized intercepts, and unstandardized residual variances for each model are provided in the Appendix, within Tables A1, A2, and A3, respectively.

Table 1
Nested Model Fit Indices

Model                          Configural    Metric        Scalar        Residual
No. of Estimated Parameters    205           165           125           85
Raw Loglikelihood              -12632        -12651        -12685        -12741
Δ χ2, p-value                  N/A           54.829, .059  67.585, .004  67.151, .005
Robust CFI                     0.953         0.952         0.950         0.946
Raw AIC                        25674         25633         25619         25652
Δ AIC                          N/A           -41           -14           +33
Raw BIC                        26397         26215         26060         25951
Δ BIC                          N/A           -182          -155          -109
Robust RMSEA                   0.048         0.048         0.048         0.049
SRMR                           0.054         0.058         0.059         0.059

Note. Bold font indicates preferred model/s according to index. Chi-square difference test (i.e., log likelihood ratio test) values are calculated between the current column model and the preceding column model – e.g., the value listed for the metric model column compares the metric model to the configural model. Δ AIC and Δ BIC are calculated as the difference from the preceding model – e.g., the residual model AIC is 33 greater than the scalar model AIC, while the scalar model AIC is 14 less than the metric AIC.

Structural Model for Traditional CLPM

Using the constraints defined in our residual model, we conclude by testing the structural model. In this model, we estimate regressions instead of covariances. We also include only relationships from time t to time t+1. That is, relationships from time t to t+2, t+3, and so on are omitted. One may wish to include these as well, but for pedagogical purposes here we will focus only on these pathways and will constrain relationships between variables at time lags greater than +1 to be zero. This model fit the data acceptably, but the model fit is worse relative to the CFA model, χ2(1276) = 2139, p < .001, Robust CFI = 0.93, Robust TLI = 0.93, RMSEA = 0.05, SRMR = 0.23. Notably, the SRMR index suggests considerably worse fit than the other measures here. This probably means we should investigate whether more paths should be added (e.g., cross-lagged paths from time t to time t+2). For brevity's sake in this tutorial, we do not explore this further (as this is an idiosyncratic feature of this sample dataset, rather than the general approach most people would be taking), but it is worth mentioning – it probably means that the relationships of interest are not confined to a 1-day time lag, and that they can influence relationships on a longer time frame. This model had R2 values ranging from .38 to .63. Figure 6 shows our latent variables across the five days, and 95% confidence intervals for the path values. As shown in the figure, previous-day perfectionistic self-presentation strongly predicted next-day perfectionistic self-presentation. Similarly, previous-day social anxiety strongly predicted next-day social anxiety.
Meanwhile, each latent variable's cross-lagged path predicting the other variable was weaker, but still non-zero and statistically significant. This kind of pattern (large autoregressive paths, small cross-lagged paths) is typical of most cross-lagged panel models with adequate statistical power.

Figure 6
Structural Model for the Traditional Cross-lagged Panel Model.
Note. Day-to-day paths were fixed to equality. All paths and covariances are shown as unstandardized 95% confidence intervals. Much is omitted from this diagram, including the factor loadings and correlated error structure that is depicted in Figures 2-5. All pathways are statistically significant at p < .05.

Random Intercepts Cross-Lagged Panel Model

Though the above summarizes the traditional cross-lagged panel model approach, we will also briefly review the multiple indicator random intercepts cross-lagged panel model described by Mulder and Hamaker (2020). This model makes four primary changes: (a) the correlated residual structure is omitted; (b) random intercepts are specified for each indicator; (c) the random intercepts are allowed to covary together; and (d) the latent means for all latent variables from the second time point onwards are freed8. See Figure 7 for a visual depiction of this model. Please note that the RI-CLPM can only be estimated with 3 or more time points; the diagram in Figure 7 is for pedagogical purposes and presents only two time points due to the practical requirements of creating an image with a font size large enough to read! See the online supplementary materials for the statistical syntax of the full model.

8 Mulder & Hamaker (2020) note "…if we would not freely estimate the latent means, we would not only specify strong factorial invariance, but also specify a model in which there cannot be mean changes over time" (pg. 8).

Excepting the chi-squared statistic and SRMR, this model fit the data reasonably well, χ2(1225) = 1884, p < .001, Robust CFI = 0.94, Robust TLI = 0.94, RMSEA = 0.05, SRMR = 0.16. Figure 8 shows the 95% CIs for the path coefficients of interest. Here, we have a different pattern of results. The autoregressive paths now represent within-person carry-over effects (Mulder & Hamaker, 2020). For perfectionistic self-presentation, when people experience elevated perfectionism scores relative to their own expected score, they are more likely to experience elevated perfectionism scores relative to their own expected score at the next occasion as well. In comparison, within-person carry-over effects were inconclusive (i.e., non-significant) for social anxiety. Moreover, the cross-lagged paths now suggest that perfectionistic self-presentation predicts increases in social anxiety over time, but not the reverse. Interestingly, this result is more in line with theoretical predictions in this research area.

Figure 7
The Random-Intercepts Cross-Lagged Panel Model (2 days only).
Note. This model adds random intercepts (RI1-RI10; red), covariances between the random intercepts (orange), and frees the latent means from the second time point onwards (i21 and i22; blue). Please note that the RI-CLPM can only be estimated with 3 or more time points; this diagram is for pedagogical purposes and presents only two time points to ensure the font size is still legible.
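To give a flavour of what these four changes look like in lavaan syntax, the fragment below shows each change for a single indicator (psp1) and the occasion-specific PSP/SSA factors. It is our own abbreviated illustration, not the authors' code; the complete multiple-indicator RI-CLPM syntax (all 10 indicators, both factors, and all 5 days) is available in the supplementary materials on the OSF page.

# Abbreviated illustration of the RI-CLPM pieces, not the authors' full syntax.
riclpm.pieces <- '
  # (b) one random intercept per indicator, with loadings fixed to 1 at every wave
  RI1 =~ 1*psp1.7 + 1*psp1.8 + 1*psp1.9 + 1*psp1.10 + 1*psp1.11

  # (c) the random intercepts are allowed to covary (one example pair shown)
  RI1 ~~ RI2

  # (d) latent means fixed to zero at the first occasion and freed afterwards
  PSP.7 ~ 0*1
  PSP.8 ~ 1
  PSP.9 ~ 1
  PSP.10 ~ 1
  PSP.11 ~ 1

  # lagged regressions among the occasion-specific latent variables
  PSP.8 ~ a*PSP.7 + c*SSA.7
  SSA.8 ~ d*PSP.7 + b*SSA.7
'
# (a) the correlated error structure is omitted: unlike the earlier models,
# these pieces would NOT be pasted together with the errorstructure object.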
Figure 8
Coefficients of Interest for the Random Intercepts Cross-Lagged Panel Model.
Note. Day-to-day paths were fixed to equality. All paths and covariances are shown as unstandardized 95% confidence intervals. Much is omitted from this diagram, including the factor loadings and random intercepts that are depicted in Figure 7. Black solid lines are statistically significant at p < .05. Grey dotted lines are non-significant at p > .05.

Discussion

To recap, we used confirmatory factor analysis to test two related variables for measurement invariance across five waves, using four increasingly restrictive nested models: configural, metric, scalar, and residual. After selecting our residual model as the simplest that maintained good fit, we applied its constraints to a structural equation model, allowing us to quantify the cross-lagged relationships of our two variables across days.

Interestingly, not all our fit indices preferred the same model. The loglikelihood ratio tests preferred the metric model, as this test tends to prefer less parsimonious models. Meanwhile, our CFI preferred the residual model based on Cheung and Rensvold's (2002) criteria. AIC preferred the scalar model, due to its higher parsimony coupled with good overall fit. On the other hand, BIC, our final deciding criterion, placed a higher emphasis on parsimony, and therefore preferred our residual model. There are two main take-away points here. First, the relatively good fit of all of our models across indices suggests that our theoretical reasons for investigating this relationship between variables were most likely sound. Second, while researchers may wish to report multiple fit indices in their papers, it is important that they decide beforehand which index they will use when determining their final model. It is worth noting that the structural models had poor fit based on the SRMR index, which is probably due to constraining some paths to zero that still evince a positive relationship. If this were an a-priori criterion used for assessing model fit in a research paper, you would need to investigate the source of this misfit further. Remember, SRMR differs from RMSEA insomuch as it has no penalty for model complexity; to the extent that you value model parsimony in model selection, you might prefer to use RMSEA as your selection tool instead.

In our case, the final model conformed to the strictest level of measurement invariance. However, in some studies, this will not be the case. Scalar invariance is typically sufficient for general data analysis, as it indicates that participants do not vary greatly across waves in the ways they interpret and answer questions. However, should only metric invariance be upheld, researchers must qualify any subsequent results by acknowledging that, although the items load similarly on the latent factors across waves, individual interpretations of the items may change over time. For more examples of this, see Steenkamp and Baumgartner (1998). It is also worth noting that this method for testing MI in CLPMs works best on models with a low to moderate number of waves. For a 20-day diary study, it may be more pragmatic for researchers to instead implement multi-level modelling techniques or multilevel structural equation modelling.
Though the present tutorial dataset is ill-suited for examining differences in latent means, it is worth noting that scalar invariance is often a preliminary step towards examining differences between means. That is, most substantive research questions on longitudinal data are not about the measurement invariance per se, but rather about regression/covariance and mean differences over time. Though longitudinal latent mean differences are beyond the scope of this tutorial, readers interested in learning more might read Bishop, Geiser, and Cole (2015) for three approaches to modelling latent growth curves with multiple indicators. Moreover, Breitsohl (2019) is an excellent tutorial for converting common experimental designs (e.g., ANOVA) to SEM frameworks.

Author Contact

Correspondence concerning this article should be addressed to Sean P. Mackinnon. Email: mackinnon.sean@dal.ca. http://orcid.org/0000-0003-0921-9589

Conflict of Interest and Funding

The authors have no conflict of interest to declare. The data used for this tutorial were collected with the help of a Social Sciences and Humanities Research Council Insight Development Grant [#430-2016-00805].

Author Contributions

Authors are in order of most to least contribution. Sean Mackinnon took the lead role in conceptualizing the tutorial, supervised Robin's work as part of his Ph.D. comprehensives, edited the first draft manuscript and code, wrote original sections, created the figures, and analyzed the data with the Random Intercepts CLPM. Robin Curtis created the majority of the R syntax and online appendices for our OSF page, excluding the Random Intercepts model. He also took a lead role in writing the first draft of the manuscript. Roisin O'Connor assisted with editing the writing in the manuscript and editing/reviewing the tutorial materials.

Open Science Practices

This article earned the Open Materials badge for making the materials openly available. It is a tutorial that used data from a published study, and as such has no (new) collected data. It was not pre-registered. It has been verified that the analysis reproduced the results presented in the article. The entire editorial process, including the open reviews, is published in the online supplement.

References

Bishop, J., Geiser, C., & Cole, D. A. (2015). Modeling latent growth with multiple indicators: A comparison of three approaches. Psychological Methods, 20(1), 43-62. https://doi.org/10.1037/met0000018

Breitsohl, H. (2019). Beyond ANOVA: An introduction to structural equation models for experimental designs. Organizational Research Methods, 22(3), 649-677. https://doi.org/10.1177/1094428118754988

Byrne, B. M., Shavelson, R. J., & Muthen, B. O. (1989). Testing for equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456-466. https://doi.org/10.1037/0033-2909.105.3.456

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233-255. https://doi.org/10.1207/S15328007SEM0902_5

Cole, D., & Maxwell, S. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112(4), 558-577. https://doi.org/10.1037/0021-843X.112.4.558
Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20(1), 102-116. http://dx.doi.org/10.1037/a0038889

Hewitt, P. L., Flett, G. L., Sherry, S. B., Habke, M., Parkin, M., Lam, R. W., ... Stein, M. B. (2003). The interpersonal expression of perfection: Perfectionistic self-presentation and psychological distress. Journal of Personality and Social Psychology, 84(6), 1303-1325. https://doi.org/10.1037/0022-3514.84.6.1303

Kashdan, T., & Steger, M. (2006). Expanding the topography of social anxiety: An experience-sampling assessment of positive emotions, positive events, and emotion suppression. Psychological Science, 17(2), 120-128. https://doi.org/10.1111/j.1467-9280.2006.01674.x

Kehayes, I.-L. L., & Mackinnon, S. P. (2019). Investigating the relationship between perfectionistic self-presentation and social anxiety using daily diary methods: A replication. Collabra: Psychology, 5(1), 33. https://doi.org/10.1525/collabra.257

Kenny, D. A. (2015). Measuring Model Fit. http://www.davidakenny.net/cm/fit.htm

Kyriazos, T. (2018). Applied psychometrics: Sample size and sample power considerations in factor analysis (EFA, CFA) and SEM in general. Psychology, 9, 2207-2230. https://doi.org/10.4236/psych.2018.98126

Lin, L. C., Huang, P. H., & Weng, L. J. (2017). Selecting path models in SEM: A comparison of model selection criteria. Structural Equation Modeling, 24, 855-869. https://doi.org/10.1080/10705511.2017.1363652

Mackinnon, S. (2012). Perceived social support and academic achievement: Cross-lagged panel and bivariate growth curve analyses. Journal of Youth and Adolescence, 41(4), 474-485. https://doi.org/10.1007/s10964-011-9691-1

Mackinnon, S. P., Couture, M-E., Cooper, M. L., Kuntsche, E., O'Connor, R. M., Stewart, S. H., & the DRINC Team. (2017). Cross-cultural comparisons of drinking motives in 10 countries: Data from the DRINC Project. Drug and Alcohol Review, 36, 721-730. https://doi.org/10.1111/dar.12464

Mackinnon, S. P., Ray, C. M., Firth, S. M., & O'Connor, R. M. (2021). Data from "Perfectionism, negative motives for drinking, and alcohol-related problems: A 21-day diary study". Journal of Open Psychology Data, 9(1), 1-6. https://doi.org/10.5334/jopd.44

Mulder, J. D., & Hamaker, E. L. (2020). Three extensions of the random intercept cross-lagged panel model. Structural Equation Modeling, 1-11. https://doi.org/10.1080/10705511.2020.1784738

Raftery, A. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-163. https://doi.org/10.2307/271063

Rice, S. P., Loscalzo, Y., Giannini, M., & Rice, K. G. (2020). Perfectionism in Italy and the USA: Measurement invariance and implications for cross-cultural assessment. European Journal of Psychological Assessment, 36, 207-211. https://doi.org/10.1027/1015-5759/a000476

Rogosa, D. (1980). A critique of cross-lagged correlation. Psychological Bulletin, 88(2), 245-258. https://doi.org/10.1037/0033-2909.88.2.245

Sherry, S., & Hall, P. (2009). The perfectionism model of binge eating: Tests of an integrative model. Journal of Personality and Social Psychology, 96(3), 690-709. https://doi.org/10.1037/a0014528

Steenkamp, J., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25(1), 78-107. https://doi.org/10.1086/209528

Twenge, J. M., & Campbell, W. K. (2009). The narcissism epidemic: Living in the age of entitlement. New York, NY: Atria.
Ullman, J. B. (2006). Structural equation modeling: Reviewing the basics and moving forward. Journal of Personality Assessment, 87(1), 35-50.

Van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. European Journal of Developmental Psychology, 9(4), 486-492. https://doi.org/10.1080/17405629.2012.686740

Wetzel, E., Brown, A., Hill, P., Chung, J., Robins, R., & Roberts, B. (2017). The narcissism epidemic is dead; long live the narcissism epidemic. Psychological Science, 28(12), 1833-1847. https://doi.org/10.1177/0956797617724208

Widaman, K., Ferrer, E., & Conger, R. (2010). Factorial invariance within longitudinal structural equation models: Measuring the same construct across time. Child Development Perspectives, 4(1), 10-18. https://doi.org/10.1111/j.1750-8606.2009.00110.x

Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281-324). Washington, DC: American Psychological Association.

Appendix A: Supplementary Tables

Table A1
Factor Loadings

Item    Configural                   Metric        Scalar        Residual
psp1    0.84 - 0.88 (1.62 - 1.75)    0.86 (1.68)   0.63 (1.68)   0.86 (1.68)
psp2    0.93 - 0.96 (1.89 - 2.08)    0.95 (1.99)   0.90 (1.99)   0.95 (1.98)
psp3    0.89 - 0.92 (1.73 - 1.96)    0.91 (1.87)   0.89 (1.87)   0.91 (1.86)
ssa1    0.85 - 0.90 (1.01 - 1.06)    0.87 (1.04)   0.87 (1.04)   0.87 (1.04)
ssa2    0.87 - 0.92 (1.11 - 1.18)    0.90 (1.14)   0.90 (1.14)   0.90 (1.14)
ssa3    0.89 - 0.93 (1.11 - 1.21)    0.91 (1.16)   0.90 (1.15)   0.90 (1.15)
ssa4    0.85 - 0.94 (1.03 - 1.12)    0.90 (1.13)   0.90 (1.13)   0.90 (1.13)
ssa5    0.85 - 0.91 (1.06 - 1.15)    0.88 (1.12)   0.88 (1.12)   0.88 (1.11)
ssa6    0.72 - 0.81 (0.87 - 0.99)    0.75 (0.93)   0.75 (0.93)   0.75 (0.92)
ssa7    0.60 - 0.67 (0.64 - 0.75)    0.62 (0.68)   0.62 (0.68)   0.63 (0.68)

Note. Values are formatted as "standardized (unstandardized)." Value ranges (min-max) are provided for the configural model because factor loadings varied by day. For the latter three models, factor loadings were constrained to equality across days. However, for the metric and scalar models, standardized factor loading scores still fluctuated very slightly across days, because variances differed across days. This will be typical in most real data. Therefore, arithmetic mean values are provided for standardized scores in the metric and scalar models, just for ease of presentation in the table. In the residual model, residual error was also constrained to equality across days, which resulted in no fluctuation of standardized scores.

Table A2
Unstandardized Intercepts

Item    Configural     Metric         Scalar    Residual
psp1    3.63 - 3.94    3.63 - 3.94    3.75      3.76
psp2    3.49 - 3.74    3.49 - 3.75    3.64      3.64
psp3    3.28 - 3.66    3.28 - 3.66    3.47      3.48
ssa1    1.53 - 1.64    1.53 - 1.64    1.60      1.61
ssa2    1.39 - 1.56    1.39 - 1.56    1.52      1.52
ssa3    1.31 - 1.53    1.31 - 1.53    1.44      1.44
ssa4    1.40 - 1.64    1.40 - 1.64    1.54      1.54
ssa5    1.31 - 1.47    1.32 - 1.47    1.41      1.41
ssa6    1.09 - 1.23    1.09 - 1.23    1.16      1.16
ssa7    0.97 - 1.10    0.97 - 1.10    1.01      1.02

Note. Ranges are provided for the configural and metric models, for which intercepts varied by day. For the latter two models, intercepts were fixed across days.

Table A3
Unstandardized Residual Variances
Item    Configural     Metric         Scalar         Residual
psp1    0.89 - 1.11    0.87 - 1.11    0.89 - 1.12    1.00
psp2    0.34 - 0.57    0.33 - 0.57    0.34 - 0.57    0.44
psp3    0.67 - 0.80    0.68 - 0.82    0.67 - 0.80    0.74
ssa1    0.26 - 0.38    0.26 - 0.38    0.26 - 0.38    0.33
ssa2    0.23 - 0.40    0.23 - 0.40    0.23 - 0.40    0.31
ssa3    0.22 - 0.35    0.22 - 0.35    0.22 - 0.36    0.30
ssa4    0.18 - 0.41    0.18 - 0.41    0.18 - 0.41    0.30
ssa5    0.27 - 0.44    0.26 - 0.44    0.27 - 0.44    0.37
ssa6    0.54 - 0.76    0.53 - 0.76    0.54 - 0.76    0.66
ssa7    0.68 - 0.77    0.68 - 0.76    0.68 - 0.77    0.73

Note. In the residual model, residual error was constrained to equality across days. For all other models, the range of values across the five days is provided.