Hi everyone

As many of you may have experienced with multilevel (hierarchical or clustered) data, group means are often highly correlated with the amount of dispersion or 'agreement' within a group. In multilevel models this shows up as a large correlation between random intercepts and random within-group variances at the Between-group (B) level of analysis, i.e., a large location-scale correlation at the group level. This is a common finding that has been discussed extensively in multiple literatures, including for longitudinal data, where researchers have proposed various largely ad hoc remedies (e.g., Mestdagh et al., 2018).

In case you're interested, there is a solution to this problem in the multilevel SEM framework (thanks to the Mplus team for pointing this out), which I describe in this Instats blog post. These large correlations arise because, with bounded survey response scales, the group mean becomes confounded with the within-group variance. As the scores within a group tend toward the lower or upper boundary of the response scale (e.g., toward 1 or 5 on a 5-point Likert scale), the variance within the group goes down by construction: the group members' scores become more similar as they are compressed against the boundary of the response options. The net result is that the group mean and the within-group variance (the location and scale parameters) have a strong positive or negative correlation, depending on whether the scores are massed at the lower or upper boundary, respectively.
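For readers who want to see the boundary effect in isolation, here is a minimal simulation sketch in Python. It is not part of the Mplus analysis described in this post: the function name, group sizes, and parameter values are all illustrative assumptions. Every group is given the same latent within-group variance, so any correlation between the observed group means and variances comes purely from rounding and clipping the scores to a 1-5 scale.

```python
import numpy as np


def location_scale_corr(center, n_groups=500, group_size=30, seed=0):
    """Correlation between observed group means and within-group variances.

    Hypothetical helper for illustration only. Latent scores have the SAME
    within-group SD (1.0) in every group, so there is no true
    location-scale relationship on the latent scale.
    """
    rng = np.random.default_rng(seed)
    # Latent group means scattered around `center` (assumed SD of 0.7).
    group_means = rng.normal(center, 0.7, size=n_groups)
    # Latent individual scores: group mean plus unit-variance noise.
    latent = group_means[:, None] + rng.normal(
        0.0, 1.0, size=(n_groups, group_size)
    )
    # Observed 5-point Likert responses: round, then clip to the 1..5 bounds.
    observed = np.clip(np.rint(latent), 1, 5)
    means = observed.mean(axis=1)
    variances = observed.var(axis=1, ddof=1)
    return np.corrcoef(means, variances)[0, 1]


# Groups massed near the lower bound, the midpoint, and the upper bound.
for label, center in [("lower", 1.5), ("middle", 3.0), ("upper", 4.5)]:
    print(f"{label:>6}: r = {location_scale_corr(center):+.3f}")
```

Under these assumptions the correlation should come out clearly positive when groups sit near the lower bound, clearly negative near the upper bound, and close to zero when groups are centered on the midpoint, even though the latent variance never changes.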

As this Mplus output shows, the problem can be eliminated by properly treating the data as ordinal. I initially tried a censored (Tobit) model and a two-part model to address the problem, but the attached ordinal response approach is much simpler and, in this case, better. It is similar to a 2PL polytomous IRT model with a probit link and the Bayes estimator. This specification appropriately reflects the categorical and bounded nature of the observed data.
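As a rough sketch of what such a specification looks like on paper, the equations below write out a two-level ordinal probit factor model with a random within-group variance. The notation is an assumption introduced here for illustration (items $p$, persons $i$, groups $j$, categories $k = 1, \dots, K$), the log-variance parameterization is assumed, and the actual Mplus input may differ in its details, for example by using separate loadings at the two levels.

```latex
% Ordinal (probit) measurement model for item p, person i, group j
y^{*}_{pij} = \lambda_{p}\,\eta_{ij} + \varepsilon_{pij},
  \qquad \varepsilon_{pij} \sim N(0, 1)

y_{pij} = k \iff \tau_{p,k-1} < y^{*}_{pij} \le \tau_{p,k},
  \qquad \tau_{p,0} = -\infty,\; \tau_{p,K} = +\infty

% Location-scale decomposition of the latent factor
\eta_{ij} = \eta_{Bj} + \eta_{Wij},
  \qquad \eta_{Wij} \sim N\bigl(0, \exp(v_{j})\bigr)

% Between level: group location and (log) scale are jointly normal
\begin{pmatrix} \eta_{Bj} \\ v_{j} \end{pmatrix}
  \sim N\!\left(
    \begin{pmatrix} 0 \\ \mu_{v} \end{pmatrix},
    \begin{pmatrix}
      \sigma^{2}_{\eta} & \sigma_{\eta v} \\
      \sigma_{\eta v}   & \sigma^{2}_{v}
    \end{pmatrix}
  \right)
```

In this notation the location-scale correlation at the B level is $\sigma_{\eta v} / (\sigma_{\eta}\,\sigma_{v})$. Because the bounded categories are handled by the thresholds $\tau_{p,k}$, the latent factor itself is not compressed against a boundary.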

Setting the model up this way requires enough scale items to work with a latent factor at the within-group level, rather than using a scale mean to define the within-group variance as a random B-level variable. This should always be possible when the problem of large B-level location-scale correlations is caused by categorical item-level data. The resulting correlations between the random latent within-group variance and the latent factor at the B level are striking when compared to their continuous counterparts:

Continuous version of B-level correlations (output file continuous.out):
COR(TSAT_B, SATV) = .858
COR(TWK_B, TWKV) = .768

Categorical version of B-level correlations (output file categorical.out):
COR(TSAT_B, SATV) = -.099
COR(TWK_B, TWKV) = -.132

The net result is that, unlike in the continuous case, the proper categorical data specification lets you treat the group means and the within-group variances as distinct B-level constructs. You can then examine latent interactions between the B-level group means and random variances to meaningfully evaluate substantive hypotheses about location-scale interactions at the group level (keeping in mind for interpretation that the random variance is actually a logged version of this variable). Although some researchers have recognized this possibility, they have raised concerns such as estimation difficulties (Mestdagh et al., 2018, p. 694), but Mplus's Bayes estimator renders this concern irrelevant. Because these models are now viable, including with latent interactions between group means and random within-group variances, additional research can be done to illustrate them and offer further advice on their application.

If you or your colleagues/PhD students would like to learn more about these kinds of models and how to estimate them, we still have a few places left in my upcoming 3-day seminar on Multilevel SEM in Mplus: Location-Scale Models, running March 1-3. Hope to see you there!

Best wishes, and you can find the data and Mplus files here in case you want to have a play.

Michael Zyphur

Director

Institute for Statistical and Data Science

instats.org