Apologies for cross-posting. The query below could be answerable by
people on the multilevel list and/or by people on the general stat list.
I sent to both, and apologize if you receive this message twice.
I have data on a probability sample of U.S. citizens (11.7 million
cases) and on the states in which they reside (50 cases). The data are
cross-sectional.
I am estimating multi-level models of the following form (as an example):
Y_ij = b1j BM_ij + b2j WM_ij + b3j BF_ij + b4j WF_ij + b5 YrsEd_ij + b6 Age_ij + b7 Age_ij^2 + e_ij

b1j = g01 + g11 Z1_j + g21 Z2_j + d1j
b2j = g02 + g12 Z1_j + g22 Z2_j + d2j
b3j = g03 + g13 Z1_j + g23 Z2_j + d3j
b4j = g04 + g14 Z1_j + g24 Z2_j + d4j
b5  = g05
b6  = g06
b7  = g07

e_ij ~ N(0, sigma^2)
(d1j, d2j, d3j, d4j) ~ MVN(0, Tau), where Tau is the state-level
variance/covariance matrix and cov(d_kj, d_mj) is the corresponding
off-diagonal element of Tau.
The substantive aim of this research is to assess g11, g21, g12, g22, g13,
g23, g14, and g24; none of the other parameters is of substantive interest.
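
For concreteness, the combined (mixed) form of this model could be written
out in, say, Python/statsmodels roughly as below. The column names are
invented, and with 11.7 million records this is probably not how one would
actually fit it -- the point is just to pin down the specification, with the
g1*/g2* of interest appearing as the cross-level interactions of the four
dummies with Z1 and Z2:

import pandas as pd
import statsmodels.formula.api as smf

# Invented column names: Y, BM, WM, BF, WF (0/1 dummies), YrsEd, Age,
# Z1, Z2 (state-level predictors merged onto each person), state (group id).
df = pd.read_csv("persons_with_state_vars.csv")
df["AgeSq"] = df["Age"] ** 2

fixed = ("Y ~ 0 + BM + WM + BF + WF + YrsEd + Age + AgeSq"
         " + BM:Z1 + WM:Z1 + BF:Z1 + WF:Z1"
         " + BM:Z2 + WM:Z2 + BF:Z2 + WF:Z2")

# Random slopes on the four dummies, varying by state (the d's), with a
# freely estimated variance/covariance matrix (the Tau above).
model = smf.mixedlm(fixed, df, groups="state",
                    re_formula="0 + BM + WM + BF + WF")
result = model.fit()
print(result.summary())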
The problem? The individuals are a sample. But I am not interested in
individual-level inference, and ignore those standard errors in my
discussion (everything is statistically significant with 11.7 million
cases anyway). My inferences focus on the states. The states are a 100%
sample. But there are only 50 states, so the standard errors are fairly
large. My interest is in shrinking the standard errors of the g1* and g2*
parameters to reflect that the analysis is based on a 100% sample of the
macro-level population. I know one can assume some kind of
super-population and not adjust the standard errors, but *my* aim is to
say what happened at the particular time of my sample, not to make an
inference to some non-existent conceptual population of states, nor to
other times and places.
As I see it, there are possibly three options:
1) No adjustment -- the standard errors will just be off, and I'll have to
accept that. This will make it difficult to discern real association from
noise. And, given that with only 50 states the standard errors are large,
I will end up concluding Z1 and Z2 don't matter far too often.
2) Use the finite population correction (fpc). But, because the fpc for a
100% sample makes the standard error equal 0 (in my calculation), it seems
too severe an adjustment. Using the fpc in this way would ignore that
there *is* some sample variation owing to the individual-level estimation
being based on a sample. Thus, the standard errors will again be off,
this time in the opposite direction from the first fix above.
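
Back of the envelope, using the textbook fpc factor sqrt((N - n)/(N - 1))
and a made-up standard error for one of the g's, the problem is obvious:

import math

N = 50        # states in the population
n = 50        # states actually analyzed
se_model = 0.12                      # made-up model-based SE for one g

fpc = math.sqrt((N - n) / (N - 1))   # finite population correction factor
print(fpc, fpc * se_model)           # 0.0 and 0.0 -- the fpc wipes out the SE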
I am in the middling case--my inferences focus on states at a particular
time, but there is an individual-level sample involved, which would seem
to require some kind of standard error. *But*, the standard error
calculated under the assumption of only 50 cases, as if those 50 cases are
not the population, appears seriously over-estimated. This leads to
option 3.
3) Decompose the standard error the software reports for the g1* and g2*
parameters into two components: (1) the part attributable to the
individual-level sample estimation and (2) the part attributable to the
50-case state sample. If that decomposition is possible, I could use a
spreadsheet (or maybe even the back of an envelope) to recalculate the
standard error with the second part set to zero (to account for the 100%
state sample). But I do not know how to do this decomposition. Ideally, all
one would need is output from something like HLM.
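
To make the idea concrete, here is a rough two-stage sketch of the kind of
recalculation I have in mind. It is not the HLM estimator itself: it assumes
one first estimates each state's slope (say b1) separately from that state's
individual-level data, keeps its squared standard error, and then regresses
those 50 estimates on Z1 and Z2 by GLS. The file and column names are
invented:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Invented input: one row per state (50 rows) holding
#   b1_hat : the state-specific slope estimated from that state's micro data
#   V1     : its squared standard error (individual-level sampling variance)
#   Z1, Z2 : the state-level predictors
states = pd.read_csv("state_level_estimates.csv")

X = sm.add_constant(states[["Z1", "Z2"]]).to_numpy()
b = states["b1_hat"].to_numpy()
V = states["V1"].to_numpy()

# Crude moment estimate of the between-state variance; in practice this could
# be taken straight from the tau matrix in the HLM output instead.
tau2 = max(sm.OLS(b, X).fit().mse_resid - V.mean(), 0.0)

# GLS fit of b1_hat on Z1, Z2 with the usual "superpopulation" weights.
W = np.diag(1.0 / (tau2 + V))
bread = np.linalg.inv(X.T @ W @ X)

def gls_cov(omega_diag):
    # Sandwich covariance of that same GLS fit under a given error variance.
    return bread @ X.T @ W @ np.diag(omega_diag) @ W @ X @ bread

cov_total   = gls_cov(tau2 + V)        # the usual model-based covariance
cov_within  = gls_cov(V)               # part due to the individual-level sample
cov_between = cov_total - cov_within   # part due to treating 50 states as a draw

print("usual SEs:      ", np.sqrt(np.diag(cov_total)))
print("recomputed SEs: ", np.sqrt(np.diag(cov_within)))

Because the error variance is tau^2 plus the individual-level piece, the
model-based covariance splits exactly into those two parts, and option 3
amounts to reporting the square roots of the diagonal of cov_within.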
So I am seeking assistance. Is it possible to decompose the standard error
(i.e., the sampling variance of the g1* and g2* estimates) in this way? Can
it be done with output from the fitted model? Are there other solutions to
this dilemma? I appreciate any help anyone can provide.
Thanks.
Sam