Dear Eric and others,
thank you, Eric, for your elucidating comments on sample size issues
using matrix notation. I am currently puzzled by a similar question
and I thought you might be the right person to ask. But I would be glad
if others jumped in, too.
It seems that your elaborations are only valid if the design matrix is
not rank-deficient, that is, rank(G) = size(G,2). This is true for the
design discussed previously (one-way ANOVA without constant). The
inverse (G'G)^(-1) is only defined in this case.
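
To make the rank condition concrete, here is a small sketch in Python/NumPy (not SPM or MATLAB code; np.linalg.matrix_rank plays the role of rank(G), and the group sizes are made up for illustration). It compares a one-way design without constant to the same design with a constant column added:

```python
import numpy as np

# One-way design without constant: 3 groups, 2 observations each
# (group sizes are an assumption chosen for illustration).
G_full = np.kron(np.eye(3), np.ones((2, 1)))          # 6 x 3

# Same design plus a constant column. The constant equals the sum of the
# three group columns, so the columns are linearly dependent.
G_deficient = np.hstack([G_full, np.ones((6, 1))])    # 6 x 4


def has_full_column_rank(G):
    """True if rank(G) equals the number of columns, i.e. rank(G) = size(G,2)."""
    return np.linalg.matrix_rank(G) == G.shape[1]


full_ok = has_full_column_rank(G_full)
deficient_ok = has_full_column_rank(G_deficient)

# In the full-rank case G'G is diagonal with the group sizes on the
# diagonal, so its inverse exists and is diagonal with entries 1/nj.
GtG_inv = np.linalg.inv(G_full.T @ G_full)

# In the rank-deficient case G'G is 4 x 4 but only has rank 3, so it is
# singular and (G'G)^(-1) is not defined.
rank_GtG_deficient = np.linalg.matrix_rank(G_deficient.T @ G_deficient)

print(full_ok, deficient_ok, rank_GtG_deficient)
```
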
However, SPM often uses rank-deficient design matrices at the second
level (e.g., within-subjects ANOVA). Thus, the betas can only be
estimated using the pseudo-inverse [B = pinv(G)*Y], and not with the
"real" inverse, B = (G'G)^(-1)G'Y. As a consequence, the subject constant
IS NOT the mean of the subject's data, as I had previously thought.
Let's consider a very simple within-subject design (3 conditions,
3 subjects):
    Y   =         G          *   B   +   E
    ----------------------------------------
    Y1       1 0 0 1 0 0                 e1
    Y2       0 1 0 1 0 0                 e2
    Y3       0 0 1 1 0 0         b1      e3
    Y4       1 0 0 0 1 0         b2      e4
    Y5  =    0 1 0 0 1 0     *   b3  +   e5
    Y6       0 0 1 0 1 0         b4      e6
    Y7       1 0 0 0 0 1         b5      e7
    Y8       0 1 0 0 0 1         b6      e8
    Y9       0 0 1 0 0 1                 e9

solved by B = pinv(G)*Y.
As it turns out, mean(Y1:Y3) ~= B(4), i.e., the two are not equal. Rather,
mean(Y1:Y3) = mean(C*B), where

    C = [1 0 0 1 0 0
         0 1 0 1 0 0
         0 0 1 1 0 0];
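
The observation can be checked numerically. The following is a minimal sketch in Python/NumPy rather than MATLAB (np.linalg.pinv plays the role of pinv), with made-up data values. For this balanced design the null space of G is spanned by [1 1 1 -1 -1 -1]', and because the pseudo-inverse picks the minimum-norm solution, the subject betas seem to come out as the subject mean minus half the grand mean. That identity is worked out for this particular design only and is not claimed for other designs.

```python
import numpy as np

# Design matrix of the 3-condition, 3-subject example: rows are ordered
# subject by subject, columns 1-3 code the conditions, columns 4-6 the
# subject constants.
conditions = np.tile(np.eye(3), (3, 1))           # 9 x 3
subjects = np.kron(np.eye(3), np.ones((3, 1)))    # 9 x 3
G = np.hstack([conditions, subjects])             # 9 x 6, rank 5

# Made-up data, for illustration only.
Y = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 2.0, 5.0, 11.0])

# Minimum-norm least-squares estimate, B = pinv(G)*Y.
B = np.linalg.pinv(G) @ Y

subject1_mean = Y[:3].mean()      # mean(Y1:Y3)
grand_mean = Y.mean()

# C selects the rows of G that belong to subject 1.
C = G[:3, :]
fitted_mean = (C @ B).mean()      # mean(C*B)

print("B(4)                       :", B[3])
print("mean(Y1:Y3)                :", subject1_mean)
print("mean(C*B)                  :", fitted_mean)
print("mean(Y1:Y3) - grand mean/2 :", subject1_mean - grand_mean / 2)
```

In this sketch B[3] corresponds to B(4), because Python indexes from zero.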
Now, my question is: what is the meaning of the parameter estimate for
the subject constant? Is there another (intuitive) interpretation for
this beta, since it is NOT the subject's mean?
Thank you for your insights.
Jan
On 2005-07-01 (Fri) at 17:45:32 -0400, Eric Zarahn wrote:
> George,
>
> Yes, you are correct. Let's go through the math more formally, so you
> can see why your idea is generally true. I am going to use some matrix
> notation (matrices in bold, single quote means transpose, scalars in
> italics), but you can prove the same thing with scalar notation (which
> just takes more work to type out).
>
> For n total observations, a k-condition, dummy-coded design matrix
> G has n rows and k columns. Each row has a value of 1 in one and only
> one column. Let nj (j = 1,...,k) be the number of observations in
> column/group j (so the sum of the nj over all j is n). The general
> formula for b, the vector of k regression coefficients, is
> b = (G'G)^(-1)G'Y, where Y is the n x 1 vector of observations. Now,
> for the specified type of design matrix, G'G is simply a k x k
> diagonal matrix whose jth diagonal element is nj. Thus, its inverse
> (G'G)^(-1) is simply a k x k diagonal matrix whose jth diagonal
> element is (1/nj). Again, for this type of design matrix, G'Y is
> a k x 1 vector whose jth element is the sum of all the observations
> corresponding to group j.
>
> So, the jth element of b = (the sum of all the observations
> corresponding to group j)*(1/nj) = the sample mean for group j.
>
> We can identify your "behind the scenes" matrix from Raj's question
>
> > TaskA TaskB Rest
> > 1/3 0 0
> > 1/3 0 0
> > 1/3 0 0
> > 0 0 1/4
> > 0 0 1/4
> > 0 0 1/4
> > 0 0 1/4
> > 0 1/2 0
> > 0 1/2 0
>
> with ((G'G)^(-1)G')'. Since ((G'G)^(-1)G')G = a k x k identity matrix,
> (G'G)^(-1)G' is known as the pseudo-inverse of G. So your "behind the
> scenes" design matrix is the transpose of the pseudo-inverse of G,
> and is involved in computation of b. Congratulations on the epiphany!
>
> Eric
--
Jan Gläscher
NeuroImage Nord
Dept. of Systems Neuroscience, Bldg S10
University Medical Center Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
Germany
+49-40-42803-7890 (office)
+49-40-42803-9955 (fax)
http://www.uke.uni-hamburg.de/kliniken/neurologie/index_16969.php
-------------------------------------------------------------------------
GnuPG/PGP key id: FEC4B55C
fingerprint: 5A36 1EF6 8472 117E 805A F240 3146 A410 FEC4 B55C
-------------------------------------------------------------------------