| I am looking for any references that tell of ways to do principal component
| analysis on large data sets by breaking the large data set into smaller
| pieces.
With PCA, you generally want to decompose your data matrix using singular
value decomposition, such that A = U*S*V', where S is diagonal and U and
V are unitary. If the height of A is greater than the width, then
this is done in Matlab by:
[U,S,V]=svd(A,0);
However, with very large datasets, it is not possible to store everything
in memory, so things need to be done a bit at a time. If a dataset consists
of n volumes, each containing m voxels, then it can be represented by the
m*n matrix A. Assuming that it is possible to store an n*n matrix in memory,
then matrices S and V can easily be derived from the matrix generated by
A'*A;
C = A'*A;
[tmp,D,V] = svd(C);
% or [V,D] = eig(C), but the eigenvalues are not sorted by
% decreasing magnitude if you do it this way.
S = diag(sqrt(diag(D))); % S = sqrtm(D);
The matrix C (above) can be computed a bit at a time (eg voxel by voxel
or plane by plane) using a scheme something like:
C = zeros(size(A,2));
for i=1:size(A,1),
C = C + A(i,:)'*A(i,:);
end;
If you want the matrix U (referred to in SPM as the eigenimages), then this
can be computed from A, V and S by:
U = A*V*diag(diag(S).^(-1));
If you just want one eigenimage at a time (one column of U), then these can
be generated by:
U = zeros(size(A));
for j=1:size(A,2),
U(:,j) = A*V(:,j)/S(j,j);
end;
or:
U = zeros(size(A));
for j=1:size(A,2),
for i=1:size(A,2),
U(:,j) = U(:,j) + A(:,i)*V(i,j)/S(j,j);
end;
end;
Good luck,
-John
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|