Hello
As suggested, I will try to explain my problem more clearly.
We are working in the field of quantum chemistry, where the principles
of quantum mechanics are used to study various properties. Here every
"operator" has a matrix representation, and we need its eigenvalues and
eigenvectors. Once we have them, we can use this "vector space" to compute
the "expectation values" of other "operators". In principle this "vector
space" is infinite-dimensional, but some clever tricks, approximations and
the symmetry of the operators restrict its dimensionality.
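For a Hermitian operator H, the two steps just described can be written as below. The symbols c (eigenvector), E (eigenvalue) and O (another operator) are chosen here for illustration only.

```latex
H\,\mathbf{c} = E\,\mathbf{c},
\qquad
\langle O \rangle = \frac{\mathbf{c}^{\dagger}\, O\, \mathbf{c}}{\mathbf{c}^{\dagger}\mathbf{c}}
```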
So once we have decided how to form the operator (say H) and have calculated
its matrix representation, the next step is to diagonalise it. These are
square matrices, sometimes Hermitian, and we know that most of their
entries are zero, so we store them in CSR (compressed sparse row) format.
We then diagonalise the matrix to get a "few" eigenvalues and eigenvectors.
The problem arises when the data size is large and we need multiprocessor
computation. The size of the matrix depends on a parameter called the
system size "n": the matrix dimension is 4^n. For "n=8" the matrix is
already 65,536 x 65,536, i.e. 2^32 elements, and even if only 20% of the
entries are non-zero and unique, this makes our life complicated. On the
other hand, "n=8" is far too small to say anything about the physical
system, since one mole of a substance contains of the order of 10^23 atoms.
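A rough storage estimate shows why n = 8 already hurts. This sketch assumes 8-byte real values, 4-byte column indices and 8-byte row pointers in the CSR arrays, and ignores the saving from storing only unique values; these sizes are illustrative assumptions, not figures from our code.

```python
# Rough CSR memory estimate for the operator matrix at system size n.
# Assumed (illustrative) sizes: 8-byte values, 4-byte column indices,
# 8-byte row pointers; the unique-value table is NOT taken into account.

def matrix_dimension(n: int) -> int:
    """Matrix dimension 4**n for system size n."""
    return 4 ** n


def csr_bytes(n: int, fill: float, value_bytes: int = 8,
              index_bytes: int = 4, ptr_bytes: int = 8) -> int:
    """Approximate CSR storage: values + column indices + row pointers."""
    dim = matrix_dimension(n)
    nnz = int(fill * dim * dim)  # number of stored non-zero entries
    return nnz * (value_bytes + index_bytes) + (dim + 1) * ptr_bytes


if __name__ == "__main__":
    dim = matrix_dimension(8)
    print(f"dimension = {dim}, elements = {dim * dim}")
    print(f"CSR at 20% fill: {csr_bytes(8, 0.20) / 2**30:.1f} GiB")
```

Under these assumptions the 20% case comes to roughly 9.6 GiB for the matrix alone, before any work vectors are allocated.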
Thus our problem splits into the following parts:
1) Get the operator in matrix form:
We do this by storing it in CSR format, keeping only the non-zero and
unique elements. In addition, we have a table that maps each
"matrix coordinate" to its value.
2) Diagonalise it: we have serial code for this, and we are interested in
only one eigenvalue and the corresponding eigenvector. Since the matrix is
in general not symmetric, we use the Rettrup algorithm. Within it we call a
diagonalisation subroutine from the book "Numerical Recipes", or another
one available at netlib.org, to diagonalise the full "projected subspace"
of the original vector space.
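The two parts above can be sketched together in a hedged, simplified form. Rettrup's method is a generalisation of Davidson's method to non-symmetric matrices; to keep the example short, this sketch uses the plain Davidson iteration for a symmetric matrix as a stand-in, and NumPy's `eigh` stands in for the Numerical Recipes routine that diagonalises the small projected matrix. The class `UniqueValueCSR` is a hypothetical layout in which each CSR entry stores an index into a table of unique values; the names `UniqueValueCSR` and `davidson_lowest` are illustrative, not taken from our code.

```python
import numpy as np


class UniqueValueCSR:
    """CSR matrix whose entries index a table of unique values (hypothetical layout)."""

    def __init__(self, dim, triplets):
        rows = [[] for _ in range(dim)]
        values, lookup = [], {}              # unique-value table and value -> slot map
        for i, j, a in triplets:
            if a == 0.0:
                continue                     # keep only non-zero entries
            if a not in lookup:
                lookup[a] = len(values)
                values.append(a)
            rows[i].append((j, lookup[a]))
        row_ptr, col_idx, value_id = [0], [], []
        for r in rows:
            for j, vid in sorted(r):         # column-sorted within each row
                col_idx.append(j)
                value_id.append(vid)
            row_ptr.append(len(col_idx))
        self.dim = dim
        self.values = np.array(values, dtype=float)
        self.row_ptr = row_ptr
        self.col_idx = np.array(col_idx, dtype=np.int64)
        self.value_id = np.array(value_id, dtype=np.int64)

    def matvec(self, x):
        """y = A x, reading each entry through the unique-value table."""
        y = np.zeros(self.dim)
        for i in range(self.dim):
            lo, hi = self.row_ptr[i], self.row_ptr[i + 1]
            y[i] = self.values[self.value_id[lo:hi]] @ x[self.col_idx[lo:hi]]
        return y

    def diagonal(self):
        """Diagonal of A (used by the Davidson preconditioner)."""
        d = np.zeros(self.dim)
        for i in range(self.dim):
            for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
                if self.col_idx[k] == i:
                    d[i] = self.values[self.value_id[k]]
        return d


def davidson_lowest(matvec, diag, tol=1e-8, max_iter=200):
    """Lowest eigenpair of a SYMMETRIC matrix; A is accessed only via matvec."""
    n = diag.size
    V = np.zeros((n, 0))                     # orthonormal subspace basis
    AV = np.zeros((n, 0))                    # A applied to each basis vector
    t = np.zeros(n)
    t[np.argmin(diag)] = 1.0                 # start at the smallest diagonal entry
    lam, x = None, None
    for _ in range(max_iter):
        for _ in range(2):                   # Gram-Schmidt, two passes for stability
            t = t - V @ (V.T @ t)
        norm = np.linalg.norm(t)
        if norm < 1e-12:
            break                            # no new direction: subspace exhausted
        V = np.column_stack([V, t / norm])
        AV = np.column_stack([AV, matvec(V[:, -1])])
        H = V.T @ AV
        H = 0.5 * (H + H.T)                  # small projected matrix (symmetrised)
        theta, s = np.linalg.eigh(H)         # dense diagonalisation of the subspace
        lam, y = theta[0], s[:, 0]
        x = V @ y
        r = AV @ y - lam * x                 # residual A x - lam x
        if np.linalg.norm(r) < tol:
            break
        denom = lam - diag
        denom[np.abs(denom) < 1e-12] = 1e-12
        t = r / denom                        # Davidson diagonal correction
    return lam, x
```

In this sketch only `matvec` and `diagonal` touch the large matrix; the dense `eigh` call works on a matrix whose size equals the number of iterations. That separation may be useful when thinking about parallelisation.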
Our idea is the following:
Suppose we use MPI. Then we can scatter the data from one processor to the
others, so that each processor holds its own part of the data set. We think
we could distribute the data over the processors and devise an algorithm
that returns an element's value during the computation, irrespective of
which processor holds it. However, since we need to call the
diagonalisation subroutine in the intermediate steps, we have not been able
to find a suitable parallelisation scheme. We are using IBM's SP2 for
parallel computing.
This may not answer all of your questions, but we would be grateful if
someone could suggest which course to take and how to tackle the problem
effectively.
Regards,
Varadharajan S