Date: Thu, 30 Oct 1997 21:32:45 +0100
From: Swietanowski Artur
Just a short question:
What is in your opinion the proportion of calls to MATMUL
in which the sequence association is preserved to all calls
in a typical code? I don't mean some examples concocted to
confuse compilers and possibly compiler writers, but real
applications.
I have no idea. I'm actually a debugger person, not a compiler
person, but I do work with our HPF group. Most of the codes I have
seen are actually legacy/F77 based, and as such use sequence
association because F77 provides no alternative. If the question is
more than rhetorical, many other people will have to contribute their
answers.
And three comments:
2) If a compiler came with decent documentation of what
optimizations are possible for different input arguments, many
application writers would try to avoid the low-performance
constructs. One possible form of such documentation would be
a compiler option that would cause the compiler to list
the optimizations it cannot do and explain why.
This might be desirable from a user's perspective, but is a bad idea
for maintainability and portability. Instead of expressing the intent
of your algorithm, you would be conforming to the peculiarities of a
particular implementation. As somebody on this list said a few days
ago, you could write a wrapper to MATMUL to do your own blocking, and
all the other things you would want the compiler to do, but the
result would no longer look like the original loop doing MATMUL()s.
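
To make the wrapper point concrete, here is a minimal sketch of a
hand-blocked multiply. It is written in Python purely for illustration;
the name blocked_matmul and the block parameter are hypothetical, not
part of any compiler or library, and matrices are assumed to be
non-empty lists of rows. Note how the tiling loops bury the one-line
intent of C = MATMUL(A, B).

```python
def blocked_matmul(a, b, block=2):
    """Multiply a (n x k) by b (k x m), both given as lists of rows,
    visiting the operands tile by tile (block x block tiles)."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    # Three outer loops walk over tiles; three inner loops do the
    # ordinary multiply-accumulate inside one tile.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        s = c[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] = s
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))
```

A good tile size depends on the cache of the target machine, which is
exactly the kind of implementation detail that ties the source to one
platform.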
3) Some performance-conscious computer vendors provide custom
versions of BLAS, which would take care of efficiently
executing the MATMUL when the input data is sufficiently
regular. If you relied on BLAS to do the dirty work, you
could save yourself the effort of further optimizing
the 'special cases'.
You lost me here. MATMUL is an F90 intrinsic. BLAS is a linear
algebra library (which may have its own MATMUL). If what you are
saying is that, when the inputs are not sequence associated but are
regular, one should describe them in terms of BLAS objects, use BLAS
to redistribute/pack the inputs, and then run MATMUL, then I suspect
the compiler has enough knowledge of the inputs to do the
redistribution/pack without resorting to BLAS. If you are saying
something else, I'm not sure what it is.
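
For readers unsure what "pack, then multiply" means here, the following
is a rough sketch under stated assumptions, again in Python for
illustration only. Arrays are modelled as flat column-major lists with a
leading dimension lda, as in Fortran storage; pack_section and
matmul_colmajor are hypothetical helper names, not BLAS routines or
anything a real compiler exposes.

```python
def pack_section(buf, lda, rows, cols):
    """Copy the regular section buf(rows, cols) of a column-major array
    with leading dimension lda into a new contiguous column-major list.
    rows and cols are range objects (0-based, possibly strided)."""
    return [buf[i + j * lda] for j in cols for i in rows]


def matmul_colmajor(a, b, n, k, m):
    """C = A * B for contiguous column-major A (n x k) and B (k x m)."""
    c = [0.0] * (n * m)
    for j in range(m):
        for p in range(k):
            bpj = b[p + j * k]
            # Innermost loop runs down a column, i.e. over unit stride.
            for i in range(n):
                c[i + j * n] += a[i + p * n] * bpj
    return c


if __name__ == "__main__":
    # A 4 x 2 array stored column-major: columns (1,2,3,4) and (5,6,7,8).
    buf = [1, 2, 3, 4, 5, 6, 7, 8]
    # Every second row, both columns: a strided, non-contiguous section.
    a = pack_section(buf, 4, range(0, 4, 2), range(2))
    b = [1, 2, 3, 4]  # 2 x 2, column-major
    print(a, matmul_colmajor(a, b, 2, 2, 2))
```

The packing step needs only the bounds and strides of the section,
which is information the compiler already has at the call site.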