> Date: Fri, 31 Oct 1997 09:59:02 -0500
> From: "David C. P. LaFrance-Linden"
...
...
>
> 3) Some performance-conscious computer vendors provide custom
> versions of BLAS, which would take care of efficiently
> executing the MATMUL when the input data is sufficiently
> regular. If you rely on BLAS to do the dirty work, you
> could save yourself the effort of further optimizing
> the 'special cases'.
>
> You lost me here. MATMUL is an F90 intrinsic. BLAS is a linear
> algebra library (which may have its own MATMUL). If what you are
> saying is that, when the inputs are not sequence associated but are
> regular, one should describe them in terms of BLAS objects, use BLAS
> to redistribute/pack the inputs, and then run MATMUL, I suspect the
> compiler has enough knowledge of the inputs to do the
> redistribution/pack without resorting to BLAS. If you are saying
> something else, I'm not sure what it is.
>
Just a test on a Sun system:

   PARAMETER (NL=1000, NC=1001)
   DOUBLE PRECISION A (NC, NL), B (NL, NC), C (NC, NC)

   C = MATMUL (A, B)
        716 sec.

   CALL DGEMM ( 'N','N', NC, NC, NL, 1.0d0, A, NC, B, NL, 0.0d0, C, NC )
        255 sec.  vanilla BLAS
         53 sec.  libsunperf (Sun optimized BLAS)
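For readers less familiar with the DGEMM argument list: with TRANSA = TRANSB = 'N', ALPHA = 1 and BETA = 0, DGEMM computes C := ALPHA*A*B + BETA*C = A*B, so the call in the test produces the same matrix as MATMUL (A, B), up to rounding. The sketch below spells out that no-transpose case in free-form Fortran. It is only an illustration under stated assumptions: the module and routine names (gemm_sketch, gemm_nn) are hypothetical, the J/L/I loop order loosely follows the column-oriented ordering of the Netlib reference BLAS, and it is not Sun's libsunperf code. The leading-dimension arguments (LDA, LDB, LDC) are what allow a BLAS-style routine to work on a regularly laid out submatrix, e.g. by passing A(I0,J0) with LDA = NC, without packing it first.

```fortran
module gemm_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper (not the real BLAS): C := alpha*A*B + beta*C,
  ! i.e. only the TRANSA = TRANSB = 'N' case of DGEMM.
  ! A is m-by-k, B is k-by-n, C is m-by-n, each stored with its own
  ! leading dimension as in the BLAS calling convention.
  subroutine gemm_nn(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
    integer, intent(in) :: m, n, k, lda, ldb, ldc
    real(dp), intent(in) :: alpha, beta
    real(dp), intent(in) :: a(lda, *), b(ldb, *)
    real(dp), intent(inout) :: c(ldc, *)
    integer :: i, j, l
    real(dp) :: temp

    do j = 1, n
      ! Scale column j of C by beta; when beta == 0 overwrite it,
      ! so that uninitialised contents of C are never used.
      if (beta == 0.0_dp) then
        c(1:m, j) = 0.0_dp
      else if (beta /= 1.0_dp) then
        c(1:m, j) = beta * c(1:m, j)
      end if
      ! Add alpha*B(l,j) times column l of A into column j of C.
      ! The innermost loop walks down a column, which is unit
      ! stride in Fortran's column-major storage.
      do l = 1, k
        temp = alpha * b(l, j)
        do i = 1, m
          c(i, j) = c(i, j) + temp * a(i, l)
        end do
      end do
    end do
  end subroutine gemm_nn
end module gemm_sketch
```

With the arrays declared in the test, CALL gemm_nn (NC, NC, NL, 1.0d0, A, NC, B, NL, 0.0d0, C, NC) has the same argument layout as the DGEMM call (minus the two transpose flags) and should agree with MATMUL (A, B) up to rounding; no claim is made here about its speed relative to either.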
So, apart from Sun's lack of performance-consciousness as a
compiler vendor, what is the reason why the F90 compiler does not
replace MATMUL with a call to their own optimized DGEMM?
And if the compiler won't do it, then my advice is that the programmer
should!
Michel
Michel OLAGNON