> Date: Fri, 31 Oct 1997 09:59:02 -0500
> From: "David C. P. LaFrance-Linden" <[log in to unmask]>
...
...
> 
>    3) Some performance-conscious computer vendors provide custom 
>       versions of BLAS, which would take care of efficiently 
>       executing the MATMUL when the input data is sufficiently 
>       regular. If you'd rely on BLAS to do the dirty work, you 
>       could save yourself the effort of further optimizing 
>       the 'special cases'.
> 
> You lost me here.  MATMUL is an F90 intrinsic.  BLAS is a linear
> algebra library (which may have its own MATMUL).  If what you are
> saying is that if the inputs are not sequence associated but are
> regular, describe them in terms of BLAS objects, use BLAS to
> redistribute/pack the inputs, then run MATMUL, I suspect the compiler
> has enough knowledge of the inputs to do the redistribution/pack
> without resorting to BLAS.  If you are saying something else, I'm not
> sure what it is.
> 

Just a test on a Sun system:

      PARAMETER (NL=1000, NC=1001)
      DOUBLE PRECISION A (NC, NL), B (NL, NC), C (NC, NC)

      C = MATMUL (A, B)
            716 sec.

      CALL DGEMM ('N','N',NC,NC,NL,1.0d0,A,NC,B,NL,0.0d0,C,NC)
            255 sec. vanilla BLAS
             53 sec. libsunperf (Sun optimized BLAS)

So, apart from a lack of performance-consciousness on the part of Sun
as a compiler vendor, what is the reason the F90 compiler does not
replace MATMUL with a call to Sun's optimized DGEMM?

And if the compiler won't do it, then my advice is that the programmer
should!
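
For anyone who wants to try the substitution by hand, here is a minimal
sketch of the comparison in free-form Fortran. It assumes that a BLAS
library providing DGEMM is linked in (vanilla or vendor-tuned). The
program name and the smaller array sizes are only illustrative, and the
timings will depend entirely on the compiler and BLAS used.

```fortran
! Sketch only: compare the MATMUL intrinsic with an explicit DGEMM call.
! Assumption: DGEMM is supplied by whatever BLAS library is linked in.
program matmul_vs_dgemm
  implicit none
  ! Illustrative sizes, smaller than the NL=1000, NC=1001 used above.
  integer, parameter :: nl = 200, nc = 201
  double precision :: a(nc, nl), b(nl, nc), c1(nc, nc), c2(nc, nc)
  integer :: t0, t1, t2, rate
  external dgemm

  call random_number(a)
  call random_number(b)
  c2 = 0.0d0

  call system_clock(t0, rate)
  c1 = matmul(a, b)
  call system_clock(t1)

  ! C2 := 1.0*A*B + 0.0*C2
  ! M = nc (rows of A and C2), N = nc (columns of B and C2),
  ! K = nl (columns of A, rows of B); the leading dimensions
  ! are the declared first extents: nc for A, nl for B, nc for C2.
  call dgemm('N', 'N', nc, nc, nl, 1.0d0, a, nc, b, nl, 0.0d0, c2, nc)
  call system_clock(t2)

  print *, 'MATMUL seconds:    ', real(t1 - t0) / real(rate)
  print *, 'DGEMM  seconds:    ', real(t2 - t1) / real(rate)
  print *, 'max abs difference:', maxval(abs(c1 - c2))
end program matmul_vs_dgemm
```

The two results should agree to within rounding error, since the two
routines may sum the products in a different order. Calling DGEMM
directly like this only works when the arrays are contiguous, with the
leading dimensions equal to the declared extents; array sections with
strides would have to be packed first.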

Michel

Michel OLAGNON   email: [log in to unmask]

