> Date: Fri, 31 Oct 1997 09:59:02 -0500
> From: "David C. P. LaFrance-Linden" <[log in to unmask]>

...
...

> > 3) Some performance-conscious computer vendors provide custom
> > versions of BLAS, which would take care of efficiently
> > executing the MATMUL when the input data is sufficiently
> > regular. If you'd rely on BLAS to do the dirty work, you
> > could save yourself the effort of further optimizing
> > the 'special cases'.
>
> You lost me here. MATMUL is an F90 intrinsic. BLAS is a linear
> algebra library (which may have its own MATMUL). If what you are
> saying is that if the inputs are not sequence associated but are
> regular, describe them in terms of BLAS objects, use BLAS to
> redistribute/pack the inputs, then run MATMUL, I suspect the compiler
> has enough knowledge of the inputs to do the redistribution/pack
> without resorting to BLAS. If you are saying something else, I'm not
> sure what it is.
>

Just a test on a Sun system:

      PARAMETER (NL=1000, NC=1001)
      DOUBLE PRECISION A (NC, NL), B (NL, NC), C (NC, NC)

      C = MATMUL (A, B)
                                      716 sec.

      CALL DGEMM ('N', 'N', NC, NC, NL, 1.0d0, A, NC, B, NL,
     &            0.0d0, C, NC)
                                      255 sec.  vanilla BLAS
                                       53 sec.  libsunperf (Sun optimized BLAS)

So, apart from Sun's lack of performance-consciousness as a compiler
vendor, why doesn't the F90 compiler replace MATMUL with a call to
Sun's own optimized DGEMM? And if the compiler won't do it, then my
advice is that the programmer should!

Michel

Michel OLAGNON                       email: [log in to unmask]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
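The DGEMM call in the timing test computes C := ALPHA*A*B + BETA*C with M = N = NC, K = NL, leading dimensions NC, NL and NC, ALPHA = 1 and BETA = 0, which is exactly MATMUL (A, B). A minimal plain-Fortran sketch of that 'N','N' case follows, only to illustrate the argument mapping. The names `gemm_sketch` and `ref_gemm_nn` are hypothetical, and this is neither libsunperf nor the reference BLAS source. Tuned BLAS libraries typically get their speed from cache blocking and machine-specific kernels, which this simple loop nest does not attempt.

```fortran
! Hypothetical module: a plain-Fortran sketch of what
!   DGEMM('N','N', M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
! computes, i.e. C(1:M,1:N) := ALPHA * A(1:M,1:K) * B(1:K,1:N) + BETA * C(1:M,1:N).
! Illustrates the argument mapping only; not a vendor or reference BLAS.
module gemm_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine ref_gemm_nn(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
    integer, intent(in) :: m, n, k, lda, ldb, ldc
    real(dp), intent(in) :: alpha, beta
    real(dp), intent(in) :: a(lda, *), b(ldb, *)
    real(dp), intent(inout) :: c(ldc, *)
    integer :: i, j, l
    real(dp) :: temp

    ! Loop order J-L-I keeps the innermost loop running down a column,
    ! which is contiguous in Fortran's column-major storage.
    do j = 1, n
      ! Scale (or clear) column J of C first; with BETA = 0 the old
      ! contents of C are never read, following the BLAS convention.
      if (beta == 0.0_dp) then
        c(1:m, j) = 0.0_dp
      else if (beta /= 1.0_dp) then
        c(1:m, j) = beta * c(1:m, j)
      end if
      ! Accumulate ALPHA * B(L,J) times column L of A into column J of C.
      do l = 1, k
        temp = alpha * b(l, j)
        do i = 1, m
          c(i, j) = c(i, j) + temp * a(i, l)
        end do
      end do
    end do
  end subroutine ref_gemm_nn
end module gemm_sketch
```

Apart from the two 'N' flags, the argument list matches the DGEMM call in the test, so code written against a wrapper like this could be switched to a linked BLAS library by changing only the call.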