Wow; Visual Studio as a mail viewer!  I would suggest that you grab the
usual source code and go debug.  Where I have seen bugs, ghenry has been
faithful to the netlib source code, from which the bugs are inherited.
Yes, I have had failures with IxAMAX() as well, which I have not yet
attempted to track down.  Straightforward code such as PSR f90 generates
for MAXLOC(), compiled with conditional move instructions, does quite
well.
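
The MAXLOC() formulation mentioned above can be sketched roughly as
follows.  This is only a minimal sketch, assuming unit stride; the names
ref_iamax and ref_isamax are hypothetical and are not part of any BLAS
library.  It returns 0 for N < 1, as the netlib ISAMAX does.

```fortran
module ref_iamax
  implicit none
contains
  ! Hypothetical stand-in for ISAMAX with unit stride only:
  ! index of the first element of largest absolute value.
  integer function ref_isamax(n, x)
    integer, intent(in) :: n
    real, intent(in) :: x(*)
    if (n < 1) then
      ref_isamax = 0      ! same convention as netlib ISAMAX for n < 1
    else
      ! MAXLOC with DIM=1 returns the first position of the maximum
      ref_isamax = maxloc(abs(x(1:n)), dim=1)
    end if
  end function ref_isamax
end module ref_iamax
```

A caller would write "use ref_iamax" and then "indx = ref_isamax(N, vec1)"
in place of the ISAMAX call, which makes it easy to check whether a crash
is in the library or in the calling code.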

Here are the BLAS bugs which I have tracked down:
- xGEMV fails in the case where the inner loop count is zero while the
  outer loop count is non-zero.  Such a case occurs in the infamous Dave
  Frank benchmark.
- xDOT fails (at least certain common versions do) when the strides are
  of opposite sign.  Such a case occurs in Livermore Fortran Kernel 6.
- xGEMM takes an inaccurate and slow (for most current architectures)
  approach for the common case where only one operand is stride 1.  This
  problem is seen in LFK 21.  Of course, the optimized binaries overcome
  the slowness here.
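
To make the opposite-sign stride case concrete, here is a minimal sketch
of a plain Fortran dot product that can be compared against a suspect
xDOT.  The names ref_dot and ref_sdot are hypothetical.  The starting
index for a negative stride follows the netlib convention, (1-N)*INC + 1,
so that a vector with negative stride is traversed from its far end.

```fortran
module ref_dot
  implicit none
contains
  ! Hypothetical reference version of SDOT for cross-checking.
  real function ref_sdot(n, x, incx, y, incy)
    integer, intent(in) :: n, incx, incy
    real, intent(in) :: x(*), y(*)
    integer :: i, ix, iy
    ref_sdot = 0.0
    if (n <= 0) return
    ix = 1
    iy = 1
    ! With a negative stride, start at the far end of the vector
    if (incx < 0) ix = (1 - n)*incx + 1
    if (incy < 0) iy = (1 - n)*incy + 1
    do i = 1, n
      ref_sdot = ref_sdot + x(ix)*y(iy)
      ix = ix + incx
      iy = iy + incy
    end do
  end function ref_sdot
end module ref_dot
```

Calling ref_sdot(N,vec1,1,vec2,-1) and SDOT(N,vec1,1,vec2,-1) on the same
data and comparing the two results should show whether a given library
has this problem.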

Tim Prince
----- Original Message -----
From: "Aleksandar Donev" <[log in to unmask]>
To: "Comp Fortran" <[log in to unmask]>; "Lahey Fortran"
<[log in to unmask]>
Cc: <[log in to unmask]>; "Aleksandar Donev" <[log in to unmask]>
Sent: Friday, August 04, 2000 10:25 PM
Subject: [LF] Pentium BLAS crashes


> Hi all,
>
> I am having some trouble with some very simple level 1 BLAS routines
> optimized for Pentium II:
> http://www.cs.utk.edu/~ghenry/distrib/archive.htm#blas
>
> See for example the attached code, which crashes under Linux RedHat 6.2
> with lf95 on a single CPU Pentium II.
> I tried other routines, like SCOPY and SASUM, and they seemed to work.
> Others, like ISAMAX did not.
>
> I have used the level 2 and 3 routines before with no problems, so I am
> assuming the level 1 must work. Am I doing something wrong? Can somebody
> please try the code with your BLAS libraries?
>
> I need S(D)AXPY and S(D)DOT for my conjugate-gradient routines, and they
> are really significantly faster when optimized in assembler.
>
> I am aware of only one other Pentium BLAS 1,
> http://cip.physik.uni-wuerzburg.de/~mlkessle/blas1.html, which is an old
> page and it says AXPY is not optimized yet.
>
> Thanks a lot,
> Aleksandar
>
> --
> _____________________________________________
> Aleksandar Donev
> Physics Department
> Michigan State University
> East Lansing, MI 48824-1116
> E-mail: [log in to unmask]
> Work phone: (517) 432-6770
> _____________________________________________
>
>


--------------------------------------------------------------------------------


> ! Test case: calls the level 1 BLAS routines SDOT and SAXPY on two
> ! random vectors of user-supplied length N.
> program test
> implicit none
> external :: SDOT, SCOPY, SNRM2, SASUM, ISAMAX, SAXPY
> real :: SASUM, SDOT, SNRM2, dots
> real, allocatable, dimension(:) :: vec1,vec2
> integer :: N,indx,ISAMAX
> write(*,*) "N?="
> read(*,*) N
> allocate(vec1(N),vec2(N))
> call random_number(vec1)
> call random_number(vec2)
> write(*,*) vec1, vec2
> ! dots = sum of vec1(i)*vec2(i), both with unit stride
> dots=SDOT(N,vec1,1,vec2,1)
> ! vec2 = 1.0*vec1 + vec2, both with unit stride
> call SAXPY(N,1.0,vec1,1,vec2,1)
> write(*,*) vec1, vec2 , dots
> end program test
> !


