Hope I didn't introduce any new bugs; here's SDOT source optimized for
P-II and r12k. Should be a big improvement over the original. I'll
attach it also, which should preserve indentation. No, my pretty
printer couldn't handle this one in Windows, had to reboot over to
linux. One more question: wouldn't it be good to set precision mode to
default inside these subroutines, in case the calling environment has it
set some other way?
real function sdot(n,x,incx,y,incy)
C
C RETURNS THE DOT PRODUCT OF X AND Y.
C SDOT = SUM FOR I = 0 TO N-1 OF X(LX+I*INCX) * Y(LY+I*INCY),
C WHERE LX = 1 IF INCX .GE. 0, ELSE LX = (-INCX)*N, AND LY IS
C DEFINED IN A SIMILAR WAY USING INCY.
C
integer n,incx,incy,ix,iy,i,m,mp1,ns
real x(*),y(*)
sdot= 0.0e0
if(n > 0.and.(incx.ne.incy.or.incx < 1))then
ix= 1
C
C CODE FOR UNEQUAL OR NONPOSITIVE INCREMENTS.
C
iy= 1
C attempted bug fix here; reverse both or neither, and fix
strides
if(incx < 0.and.incy < 0)then
incx= -incx
ix= (-n+1)*incx+1
incy= -incy
iy= (-n+1)*incy+1
endif
do i= 1,n
sdot= sdot+x(ix)*y(iy)
ix= ix+incx
iy= iy+incy
enddo
else
if(incx.ne.1)then
ns= n*incx
C
C CODE FOR POSITIVE EQUAL INCREMENTS .NE.1.
C
do i= 1,ns,incx
sdot= sdot+x(i)*y(i)
enddo
else
C
C CODE FOR BOTH INCREMENTS EQUAL TO 1.
C
C
C CLEAN-UP LOOP SO REMAINING VECTOR LENGTH IS A MULTIPLE OF 5.
C
C m= mod(n,5)
C do i= 1,m
C sdot= sdot+x(i)*y(i)
C enddo
C mp1= m+1
C do i= mp1,n,5
C sdot= sdot+x(i)*y(i)+x(i+1)*y(i+1)+x(i+2)*y(i+2)+x(i+3
C &)*y(i+3)+x(i+4)*y(i+4)
C Preconditioning SO REMAINING VECTOR LENGTH IS A MULTIPLE OF 2.
C
mp1= 1
if(iand(n,1)==1)then
sdot= x(1)*y(1)
mp1= 2
endif
do i= mp1,n-1,2
sdot= x(i)*y(i)+x(i+1)*y(i+1)+sdot
enddo
endif
endif
return
end
Tim Prince
----- Original Message -----
From: "Aleksandar Donev" <[log in to unmask]>
To: "Comp Fortran" <[log in to unmask]>; "Lahey Fortran"
<[log in to unmask]>
Cc: <[log in to unmask]>; "Aleksandar Donev" <[log in to unmask]>
Sent: Friday, August 04, 2000 10:25 PM
Subject: [LF] Pentium BLAS crashes
> Hi all,
>
> I am having some trouble with some very simple level 1 BLAS routines
> optimized for Pentium II:
> http://www.cs.utk.edu/~ghenry/distrib/archive.htm#blas
>
> See for example the attached code, which crashes under Linux RedHat
6.2
> with lf95 on a single CPU Pentium II.
> I tried other routines, like SCOPY and XASAM, and they seemed to work.
> Others, like ISAMAX did not.
>
> I have used the level 2 and 3 routines before with no problems, so I
am
> assuming the level 1 must work. Am I doing something wrong? Can
somebody
> please try the code with your BLAS libraries.
>
> I need S(D)AXPY and S(D)DOT for my conjugate-gradient routines, and
they
> are really significantly faster when optimized in assembler.
>
> I am aware of only one other Pentium BLAS 1,
> http://cip.physik.uni-wuerzburg.de/~mlkessle/blas1.html, which is an
old
> page and it says AXPY is not optimized yet.
>
> Thanks a lot,
> Aleksandar
>
> --
> _____________________________________________
> Aleksandar Donev
> Physics Department
> Michigan State University
> East Lansing, MI 48824-1116
> E-mail: [log in to unmask]
> Work phone: (517) 432-6770
> _____________________________________________
>
>
------------------------------------------------------------------------
--------
> program test
> implicit none
> external :: SDOT, SCOPY, SNRM2, SASUM, ISAMAX, SAXPY
> real :: SASUM, SDOT, SNRM2, dots
> real, allocatable, dimension(:) :: vec1,vec2
> integer :: N,indx,ISAMAX
> write(*,*) "N?="
> read(*,*) N
> allocate(vec1(N),vec2(N))
> call random_number(vec1)
> call random_number(vec2)
> write(*,*) vec1, vec2
> dots=SDOT(N,vec1,1,vec2,1)
> call SAXPY(N,1.0,vec1,1,vec2,1)
> write(*,*) vec1, vec2 , dots
> end program test
> !
|