I found a problem in the Fortran 90 program that I have ported to a Cray
J90 which I can reproduce with small test programs. The program makes use
of OpenMP and there is a bug in Cray's implementation of Fortran 90. I
reported this to the local help desk at NERSC. They filed a report with
Cray in March. The consultant at NERSC optimistically predicted that the
bug would be fixed in *three* months. In fact Cray were still testing
their bug fix in late June and a Fortran 90 compiler with the bug fix
still hasn't been installed on the local J90.
The Fortran compilers on the Cray PVPs have their own proprietary
autotasking directives for shared-memory programming; I think support for
OpenMP was added relatively recently. However, the Cray Fortran manual
does say that the autotasking directives are now deprecated and that
OpenMP should be used instead, so presumably their implementation of
OpenMP should by now be stable?
I note in passing that Cray were recently sold by SGI to Tera. SGI kept
their Origin 2000 line going. (Great machines, by the way :)
Anyway, enough complaining about Cray. I will describe the bug and would
like to know if anybody has suggestions for a good workaround. At present
I use "C$omp parallel do if(nloop.ne.1)", where nloop is the final value
of the loop variable. However, this extra check may carry a performance
penalty (though without it, presumably anything can happen...). I would
also be interested to hear whether anybody else has had problems with
OpenMP on this compiler.
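For concreteness, the guarded loop looks like this (just a sketch; the
loop body here is illustrative, not the real code):

C$omp parallel do if(nloop.ne.1)
      do i=1, nloop
         arr(i)=i
      enddo

The if() clause makes the run-time system fall back to serial execution
of the loop whenever nloop equals 1, which sidesteps the buggy parallel
code path; the run-time test is where the suspected overhead comes from.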
When I have code similar to the following:
C$omp parallel do
      do i=1,nloop
      .
      .
      .
C     carry out some action - e.g. call a subroutine
      .
      .
      .
      enddo
C
and nloop=1 with the program running on more than one processor, I find
that this crashes with a floating-point error. In general there probably
isn't any good reason to parallelise a loop with 'do i=1,1', but avoiding
it would mean adding extra code to handle nloop=1 as a special case.
(This could end up meaning a lot of changes to the program.) I should
explain that in the original program the 'similar' loop is intended to
divide up work between different processors. It is probably a bug that
the 'nloop' variable can ever end up equal to one when the program is
running on more than one processor. However, it surprises me that this
causes the program to crash rather than just waste some CPU time.
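Another possible workaround (again only a sketch, using the loop from the
test program below) is an explicit serial branch, which keeps the nloop=1
case out of the parallel region entirely at the cost of duplicating code:

      if (nloop .eq. 1) then
         arr(1)=1.
      else
C$omp parallel do
         do i=1, nloop
            arr(i)=i
         enddo
      endif

This is more invasive per loop than the if() clause, which is why I have
not adopted it so far.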
Of related interest, I find that a small test program with
'do i=1,1 ... enddo' and the OpenMP directive as above produces a
compile-time error message (at the '-O 1' level of optimisation and
above). Apparently the optimisation generates a divide by zero, and this
causes the small test program to fail to compile. If the OpenMP directive
isn't there, the code compiles successfully.
William Nicholson
For those who are interested, here is the code for the first test program
mentioned above. Run it on more than one CPU, enter 1 when prompted, and
watch the program bomb. (By the way, the reason I have written the
program so that one enters a value for nloop is that at the higher
optimisation levels, if nloop is initialised to 1 in the program then the
optimiser simply removes the loop.)
      program dirtest2
C
      implicit none
C
      integer, parameter :: NSIZE=100
      real arr(NSIZE)
      integer stdin, stdout, stderr
      integer i, nloop
C
      stdin=5
      stdout=6
      stderr=0
C
      do i=1,NSIZE
         arr(i)=0.
      enddo
C
      write(stdout, *) 'Enter value for nloop:'
      read(stdin, *) nloop
C
C$omp parallel do
      do i=1, nloop
         arr(i)=i
      enddo
C
      write(stdout, *) arr
C
      stop
C
      end
Here is the related program which gives a compile-time error:
      program dirtest2a
C
      implicit none
C
      integer, parameter :: NSIZE=100
      real arr(NSIZE)
      integer stdin, stdout, stderr
      integer i, nloop
C
      stdin=5
      stdout=6
      stderr=0
C
      do i=1,NSIZE
         arr(i)=0.
      enddo
C
C$omp parallel do
      do i=1, 1
         arr(i)=i
      enddo
C
      write(stdout, *) arr
C
      stop
C
      end