I suppose it would make more sense to give you some results from
the latest compiler release, Forte Developer 6 (see
http://www.sun.com/forte). I'm not sure that discussing results from
beta compilers is that fruitful, although I appreciate how new
the current release is.
The first run below uses safe optimization (-O3); the second (-fast)
is the most aggressive and allows reordering.
I added a loop (bsum) that sums the array backwards, just for
the fun of it. I see no reason why SUM should be constrained to add
the elements in any particular order, so perhaps that explains
the difference: floating-point addition is not associative, so a
different order can round differently. There would be similar
issues with decomposed parallel execution.
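To illustrate the ordering point, here is a minimal sketch. It is not
the arrays.f90 used for the timings below; the program name, array
size and test data are all hypothetical. It sums the same
single-precision array forwards, backwards and with SUM, and prints a
double-precision reference. The single-precision results will
typically differ in the low-order digits, though the exact values
depend on the compiler and its options.

```fortran
program sum_order
  implicit none
  integer, parameter :: n = 100000   ! hypothetical size
  real :: a(n)                       ! default (single-precision) reals
  real :: fsum, bsum
  double precision :: dsum
  integer :: i

  ! Hypothetical test data, not the contents of arrays.f90.
  do i = 1, n
     a(i) = 1.0 / real(i)
  end do

  ! Forward order: a(1) + a(2) + ... + a(n)
  fsum = 0.0
  do i = 1, n
     fsum = fsum + a(i)
  end do

  ! Backward order: a(n) + a(n-1) + ... + a(1)
  bsum = 0.0
  do i = n, 1, -1
     bsum = bsum + a(i)
  end do

  ! Double-precision accumulation as a reference value
  dsum = 0.0d0
  do i = 1, n
     dsum = dsum + dble(a(i))
  end do

  print *, 'forward  sum  = ', fsum
  print *, 'backward bsum = ', bsum
  print *, 'SUM intrinsic = ', sum(a)   ! order is left to the compiler
  print *, 'double ref    = ', dsum
end program sum_order
```

Comparing each single-precision result against the double-precision
reference gives a rough idea of how much rounding error each order
picked up.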
> Do the respective vendors have any comment on why the SUM intrinsic
> should produce results different from the loop method?
I am not responding to this directly; I am posting merely out of
personal interest. Perhaps someone from the respective compiler
groups will do so.
Harvey
------------------------------------------------------------------------
becksy-5.8/64:f90# f90 -xarch=v8plusa -xchip=ultra2 -O3 -o arrays arrays.f90
becksy-5.8/64:f90# ./arrays && cat fort.1
program started ...
... program ended.
f77 loop, contiguous array, sum =32761.333984 time = 0.446250
f77 loop, contiguous array, bsum =32761.164062 time = 0.445233
f77 loop, stride 2x2 array, sum =32761.333984 time = 0.467613
f90 loop, contiguous array, sum =32761.333984 time = 0.518019
f90 loop, stride 2x2 array, sum =32761.333984 time = 0.534986
f90 SUM(stride 2x2 array), sum =32761.333984 time = 0.535449
becksy-5.8/64:f90# f90 -fast -o arrays arrays.f90
f90: Warning: -xarch=v8plusa is not portable
becksy-5.8/64:f90# !./a
./arrays && cat fort.1
program started ...
... program ended.
f77 loop, contiguous array, sum =32761.222656 time = 0.205984
f77 loop, contiguous array, bsum =32761.234375 time = 0.206006
f77 loop, stride 2x2 array, sum =32761.222656 time = 0.222686
f90 loop, contiguous array, sum =32761.234375 time = 0.205678
f90 loop, stride 2x2 array, sum =32761.234375 time = 0.219132
f90 SUM(stride 2x2 array), sum =32761.234375 time = 0.222423