Here are some results from an AlphaServer 4100 5/400 (dual
processor, 2GB memory, about 4 years old) running VMS 7.2-1,
compiled with Compaq Fortran V7.3-952-44A17. This is a multiuser
system, but lightly loaded, so I got essentially 100% of one cpu
during each of these runs.
I tried various optimization levels (without "exploring" all the
optimization switches' "phase-space") and found only small variations
between optimization levels 3, 4 (the default) and 5, with a few of
the results improving under one optimization while others worsened.
Repeatability tests showed that all the results were more or less
consistent with each other, i.e., there are only minor differences
between runs under different optimization levels. (Longer runs
would be needed to give significance to the differences.)
For the default optimization level, I get the following results:

  77 loop, contiguous array,  sum = 32850.292969   time = 0.680000
  77 loop, stride 2x2 array,  sum = 32850.292969   time = 1.300003
  90 loop, contiguous array,  sum = 32850.292969   time = 1.229996
  90 loop, stride 2x2 array,  sum = 32850.292969   time = 1.959999
  90 SUM(stride 2x2 array),   sum = 32850.292969   time = 1.530006

The main points to notice are that (1) the "77 loop, contiguous" case
is the fastest, by a factor of about 1.8 to 2.9 over the other
variations (0.68 vs. 1.23 to 1.96), but (2) there is only a smallish
penalty for using F90 features. This was true for all the
optimization levels (except for _no_ optimization).
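
For readers who have not seen the test program, the kind of comparison
being timed is roughly the following. This is only an illustrative
sketch, not the code that produced the timings above: the program
name, the array size n, the random fill, and the reading of "stride
2x2" as every second element in each dimension are all assumptions.
Timing calls are left out to keep it short.

```fortran
! Hypothetical sketch only; NOT the benchmark that produced the
! timings quoted in this message.
program stride_sketch
  implicit none
  integer, parameter :: n = 512      ! assumed size, for illustration
  real :: a(n, n)
  real :: s_loop, s_intrinsic
  integer :: i, j

  call random_number(a)              ! arbitrary test data

  ! F77-style explicit loops over every second element in each
  ! dimension (the assumed meaning of "stride 2x2")
  s_loop = 0.0
  do j = 1, n, 2
     do i = 1, n, 2
        s_loop = s_loop + a(i, j)
     end do
  end do

  ! F90 SUM intrinsic applied to the equivalent strided array section
  s_intrinsic = sum(a(1:n:2, 1:n:2))

  print *, 'loop sum      =', s_loop
  print *, 'intrinsic sum =', s_intrinsic
end program stride_sketch
```

The two sums may differ in the last digit or two, since the compiler
is free to accumulate the SUM intrinsic in a different order than the
explicit loops.
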
What interested me most is that there was not nearly the spread in
results that you showed for DVF 6.1A, which I was happy to see. :-)
-Ken
--
Kenneth H. Fairfield | Internet: [log in to unmask]
SLAC, 2575 Sand Hill Rd, MS 46 | Voice: 650-926-2924
Menlo Park, CA 94025 | FAX: 650-926-3515
-------------------------------------------------------------------------
These opinions are mine, not SLAC's, Stanford's, nor the DOE's...