Douglas Sondak wrote:
>
> A few years ago we performed timing tests on f90 intrinsics.  We wrote
> f77-style do loops to perform the same functionality as the intrinsics
> and compared timings.  These results were not obtained using the
> current compiler (we used SGI 7.2), but they address some of the
> questions in the original posting in this thread.
>
> Most timings were about the same for the intrinsics and the do loops
> with a few significant exceptions.  Here are some selected results that
> showed large differences:
>
> Function  Intrinsic (secs)   Do Loops (secs)  Ratio
> all            11.28            9.43           1.20
> count           9.29           11.45           0.81
> cshift         10.14            5.53           1.83
> eoshift         9.22            3.50           2.63
> pack           11.29            4.85           2.33
> reshape        11.53            4.22           2.73
> unpack         11.35            6.41           1.77
>
> Some of these f90 intrinsics could certainly contribute to "f90 being
> slower than f77."
>
> A full list, including the source code used to produce the timings, is
> available at
> http://scv.bu.edu/SCV/Origin2000/intrinsics/F90_serial_times_7.2.html.

I looked at one of your examples (PACK) and I think it shows one of
the problems with the new intrinsics.  This isn't a criticism of your
results; it's merely an observation on the complexity of timing things
and on the language.

You tried a simple PACK(A,M) and a simple Fortran DO loop nest to do
the same thing.  The problem from the vendor's side is that they
need to provide PACK intrinsics for real, integer, character,...
and user-defined types.  They need to deal with 1 to 7 dimensions
and array sections, including vector-valued subscripts, and the
optional VECTOR argument.  They also have to worry about zero-sized
arguments.  I think I remember one vendor saying they had 49
versions of the MATMUL intrinsic.
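
To make the comparison concrete, here is a minimal sketch of the two
forms side by side.  It is not the code from the timing page; the
program name, the 3x3 array and the mask are made up for illustration.
The hand-written nest is tied to rank 2, default real and no VECTOR
argument, which is exactly the easy case, while the intrinsic has to
accept everything listed above.

      program pack_demo
      implicit none
      integer, parameter :: n = 3
      real :: a(n,n), by_pack(n*n), by_loop(n*n)
      logical :: m(n,n)
      integer :: i, j, k, kp

      ! Fill A with 1..9 in array-element (column-major) order.
      a = reshape( (/ (real(i), i = 1, n*n) /), (/ n, n /) )
      ! Hypothetical mask: keep the larger values.
      m = a > 4.5

      ! Intrinsic form: one call, any type, rank or section.
      kp = count(m)
      by_pack = 0.0
      by_pack(1:kp) = pack(a, m)

      ! Hand-written form: rank 2, default real, no VECTOR.
      by_loop = 0.0
      k = 0
      do j = 1, n
         do i = 1, n
            if (m(i,j)) then
               k = k + 1
               by_loop(k) = a(i,j)
            end if
         end do
      end do

      ! Both forms should select the same elements in the same order.
      if (k /= kp) stop 'count mismatch'
      if (any(by_pack /= by_loop)) stop 'value mismatch'
      print *, by_loop(1:k)
      end program pack_demo

The loop nest walks the array in the same column-major order that PACK
uses, so the two result vectors can be compared element by element.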

The likely approach, especially for complicated functions like PACK
or EOSHIFT, is to do the general routine and then try to optimize
the special cases on an as-needed, time-available basis.

So, in this sense F90 has "destroyed" some of the efficiency of
F77.  But, as someone else (O'Brien I think) has said, the new
intrinsics give greater flexibility and improve user time by
covering all of the odd cases.  You can make a pretty good argument
that for most people programmer time is the dominant cost, not
run-time (and yes, I know about weather forecasts, but most
people don't do them).  The tradeoff is that the vendors have
to do a little (grin) more work on optimizing the easy cases.

----
New topic

One thing I haven't seen discussed is the effect of array dimensions
on vector syntax.  In physics problems there tend to be lots of
arrays, but very few actual dimension sets.  Most arrays have the
same shape or are sub-shapes of other arrays.  But given a
Subroutine like

      Subroutine add_em_up(a,b,c,d,e,f)
      real a(:,:), b(:,:), c(:,:), d(:,:), e(:,:), f(:,:)
      a = b+c
      d = e+f
      end

a compiler won't know that.  It will almost certainly translate
that into 2 DO loop nests when one nest would almost certainly
be good enough.  Also, it's likely to have to compute an index
function ("I-1 + d1*(J-1)") for each of the 6 arrays, rather than
one function for all 6.  This isn't much of a problem for smallish
codes; it's only 3 integer adds in the inner loop and that shouldn't
be a major time hit.  But in bigger codes it adds to register
pressure and potentially adds a couple of cycles.
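
As a rough sketch of what I mean, compare the routine above with a
hand-fused variant that is told the common shape explicitly.  The
names add_demo, add_em_up_fused, n1 and n2 are mine, purely for
illustration, and whether a given compiler actually produces better
code for the second form is something you'd have to measure.

      module add_demo
      implicit none
      contains

      ! Original form: six independent assumed-shape arguments.
      subroutine add_em_up(a, b, c, d, e, f)
      real :: a(:,:), b(:,:), c(:,:), d(:,:), e(:,:), f(:,:)
      a = b + c
      d = e + f
      end subroutine add_em_up

      ! Hypothetical variant: the shared shape n1 x n2 is explicit,
      ! so one loop nest and one index function serve all 6 arrays.
      subroutine add_em_up_fused(n1, n2, a, b, c, d, e, f)
      integer :: n1, n2, i, j
      real :: a(n1,n2), b(n1,n2), c(n1,n2)
      real :: d(n1,n2), e(n1,n2), f(n1,n2)
      do j = 1, n2
         do i = 1, n1
            a(i,j) = b(i,j) + c(i,j)
            d(i,j) = e(i,j) + f(i,j)
         end do
      end do
      end subroutine add_em_up_fused

      end module add_demo

      program add_demo_main
      use add_demo
      implicit none
      integer, parameter :: n1 = 4, n2 = 3
      real, dimension(n1,n2) :: a, b, c, d, e, f, a2, d2

      call random_number(b)
      call random_number(c)
      call random_number(e)
      call random_number(f)

      call add_em_up(a, b, c, d, e, f)
      call add_em_up_fused(n1, n2, a2, b, c, d2, e, f)

      ! Same additions on the same operands, so results should match.
      if (any(a /= a2) .or. any(d /= d2)) stop 'mismatch'
      print *, 'both versions agree'
      end program add_demo_main

The price of the explicit-shape version is that the caller has to pass
the dimensions, and passing a non-contiguous section may cost a copy,
so it isn't a free lunch either.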

Dick Hendrickson
Not necessarily speaking for my employer