Douglas Sondak wrote:
>
> A few years ago we performed timing tests on f90 intrinsics.  We wrote
> f77-style do loops to perform the same functionality as the intrinsics
> and compared timings.  These results were not obtained using the
> current compiler (we used SGI 7.2), but they address some of the
> questions in the original posting in this thread.
>
> Most timings were about the same for the intrinsics and the do loops,
> with a few significant exceptions.  Here are some selected results that
> showed large differences:
>
>   Function   Intrinsic (secs)   Do Loops (secs)   Ratio
>   all             11.28              9.43          1.20
>   count            9.29             11.45          0.81
>   cshift          10.14              5.53          1.83
>   eoshift          9.22              3.50          2.63
>   pack            11.29              4.85          2.33
>   reshape         11.53              4.22          2.73
>   unpack          11.35              6.41          1.77
>
> Some of these f90 intrinsics could certainly contribute to "f90 being
> slower than f77."
>
> A full list, including the source code used to produce the timings, is
> available at
> http://scv.bu.edu/SCV/Origin2000/intrinsics/F90_serial_times_7.2.html.

I looked at one of your examples (PACK), and I think it shows one of the
problems with the new intrinsics.  This isn't a criticism of your results;
it's merely an observation on the complexity of timing this stuff, and on
the language.

You tried a simple PACK(A,M) and a simple Fortran DO loop nest to do the
same thing.  The problem from the vendor's side is that they need to
provide PACK intrinsics for real, integer, character, ... and user-defined
types.  They need to deal with 1 to 7 dimensions and vector sections,
including vector-valued subscripts, and the optional VECTOR argument.  And
they have to worry about zero-sized arguments.  I think I remember one
vendor saying they had 49 versions of the MATMUL intrinsic.  The likely
approach, especially for complicated functions like PACK or EOSHIFT, is to
write the general routine and then try to optimize the special cases on an
as-needed, time-available basis.  So, in this sense, F90 has "destroyed"
some of the efficiency of F77.
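To make the comparison concrete, the PACK-versus-loop test described above might look something like this.  This is only a sketch of the idea; the array names, sizes, and mask are mine, not taken from the timing code at the URL above:

```fortran
! Illustrative sketch, not the original benchmark code.
program pack_vs_loop
   implicit none
   integer, parameter :: n = 1000
   real    :: a(n,n), b(n*n)
   logical :: m(n,n)
   integer :: i, j, k

   call random_number(a)
   m = a > 0.5              ! arbitrary mask for illustration

   ! The one-line intrinsic: the vendor's version must handle every
   ! type, rank, section, and the optional VECTOR argument.
   b(1:count(m)) = pack(a, m)

   ! The equivalent f77-style loop nest for this one special case,
   ! which the compiler is free to optimize aggressively.
   k = 0
   do j = 1, n
      do i = 1, n
         if (m(i,j)) then
            k = k + 1
            b(k) = a(i,j)
         end if
      end do
   end do
end program pack_vs_loop
```

The loop nest handles exactly one type, one rank, and contiguous storage, which is why a general-purpose library PACK has a hard time matching it.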
But, as someone else (O'Brien, I think) has said, the new intrinsics give
greater flexibility and save programmer time by covering all of the odd
cases.  You can make a pretty good argument that for most people programmer
time is the dominant cost, not run time (and yes, I know about weather
forecasts, but most people don't do them).  The tradeoff is that the
vendors have to do a little (grin) more work on optimizing the easy cases.

---- New topic

One thing I haven't seen discussed is the effect of array dimensions on
vector syntax.  In physics problems there tend to be lots of arrays, but
very few actual dimension sets.  Most arrays have the same shape or are
sub-shapes of other arrays.  But given a subroutine like

      subroutine add_em_up(a, b, c, d, e, f)
         real a(:,:), b(:,:), c(:,:), d(:,:), e(:,:), f(:,:)
         a = b + c
         d = e + f
      end

a compiler won't know that.  It will almost for sure translate that into
two DO loop nests when, almost for sure, one nest would be good enough.
Also, it's likely to have to compute an index function ("I-1 + d1*(J-1)")
for each of the six arrays, rather than one function for all six.  This
isn't much of a problem for smallish codes; it's only 3 integer adds in
the inner loop, and that shouldn't be a major time hit.  But in bigger
codes it adds to register pressure and potentially adds a couple of
cycles.

Dick Hendrickson
Not necessarily speaking for my employer
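For contrast, here is roughly what a programmer who *knows* all six arrays share one shape could write by hand: one fused loop nest with one index pair.  This is a sketch of my own, with explicit-shape dummies standing in for the assumed-shape arguments above, since that shared-shape knowledge is exactly what the compiler lacks:

```fortran
! Hand-fused version; the shared shape (n1,n2) is an assumption the
! compiler cannot make for the assumed-shape add_em_up above.
subroutine add_em_up_fused(a, b, c, d, e, f, n1, n2)
   implicit none
   integer, intent(in) :: n1, n2
   real :: a(n1,n2), b(n1,n2), c(n1,n2), d(n1,n2), e(n1,n2), f(n1,n2)
   integer :: i, j

   ! One nest, one (i,j) index computation shared by all six arrays,
   ! instead of two nests with per-array index functions.
   do j = 1, n2
      do i = 1, n1
         a(i,j) = b(i,j) + c(i,j)
         d(i,j) = e(i,j) + f(i,j)
      end do
   end do
end subroutine add_em_up_fused
```

With assumed-shape dummies the compiler must allow for six distinct strides and extents, so it generally cannot fuse the nests or share the index arithmetic this way.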