Daniel Grimwood wrote:
> Hi Uwe,
>
> On Wed, 13 Nov 2002, Martens Uwe PG I wrote:
>
>
>>1.) F90 code on Linux (Absoft compiler, debug mode) is 25 times slower
>>than F77 code, whereas on Windows NT (Visual Fortran 6.1, debug mode) F90
>>code is only 10 times slower than F77 code.
>
>
> As others have said, I don't think you should compare under debug mode -
> there could be all sorts of different checks going on. A factor of 25 is
> a bit hard to believe without seeing the code.
>
> I've attached some code to illustrate how a few compilers handle different
> f90 array syntax. Here are some timings (Linux ones are on the same
> machine). All compiled with -O. See the attached source for what the
> routines do_it1,...,do_it4 are.
>
> Times are in seconds, last column is estimated error, also in seconds.
>
> routine name        :  do_it1  do_it2  do_it3  do_it4      +-
> Intel 7.0beta/Linux :      32      33      33      33       2
> Lahey 6.0c/Linux    :    31.8    31.2   116.0   114.6     0.5
> PGI 4.0-2/Linux     :   312.8   313.3   301.8   303.8      10
> Compaq X5.4A/Tru64  :    96.5    92.5    94.0    93.7     0.1
> IBM ?/AIX           :    98.2    97.2   142.8   115.2    0.03
>
> - On the IBM, it appears to make a difference whether you use x(:) vs x.
> I have no idea what version of the compiler this is.
As a matter of preference, my colleague and I like to write x(:) = ...
when x has a declared size, rather than plain x = ...
On VMS Alpha we have not seen any difference in code size or performance
between the two forms.
> - On Lahey, using an index to loop over arrays versus using full-array
>   operations makes a factor of 4 difference.
> - Intel, Compaq and PGI seem pretty smart with this basic array stuff,
> not much difference between the different syntax. PGI is slow in all
> cases.
>
On our system we still generally find that explicit iterative loops give
better performance than whole-array (vector) operations, even though the
optimisation annotations in the listing show the same amount of unrolling
for each.
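Daniel's attached do_it1..do_it4 source isn't reproduced here, so the
following is only a hypothetical sketch of the kinds of syntax being
compared: whole-array assignment (y = ...), explicit full sections
(y(:) = ...), and an explicit DO loop over an index. The module and
routine names (array_forms, axpy_whole, axpy_section, axpy_loop) are made
up for illustration. Semantically the three forms are equivalent, so any
timing difference comes from how a particular compiler translates them.

```fortran
! Hypothetical sketch: three spellings of the same update y = y + a*x.
! These are NOT the do_it1..do_it4 routines from the attached source.
module array_forms
  implicit none
contains
  ! Whole-array syntax: no subscripts at all.
  subroutine axpy_whole(a, x, y)
    real, intent(in)    :: a
    real, intent(in)    :: x(:)
    real, intent(inout) :: y(:)
    y = y + a*x
  end subroutine axpy_whole

  ! Same operation written with explicit full sections x(:) and y(:).
  subroutine axpy_section(a, x, y)
    real, intent(in)    :: a
    real, intent(in)    :: x(:)
    real, intent(inout) :: y(:)
    y(:) = y(:) + a*x(:)
  end subroutine axpy_section

  ! Explicit DO loop over an index.
  subroutine axpy_loop(a, x, y)
    real, intent(in)    :: a
    real, intent(in)    :: x(:)
    real, intent(inout) :: y(:)
    integer :: i
    do i = 1, size(y)
      y(i) = y(i) + a*x(i)
    end do
  end subroutine axpy_loop
end module array_forms
```

To compare them on a given compiler, one would call each routine many
times on large arrays between SYSTEM_CLOCK readings, at the same
optimisation level.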
I picked up some code from c.l.f (comp.lang.fortran), written by James van
Buskirk, that compares the timing of explicit loops against MATMUL for
different data types. For COMPLEX, MATMUL is horrendously slower.
I have written some code that does matrix multiplication using three
different techniques. The one that in the "old days" was considered the
worst compiles and runs the best under heavy optimisation; the compiler
must recognise this basic code as a MATMUL and optimise it regardless.
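The post doesn't say which three techniques were compared, so the
following is a hypothetical sketch using three common choices, all of
which are assumptions: the MATMUL intrinsic, a textbook i-j-k triple loop
with a dot product in the inner loop, and a j-k-i loop whose inner loop
runs down columns to match Fortran's column-major storage. The names
matmul_forms, mm_intrinsic, mm_ijk and mm_jki are made up for
illustration.

```fortran
! Hypothetical sketch: three ways to compute C = A*B.
! Which techniques the original post compared is not stated.
module matmul_forms
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! 1. The MATMUL intrinsic.
  subroutine mm_intrinsic(a, b, c)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(:,:)
    c = matmul(a, b)
  end subroutine mm_intrinsic

  ! 2. Textbook triple loop (i, j, k): dot product in the inner loop,
  !    which strides across rows of A.
  subroutine mm_ijk(a, b, c)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(:,:)
    integer  :: i, j, k
    real(dp) :: s
    do i = 1, size(a, 1)
      do j = 1, size(b, 2)
        s = 0.0_dp
        do k = 1, size(a, 2)
          s = s + a(i, k)*b(k, j)
        end do
        c(i, j) = s
      end do
    end do
  end subroutine mm_ijk

  ! 3. Loop order (j, k, i): the inner loop walks down a column of A
  !    and of C, i.e. contiguous memory in column-major storage.
  subroutine mm_jki(a, b, c)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(:,:)
    integer :: i, j, k
    c = 0.0_dp
    do j = 1, size(b, 2)
      do k = 1, size(a, 2)
        do i = 1, size(a, 1)
          c(i, j) = c(i, j) + a(i, k)*b(k, j)
        end do
      end do
    end do
  end subroutine mm_jki
end module matmul_forms
```

Changing real(dp) to complex(dp) throughout would give the COMPLEX
variant of the same comparison. Relative timings will depend heavily on
the compiler and the optimisation level requested.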
On VMS Alpha (and possibly CVF, since it has the same authors), MATMUL,
although it is in-lined, also seems to be optimised only to the requested
optimisation level and then in-lined. Since it is an intrinsic, why do
DEC/Compaq/HP/Intel not in-line the most heavily optimised code? In the
debugger it is a single step even though it is in-lined.
Regards, Paddy
***********************************************************************
"This electronic message and any attachments may contain privileged
and confidential information intended only for the use of the
addressees named above. If you are not the intended recipient of
this email, please delete the message and any attachment and advise
the sender. You are hereby notified that any use, dissemination,
distribution, reproduction of this email is prohibited.
If you have received the email in error, please notify TransGrid
immediately. Any views expressed in this email are those of the
individual sender except where the sender expressly and with
authority states them to be the views of TransGrid. TransGrid uses
virus scanning software but excludes any liability for viruses
contained in any attachment.
Please note the email address for TransGrid personnel is now
[log in to unmask]"
***********************************************************************