Hi,
Comments on Callie Coats' program:
[ snipped things ... ]
>
> You will note that I blamed compiler "quality of implementation" and
> not the standard itself. But 0 for 4 on QOI is a major problem.
>
> In this particular case, the formal and actual arguments were
> dimensioned identically, by PARAMETERs--in fact, it is a cut-and-paste
> job from one place to the other--something like the following (actually,
> the dimension PARAMETERs were in INCLUDE files)
IMHO you can be sure there is no copy-in/copy-out if you just let the
internal procedure work on the host's data directly (through host
association), so define TRI without any argument list at all.
My modifications are marked ! [JvO].
> SUBROUTINE VDIFF
> ...
> INTEGER, PARAMETER:: NLAYS = 31
> INTEGER, PARAMETER:: NVARS = 58
> REAL A( NLAYS )
> REAL B( NVARS, NLAYS )
> REAL C( NLAYS )
> REAL Y( NLAYS )
> ...
CALL TRI() ! [JvO] (A, B, C, Y )
> ...
> CONTAINS
> SUBROUTINE TRI ! [JvO]( A, B, C, Y )
> ! [JvO] INTEGER, PARAMETER:: NLAYS = 31
> ! [JvO] INTEGER, PARAMETER:: NVARS = 58
> ! [JvO] REAL, INTENT( IN ):: A( NLAYS )
> ! [JvO] REAL, INTENT( IN ):: B( NVARS, NLAYS )
> ! [JvO] REAL, INTENT( IN ):: C( NLAYS )
> ! [JvO] REAL, INTENT( INOUT ):: Y( NLAYS )
> ...
> END SUBROUTINE TRI
> END SUBROUTINE VDIFF
Or am I missing something?
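To make the suggestion concrete, here is a minimal compilable sketch in free-form source. Only the declarations come from the code above; the module name VDIFF_DEMO, the driver program, the initial values, and the arithmetic inside TRI are placeholders I made up for illustration, not the real solver.

```fortran
! Sketch only: VDIFF_DEMO, HOSTDEMO and the loop body are hypothetical.
MODULE VDIFF_DEMO
  IMPLICIT NONE
CONTAINS
  SUBROUTINE VDIFF
    INTEGER, PARAMETER :: NLAYS = 31
    INTEGER, PARAMETER :: NVARS = 58
    REAL A( NLAYS )
    REAL B( NVARS, NLAYS )
    REAL C( NLAYS )
    REAL Y( NLAYS )

    ! Arbitrary test values, not from the original program.
    A = 1.0
    B = 2.0
    C = 3.0
    Y = 0.0

    CALL TRI        ! no actual arguments, so nothing to copy in or out

    PRINT *, 'SUM( Y ) = ', SUM( Y )
  CONTAINS
    SUBROUTINE TRI
      ! A, B, C, Y and NLAYS are the host's own entities, reached by
      ! host association; TRI declares only its private loop index.
      INTEGER L
      DO L = 1, NLAYS
        Y( L ) = A( L ) + B( 1, L ) + C( L )   ! placeholder arithmetic
      END DO
    END SUBROUTINE TRI
  END SUBROUTINE VDIFF
END MODULE VDIFF_DEMO

PROGRAM HOSTDEMO
  USE VDIFF_DEMO
  IMPLICIT NONE
  CALL VDIFF
END PROGRAM HOSTDEMO
```

The price is that TRI is now tied to the host's A, B, C and Y, so it cannot be called on other arrays and the INTENT documentation is lost; whether that is acceptable depends on how often TRI is reused.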
> For the F77 version, SUBROUTINE TRI was separately-compiled,
> stand-alone, and of course without INTENT clauses. And the call
> was implemented by pass-by-reference in that case.
> This is a case for which the compiler *really* *ought* to be able to
> recognize that copy-in/copy-out is NOT necessary -- but the evidence
> (and the machine code) indicates that none of these compilers did.
> In fact (SUBROUTINE TRI being rather short), I really had expected the
> compilers to implement this version by in-lining, achieving slightly
> *better* performance than the F77 version did. (Manual inlining later
> demonstrated a 5-10% speedup over the range of platforms involved...)
> And, as Nick MacLaren has been pointing out this past week over on
> "comp.arch", it was clear twenty years ago that computational cost
> would shortly (relative to the Eighties) be dominated by memory
> access time. It behooves the Fortran compiler writer to recognize
> possibilities for pass-by-reference (and for inlining), so as to
> minimize the use of copy-in/copy-out implementations of subroutine
> calls.
> And there is a facility I have _pleaded_ with F90 compiler writers
> to provide me -- properly annotated listings, indicating the call
> mechanism used for each argument. So far, I haven't received any
> positive response.
> Many of you will recall the output of Cray listings, with loopmarks
> indicating vectorization (and vectorization-type), parallelization,
> etc. I would like to extend this idea further, to provide call
> mechanism annotation so that I can see when the compiler is performing
> unnecessary memory traffic. I *don't* want to have to deal with PRAGMAs
> that force the compiler to use particular mechanisms when I tell it to
> do so; if I had to do that, I might as well be writing C ;-(
FWIW.
/---
Jan van Oosterwijk
Delft University of Technology.