Neil Carlson said:
> Ah, "parallel"; that's the key concept I was overlooking. I was merely
[plus asking about optimisation]
As far as optimisation is concerned, FORALL is only superior to the obvious
DO loop nest if you are going to be running on massively parallel machines.
This is because of the semantics of FORALL, which imply lots of array temps
(the RHS is evaluated over the whole iteration space before assigning to
any of the LHS). As it happens, these array temps can only be removed by
almost exactly the same analysis which would have parallelised the DO loops
in the first place!
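
To make the "evaluate the whole RHS first" semantics concrete, here is a
minimal sketch. The module and routine names are made up for illustration,
and the explicit temporary only shows what a compiler may have to do when
its analysis fails; it is not a claim about what any particular compiler
generates.

```fortran
module forall_semantics_demo
  implicit none
contains

  ! FORALL: every RHS value is taken from the OLD contents of a,
  ! because the RHS is evaluated over the whole index set before
  ! any element of the LHS is assigned.
  subroutine shift_forall(a)
    integer, intent(inout) :: a(:)
    integer :: i
    forall (i = 2:size(a)) a(i) = a(i-1)
  end subroutine shift_forall

  ! The same semantics written out by hand, with the array temp
  ! made explicit (hypothetical illustration of the hidden cost).
  subroutine shift_with_temp(a)
    integer, intent(inout) :: a(:)
    integer :: tmp(size(a))
    integer :: i
    do i = 2, size(a)
      tmp(i) = a(i-1)
    end do
    do i = 2, size(a)
      a(i) = tmp(i)
    end do
  end subroutine shift_with_temp

  ! The "obvious" DO loop: element i sees the value just stored in
  ! element i-1, so no temp is needed but the result is different.
  subroutine shift_do(a)
    integer, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      a(i) = a(i-1)
    end do
  end subroutine shift_do

end module forall_semantics_demo
```

Starting from 1,2,3,4,5 the two FORALL-style routines give 1,1,2,3,4 while
the plain DO loop gives 1,1,1,1,1. Proving that no such dependence exists
is what lets the compiler drop the temp, and it is the same proof that
would let it run the DO loop iterations in parallel.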
So as a quick rule of thumb, only use FORALL if you want the
parallelisation despite the array temps it implies (which is much of the
time on a massively parallel machine such as the Connection Machine and
its like, but not much of the time at all on a "modern" parallel machine
which has only a smallish number of processors).
Certainly if the array temps cannot be eliminated by alias analysis, a
FORALL "loop" can run *MUCH* slower on a single processor than the DO loop
nest which it replaces.
Similarly, in a multiple-statement FORALL, the compiler needs successful
(alias) analysis to be able to fuse the loop nests. So on a single
processor (or smallish MP) replacing a carefully tuned fused and blocked DO
loop nest by a single FORALL is more likely to slow you down than to speed
you up.
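
As a rough sketch of the multiple-statement case (again, the names are
invented for illustration): each statement in a FORALL construct completes
over the whole index set before the next one starts, so the construct
behaves like two separate loops unless the compiler can prove that fusing
them is safe.

```fortran
module forall_fusion_demo
  implicit none
contains

  ! Two-statement FORALL: all of b is assigned before any element
  ! of c is computed, i.e. semantically two passes over the data.
  subroutine two_pass_forall(a, b, c)
    real, intent(in)  :: a(:)
    real, intent(out) :: b(size(a)), c(size(a))
    integer :: i
    forall (i = 1:size(a))
      b(i) = a(i) + 1.0
      c(i) = 2.0 * b(i)
    end forall
  end subroutine two_pass_forall

  ! Hand-fused DO loop: one pass, and b(i) is reused while it is
  ! still to hand. Legal here because c(i) reads only b(i).
  subroutine fused_do(a, b, c)
    real, intent(in)  :: a(:)
    real, intent(out) :: b(size(a)), c(size(a))
    integer :: i
    do i = 1, size(a)
      b(i) = a(i) + 1.0
      c(i) = 2.0 * b(i)
    end do
  end subroutine fused_do

end module forall_fusion_demo
```

Here both routines produce the same b and c. Had the second statement
read b(i+1) instead of b(i), the fused loop would pick up a value of b
that had not been assigned yet, which is why the compiler has to do the
analysis before it may fuse.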
Cheers,
--
...........................Malcolm Cohen, NAG Ltd., Oxford, U.K.