Hi,
Thanks to those who replied to my previous dilemma about memory
optimization. I proposed two different FORALL loops to do a particular
network stencil-like operation. As Carlie Coats's careful analysis
showed, the two variants were similar. (We also forgot, as Bill Long
suggested, that the second variant needed an initialization loop for the
result R, which changes the operation count somewhat.) He also pointed
out, and I agree, that the first variant, given below, is more common in
Fortran and compilers know how to deal with it better.
So let's focus on it for a moment and try to answer a few questions that
I have:
FORALL (i=1:L, j=1:L)
   R(i,j) = Hy(i,j)*(V(i,j) - V(i+1,j)) + Hx(i,j)*(V(i,j) - V(i,j+1)) &
          + Hx(i,j-1)*(V(i,j) - V(i,j-1)) + Hy(i-1,j)*(V(i,j) - V(i-1,j))
END FORALL
where R, V, Hx and Hy are square L-by-L arrays. (NOTE: the boundaries
will be handled separately above, using shadows in HPF and such. Also,
in my previous posting I made a mistake and lumped both Hx and Hy into
one array H.)
Now my questions:
1. Does it matter to the compiler how I order the terms above (the
switch for reordering is ON by default in Lahey Fortran 95)? I can group
the terms in many different ways. Is one preferred? Why? I do want to
port the above application to many compilers and platforms, so general
principles are more important than peculiarities. Pentium and Alpha
processors will be used most frequently.
2. It might help (and also simplify my argument passing) to merge Hx and
Hy into a single array of size 2L*L or L*2L. This can be done in a block
or cyclic manner. Which, if any, is better?
3. Also, if the above FORALL is replaced by two DO loops, which ordering
of i and j is better, if any? Probably j outermost and i innermost, to get
unit stride in the accesses to R.
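To make question 3 concrete, here is a minimal sketch of the two-DO-loop
version I have in mind, with j as the outer loop and i as the inner loop so
that the first (fastest-varying) index of R is walked with unit stride. The
array size L = 8, the one-cell halo on V, Hx and Hy (so that i-1, i+1, j-1
and j+1 stay in bounds), and the random test data are assumptions made only
so the example is self-contained; in the real code the boundaries are
handled separately. The program compares the DO-loop result against the
FORALL form and reports whether they agree to within a small tolerance
(a tolerance rather than exact equality, since a compiler may reorder the
terms).

```fortran
program stencil_do_loops
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: L = 8          ! hypothetical size, for illustration only

  ! Assumption: a one-cell halo so that the neighbour accesses stay in bounds.
  real(dp) :: V(0:L+1, 0:L+1)
  real(dp) :: Hx(1:L, 0:L)             ! Hx(i,j-1) needs j-1 = 0
  real(dp) :: Hy(0:L, 1:L)             ! Hy(i-1,j) needs i-1 = 0
  real(dp) :: R(L, L), Rref(L, L)
  integer  :: i, j

  ! Arbitrary test data
  call random_number(V)
  call random_number(Hx)
  call random_number(Hy)

  ! Reference result: the FORALL form from this posting
  forall (i = 1:L, j = 1:L)
     Rref(i,j) = Hy(i,j)*(V(i,j) - V(i+1,j)) + Hx(i,j)*(V(i,j) - V(i,j+1)) &
               + Hx(i,j-1)*(V(i,j) - V(i,j-1)) + Hy(i-1,j)*(V(i,j) - V(i-1,j))
  end forall

  ! DO-loop form: j outer, i inner, so the inner loop runs down a column
  ! of R (Fortran stores arrays in column-major order).
  do j = 1, L
     do i = 1, L
        R(i,j) = Hy(i,j)*(V(i,j) - V(i+1,j)) + Hx(i,j)*(V(i,j) - V(i,j+1)) &
               + Hx(i,j-1)*(V(i,j) - V(i,j-1)) + Hy(i-1,j)*(V(i,j) - V(i-1,j))
     end do
  end do

  ! Compare with a tolerance, since term reordering can change rounding.
  if (maxval(abs(R - Rref)) <= 1.0e-12_dp) then
     print *, 'DO loops match FORALL'
  else
     print *, 'MISMATCH between DO loops and FORALL'
  end if
end program stencil_do_loops
```

Swapping the two DO statements gives the other ordering (i outer, j inner),
which strides through R with a step of L elements in the inner loop; timing
both would be one way to check the guess above on a given compiler.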
Thanks a lot,
Aleksandar
--
_____________________________________________
Aleksandar Donev
Physics Department
Michigan State University
East Lansing, MI 48825
(517) 432-6770
_____________________________________________