Sorry I haven't been able to respond sooner, but the server in our department
went down yesterday.
In response to some of Alistair's comments:
> I tried to rewrite your program using a generic function, and found it
> difficult to understand your intent.
The rather obscure piece of code arises from part of my implementation of the
finite element method. I have a large number of nodes arranged on a number of
directors (a vector of nodes), and each director carries a different number of
nodes: like an abacus with a different number of beads on each wire. I have two
functions (f1 and f2), each of which takes the position of the final node on a
director and the number of nodes on it, and returns a vector of a function
evaluated at each node on that director. I also have a logical vector, L, of
size equal to the number of directors, that tells me which function to use in
each case. Finally I have a vector, V, of size equal to the total number of
nodes. I want to place the function values from each director sequentially
into V, in parallel. This would be trivial if the number of nodes on each
director were equal.
So, if Nd is the number of directors, and Nn(i) is the number of nodes on the
i'th director, I wanted to do...
DO i=1,Nd
   IF (L(i)) THEN
      V(a(i):b(i)) = f1(final_position(i),Nn(i))
   ELSE
      V(a(i):b(i)) = f2(final_position(i),Nn(i))
   END IF
END DO
where a(i) and b(i) are vectors containing the start and finish node numbers
on each director. To make this parallelisable, I wanted to use a vector of
pointer_vectors, each having a component pointing to a different array section
(director) a(i):b(i) of the target vector, V. I changed the functions to
pointer_vector functions, PV1 and PV2, and altered the default pointer
assignment of pointer components of derived types to normal assignment, so
that I could do...
FORALL (i=1:Nd)
   WHERE (L(i))
      PV(i) = PV1(final_position(i),Nn(i))
   ELSEWHERE
      PV(i) = PV2(final_position(i),Nn(i))
   END WHERE
END FORALL
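For concreteness, the machinery behind PV, PV1 and PV2 is roughly the
following (a simplified sketch; the procedure name matches the one in the
NAG diagnostics below, but other details may differ from the full module):

MODULE pointer_mod
  USE kind_mod, ONLY : I4
  IMPLICIT NONE

  TYPE pointer_vector_I4
     INTEGER(I4), DIMENSION(:), POINTER :: vect
  END TYPE pointer_vector_I4

  ! Override intrinsic derived-type assignment so that PV(i) = PV1(...)
  ! copies values into the array section that PV(i)%vect points at,
  ! instead of re-pointing the pointer component.
  INTERFACE ASSIGNMENT(=)
     MODULE PROCEDURE p_vect_I4_equals_p_vect_I4_sub
  END INTERFACE

CONTAINS

  ELEMENTAL SUBROUTINE p_vect_I4_equals_p_vect_I4_sub(a1, a2)
    TYPE(pointer_vector_I4), INTENT(INOUT) :: a1
    TYPE(pointer_vector_I4), INTENT(IN)    :: a2
    a1%vect = a2%vect   ! ordinary array assignment to the pointees
  END SUBROUTINE p_vect_I4_equals_p_vect_I4_sub

END MODULE pointer_mod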
The only slight complication is that my directors are grouped in twos, so that
instead of L I have a logical array
LOGICAL, DIMENSION(2,Nd/2) :: logical_array
Thinking about it, however, I can obtain L with a suitable PACK command if
that simplifies things. Is that any clearer... Probably not! (:#
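(By `a suitable PACK command' I mean something along these lines, which
flattens the 2-by-Nd/2 array in array-element order:

LOGICAL, DIMENSION(Nd) :: L
L = PACK(logical_array, .TRUE.)
! equivalently:  L = RESHAPE(logical_array, (/ Nd /))

so each pair of directors contributes two consecutive elements of L.)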
> PURE TYPE(pointer_vector_I4) ELEMENTAL FUNCTION &
> & elemental_pointer_fun(index) RESULT(ans)
>
> USE kind_mod, ONLY : I4
> USE pointer_mod, ONLY : pointer_vector_I4, ASSIGNMENT(=)
>
> IMPLICIT NONE
>
> INTEGER(I4), INTENT(IN) :: index
>
>
> ALLOCATE(ans%vect(2_I4))
> ans%vect=(/index,-index/)
>
> END FUNCTION elemental_pointer_fun
>
>If you call this routine as a vector (since it is pure) do you want a vector
>of vectors of rank 2?
I want a vector of the pointer_vector_I4 type, each element having a pointer
component that is a rank-1 vector of size 2.
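In other words, the intended behaviour of a vector-valued call is (whether
the standard actually blesses this is, of course, the question at hand):

TYPE(pointer_vector_I4), DIMENSION(3) :: ans
ans = elemental_pointer_fun( (/ 1_I4, 2_I4, 3_I4 /) )
! intended: ans(1)%vect = (/ 1, -1 /), ans(2)%vect = (/ 2, -2 /), etc.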
> I seriously suggest that you try to write your program using generic
> routines and avoid ELEMENTAL. Of course you have to avoid WHERE also.
... and therefore also the FORALL.
> I think that if you do that you may have something which you can trust (ie it
> is quite reliable and gives the same result on everything).
That's a sentiment I agree with. I'd rather have a program that runs slowly
and produces the right results than one that runs fast (or not) and produces
the wrong ones. It does appear that none of the compilers minds the DO-IF
version.
In response to Mike's comments...
> I had the same problem as Alistair. However, I can find nothing in the
> standard that prohibits this construct. It is, of course, as written, a
> source of memory leaks: there is no way to deallocate the pointer component
> of the function result once it has been used in the assignment that
> references it.
Ahh. Of course. So that's why my FEM code started running out of virtual memory
when I left it running for long enough!
I must admit that, from the example code you gave, I don't fully understand
how the memory allocated (and never deallocated) in the function call is
recovered in the main program. In my case the pointer_vector on the left-hand
side of the function call has a component that is already pointing to a
target (unlike yours), and so...
I'll have a think about it and decide what it is I don't understand!
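In the meantime, one leak-free alternative would be to abandon the
pointer-function form altogether and have a pure subroutine write straight
into each section (argument types here are illustrative, with f1 and f2 as
described above):

PURE SUBROUTINE fill_director(pos, n, use_f1, section)
  USE kind_mod, ONLY : I4
  IMPLICIT NONE
  INTEGER(I4), INTENT(IN)  :: pos, n
  LOGICAL, INTENT(IN)      :: use_f1
  INTEGER(I4), DIMENSION(:), INTENT(OUT) :: section
  IF (use_f1) THEN
     section = f1(pos, n)   ! no temporary pointer component, nothing leaks
  ELSE
     section = f2(pos, n)
  END IF
END SUBROUTINE fill_director

called in the loop as
   CALL fill_director(final_position(i), Nn(i), L(i), V(a(i):b(i)))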
> Has anyone tried it yet with NAG?
Yes. Ian Chivers did...
nag f95 4.2 (512)
=================
f95 psuck.f90
Warning: psuck.f90, line 47: INTENT(OUT) dummy argument A1 never set
detected at P_VECT_I4_EQUALS_P_VECT_I4_SUB@<end-of-statement>
Error: psuck.f90, line 230: Incompatible local usage of symbol POINTER_VECTOR_I4 now imported with USE POINTER_MOD
detected at )@<end-of-statement>
Error: psuck.f90, line 246: Incompatible local usage of symbol I4 now imported with USE KIND_MOD
detected at I4@<end-of-statement>
[f95 terminated - errors found by pass 1]
A general comment on parallelisation using f95 features:
Call me lazy if you like, but I am fairly new to parallelisation, and was
hoping that I could achieve sufficient parallelisation of my finite element
code by using f95 features, instead of having to resort to learning and
implementing OpenMP or MPI.
I tried to parallelise the test code that I sent around using the SUN
compiler... and it refused to parallelise any of it. That is, neither the
DO-IF version nor the FORALL-WHERE version... but with a memory leak and the
combination of FORALLs, WHEREs and pointers... perhaps that's not
surprising (:#
Even so, my impression is that the compiler writers concentrate on
parallelising f77-style code with DO loops and IFs, and expect that, where
this fails, most people will use OpenMP or MPI to achieve parallelisation
instead of FORALLs and WHEREs.
As a result, it's probably the case that they don't make the most of the
opportunity to parallelise FORALLs and WHEREs. I was previously under the
impression that FORALLs and WHEREs are designed so that they should always be
parallelisable. Perhaps I should withhold judgement until I have done more
tests, but I have a feeling that the compiler message `not parallelized' is
something I am going to have to get used to, and that I too will have to go
down the OpenMP/MPI route.
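If I do end up going the OpenMP route, the DO-IF loop above should at least
parallelise straightforwardly, since each iteration writes to a disjoint
section V(a(i):b(i)) (assuming f1 and f2 are PURE, and that the a(i):b(i)
sections do not overlap):

!$OMP PARALLEL DO
DO i = 1, Nd
   IF (L(i)) THEN
      V(a(i):b(i)) = f1(final_position(i), Nn(i))
   ELSE
      V(a(i):b(i)) = f2(final_position(i), Nn(i))
   END IF
END DO
!$OMP END PARALLEL DO

(The loop index i is private by default in an OpenMP DO.)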
Does the standard say anything about what a compiler must or mustn't actually
do in terms of parallelisation, or are FORALL, WHERE, PURE and ELEMENTAL just
an aid to developers and compilers alike, in that they allow one to write
code that is easier to identify as parallelisable, and hence easier to
actually parallelise if the compiler writers have the time and inclination?
Paul