Hello,
In my applications, I used to associate REAL actual arguments with COMPLEX dummy arguments (and vice-versa). Since this is not possible when using F90 interfaces, I often have to transfer data between REAL and COMPLEX arrays.
Hence, I am concerned about the most efficient method to do these copies. I have tested 3 of them:
a) the scopy() routine
b) the cmplx(), real(), and aimag() functions
c) the transfer() functions
(All are "legal" for the default real kind, but I'm aware that only b) is permitted for other kinds).
I performed the test for small arrays (2**10 elements), which are in the cache, and for large arrays (2**20 elements), which result in cache misses.
I tested 3 platforms: Sun (Enterprise 450, f90 Workshop 6.0), SGI (Origin 2000, f90 MipsPro 7), IBM (SP2 Power3, xlf90)
Observations:
-------------
-- For real->complex, scopy() and cmplx() result in similar timings: sometimes scopy() is a bit faster, sometimes a bit slower.
-- For complex->real, scopy() is signicantly faster than the 2 steps assignement real/aimag for large arrays on SGI, and in all cases on IBM. In others cases the two approacches have similar timings.
-- The transfer() function has dramatically low performances on SUN and SGI. On IBM it seems to be better implemented, but however still significantly slower than the other approaches.
Conclusions:
------------
I would recommend as a general rule:
For real->complex, use the intrinsic cmplx()
For complex->real, scopy() should be prefered (note however that these tests were conducted for contiguous arrays. With non-contiguous arrays, scopy() would be much slower because copy-in/copy-out would occur)
For the future: I think that an intrisic that would do in one step the real()/aimag() assignements could be useful (i.e. r(:)=re_im(c(:) instead of r(1::2)=real(c); r(2::2)=aimag(c) )
Any comment ?
Regards,
Pierre
--
-----------------------------------------------------------------------
Pierre Hugonnet R&D Data Processing
COMPAGNIE GENERALE DE GEOPHYSIQUE - Paris Processing Centre
1, rue Leon Migaux / 91341 MASSY cedex / FRANCE
phone:(33) 164 47 45 59 fax:(33) 164 47 32 49
email:phugonnet(at)cgg.com (replace (at) by @)
-----------------------------------------------------------------------
The opinions expressed in this message are not necessarily
those of CGG or of Jacques Chirac
-----------------------------------------------------------------------
--------------------------------------------------------
*******
* SUN *
*******
(f90 -fast)
hugonnet% a.out
10
r->c scopy() : 1580000
r->c cmplx() : 1270000
r->c transfer(): 27700000
c->r scopy() : 1520000
c->r real/imag : 1270000
c->r transfer(): 17730000
hugonnet% a.out
20
r->c scopy() : 5050000
r->c cmplx() : 5500000
r->c transfer(): 30950000
c->r scopy() : 5440000
c->r real/imag : 6020000
c->r transfer(): 21010000
*******
* SGI *
*******
(f90 -O3)
hugonnet% a.out
10
r->c scopy() : 1020000
r->c cmplx() : 950000
r->c transfer(): 7860000
c->r scopy() : 1020000
c->r real/imag : 970000
c->r transfer(): 8320000
hugonnet% a.out
20
r->c scopy() : 1560000
r->c cmplx() : 1650000
r->c transfer(): 11400000
c->r scopy() : 1530000
c->r real/imag : 2900000
c->r transfer(): 11360000
*******
* IBM *
*******
(xlf90 -O3)
hugonnet% a.out
10
r->c scopy() : 1390000
r->c cmplx() : 1870000
r->c transfer(): 3110000
c->r scopy() : 960000
c->r real/imag : 2050000
c->r transfer(): 3170000
hugonnet% a.out
20
r->c scopy() : 7550000
r->c cmplx() : 6240000
r->c transfer(): 9950000
c->r scopy() : 5700000
c->r real/imag : 8490000
c->r transfer(): 10240000
program r2c2r
integer, parameter :: m=2**27
integer :: n, t
real, allocatable :: r(:)
complex, allocatable :: c(:)
read(*,*) n
n = 2**n
allocate( r(n), c(n/2) )
r(:) = 1.0
open(10,file='/dev/null',form='unformatted')
t=iclock()
do i=1,m/n
r(mod(i,n)+1) = 2.0
call scopy(n,r,1,c,1)
write(10)c(mod(i,n/2)+1)
end do
write(*,*) "r->c scopy() :",iclock()-t; t=iclock()
do i=1,m/n
r(mod(i,n)+1) = 2.0
c(:) = cmplx(r(1::2),r(2::2))
write(10)c(mod(i,n/2)+1)
end do
write(*,*) "r->c cmplx() :",iclock()-t; t=iclock()
do i=1,m/n
r(mod(i,n)+1) = 2.0
c(:) = transfer(r,c)
write(10)c(mod(i,n/2)+1)
end do
write(*,*) "r->c transfer():",iclock()-t; t=iclock()
do i=1,m/n
c(mod(i,n/2)+1) = (1.0,0.0)
call scopy(n,c,1,r,1)
write(10)r(mod(i,n)+1)
end do
write(*,*) "c->r scopy() :",iclock()-t; t=iclock()
do i=1,m/n
c(mod(i,n/2)+1) = (1.0,0.0)
r(1::2) = real(c(:))
r(2::2) = aimag(c(:))
write(10)r(mod(i,n)+1)
end do
write(*,*) "c->r real/imag :",iclock()-t; t=iclock()
do i=1,m/n
c(mod(i,n/2)+1) = (1.0,0.0)
r(:) = transfer(c,r)
write(10)r(mod(i,n)+1)
end do
write(*,*) "c->r transfer():",iclock()-t; t=iclock()
close(10)
end program
|