Print

Print


This is a common piece of code that I use to copy a segment from 1  
arbitrary dimensional array to another array of the same  
dimensiononality (but may be different size)



   function offset(pos, lb, strd)
     !
     ! compute the offset into a flattened array
     !
     integer, dimension(:), intent(in) :: pos, lb, strd
     integer :: offset, i

     offset = 1

     do i = 1, ndims
        offset = offset + (pos(i)-lb(i))*strd(i)
     end do
   end function offset






   real(8), allocatable(:) :: array1, array2
   integer, allocatable(:) :: inc, lb1, ub1, lb2, ub2
   integer :: i, ii, imax, jj, ndims

   ndims=something

   allocate(array1(some large size), array2(another large size))
   allocate(inc(ndims), lb1(ndims), ub1(ndims))
   allocate(lb2(ndims), ub2(ndims))

   lb1=something
   ub1=something

   strd1(1)=1

   do i = 2, ndims
      strd1(i) = (ub1(i)-lb1(i)+1)*strd1(i-1)
   end do


   lb2=something
   ub2=something

   strd2(1)=1

   do i = 2, ndims
      strd2(i) = (ub2(i)-lb2(i)+1)*strd2(i-1)
   end do


   imax = offset(ub1, lb1, strd1)
   ii = imax
   inc = lb1

   do while (ii <= imax)
      i = offset(inc, lb1, strd1)
      jj = offset(inc, lb2, strd2)

      do ii = i, i+ub1(1)-lb1(1)
         array2(jj) = array1(ii)
         jj = jj + 1
      end do

      do k = 2, ndims
         inc(k) = inc(k) + 1
         if (inc(k) <= ub2(k)) exit
         inc(k) = lb1(k)
      end do
   end do


Notionally lb1/ub1 and lb2/ub2 are the upper and lower bounds of the  
two arrays.  So by adjusting the lb's and ub's in

   imax = offset(ub1, lb1, strd1)
   ii = imax
   inc = lb1

you can choose a different segment of the arrays.  The example I have  
provided copys ALL of array1 into array2 (though array2 may be of  
different size).

I use to have the function offset defined as a cpp macro so that it  
was inlined... but most compilers can inline a function just as well.

Things to note:  for speed, you need to try and keep your memory- 
stores contiguous - hence the array over ii/jj is contiguous.  If you  
stride over cache lines while doing a memory-store, you will actually  
induce a cache-read, cache-modify, cache-write.  Which is costly :)   
Much better just to do a cache-read, lost of cache-modifies, cache- 
write and fush a full/modified cache line.

I use modifications of this piece of code all through my MPI-AMR code  
so I can do 2 and 3D problems without having to recompile.  I use it  
in my fortran scripting language so I can have interpreted arrays of  
arbitrary size and dimensionality.  In all the performance testing I  
have done, this code snippet is as fast as hard wiring the number of  
dimensions.  Deep down the compiler has to be doing the same thing,  
as memory is a single dimensional address range.

Of course, doing this sort of thing screws with your mind...  
especially when you start getting into some hairy mathematics :)  It  
is much easier to let the compiler do it.

Stu.



--
Dr Stuart Midgley
Industry Uptake Program Leader
iVEC, 'The hub of advanced computing in Western Australia'
26 Dick Perry Avenue, Technology Park
Kensington WA 6151
Australia

Phone: +61 8 6436 8545
Fax: +61 8 6436 8555
Email: [log in to unmask]
WWW:  http://www.ivec.org