From: SMTP%"[log in to unmask]" 18-MAY-2000 14:50:53.28
To: Fortran <[log in to unmask]>, Lahey Fortran <[log in to unmask]>
CC: Aleksandar Donev <[log in to unmask]>
Subj: Cache optimization in stencil-like operations
Hi Fortran experts,
I have a question for those versed in compiler cache optimizations. I
have a specific matrix product to compute that looks like a stencil
operation in PDE solvers. I have at least two ways of doing it, and I am
wondering which is faster in terms of Fortran compiler optimization in
memory access and usage. I am using Lahey Fortran 95.
Assume we have arrays all of size L*L, called H, V and R. The last one
is the result of this operation I need to perform, the rest are inputs.
Let Tx and Ty be a similar temporary arrays.
I. First option:
This option computes R in two steps, both of which are stencil-like
operations:
1. Compute temporary arrays in something like (NOT strict Fortran):
FORALL i and j:
Tx(i,j) = H(i,j)*[ V(i,j) - V(i,j+1) ]
Ty(i,j) = H(i,j)*[ V(i+1,j) - V(i,j) ]
END FORALL
2. Now compute the result R in a stencil-like operation using Tx and
Ty (NOTE: We could put Tx and Ty into a single array, say of size 2N*N
if this can help the compiler set the stride constants right):
FORALL i and j:
R(i,j) = Tx(i,j-1) - Tx(i,j) + Ty(i,j) - Ty(i-1,j)
END FORALL
This approach has the disadvantage of needing the temporary arrays, but
these can of course be eliminated by merging the two FORALL's (is this
better for speed?):
FORALL i and j:
R(i,j) = H(i,j)*[ V(i+1,j) + V(i,j+1) - 2*V(i,j) ] + H(i,j-1)*[
V(i,j-1) - V(i,j) ] - H(i-1,j)*[ V(i,j) - V(i-1,j) ]
END FORALL
I did not do this above to clarify the process better (it is clear,
right?) and also, I will need two or more temprorary arrays or more
anyway in the rest of the algorithm, so that is not a problem (also, it
is better for ease of use in the rest of the algorithm to have the two
FORALL's separate). To me it seems like something compilers should be
able to optimize well?
II. Second Option
This approach merges the two FORALL's automatically and uses a slightly
different kind of memeory acces. It is the preferred choice in an
irregular case (when not all i and j are allowed and certain terms above
appear or dissappear irregularly), but in the regular case I am not
sure:
FORALL i and j:
! This is like Tx above:
temp = H(i,j)*[ V(i,j) - V(i,j+1) ]
R(i,j) = R(i,j) - temp
R(i+1,j) = R(i+1,j) + temp
! This is like Ty above:
temp = H(i,j)*[ V(i+1,j) - V(i,j) ]
R(i,j) = R(i,j) - temp
R(i,j+1) = R(i,j+1) + temp ! SOME signs MAY be wrong, I am too
tired to think carefully
END FORALL
In a sense, the two approaches compute the the first FORALL in the same
way, but approach the second differently (in terms of the specific
network optimization algorihtm I am working on, one corresponds to
traversing the network via edges, the other via the nodes).
So, which approach is preffered for full matrices H, V and R? They seem
almost identical to me!
Thanks a lot,
Aleksandar
--
_____________________________________________
Aleksandar Donev
[log in to unmask]
Physics Department
Michigan State University
East Lansing, MI 48825
(517) 432-6770
_____________________________________________
================== RFC 822 Headers ==================
Return-Path: [log in to unmask]
Received: from thales (129.125.6.224)
by gladia.astro.rug.nl (V5.0A-1, OpenVMS V7.2-1 Alpha);
Thu, 18 May 2000 14:50:52 +0100
Received: from curlew.cs.man.ac.uk (curlew.cs.man.ac.uk [130.88.13.7])
by thales.astro.rug.nl (8.9.3/8.9.3) with ESMTP id OAA00440
for <[log in to unmask]>; Thu, 18 May 2000 14:48:12 +0200 (METDST)
Received: from [128.240.226.11] (helo=mailout1.mailbase.ac.uk)
by curlew.cs.man.ac.uk with esmtp (Exim 1.92 #3)
for [log in to unmask]
id 12sPhW-000LGu-00; Thu, 18 May 2000 13:46:54 +0100
Received: from naga.mailbase.ac.uk (naga.mailbase.ac.uk [128.240.226.3])
by mailout1.mailbase.ac.uk (8.9.1a/8.9.1) with ESMTP id NAA18396;
Thu, 18 May 2000 13:46:48 +0100 (BST)
Received: (from daemon@localhost)
by naga.mailbase.ac.uk (8.8.x/Mailbase) id NAA20025;
Thu, 18 May 2000 13:42:51 +0100 (BST)
Received: from callisto.mt.net.mk (phys-callisto.mt.net.mk [195.26.152.1])
by naga.mailbase.ac.uk (8.8.x/Mailbase) with ESMTP id NAA20007;
Thu, 18 May 2000 13:42:46 +0100 (BST)
Received: from pa.msu.edu ([195.26.153.175]) by
callisto.mt.net.mk (Netscape Messaging Server 4.1) with SMTP id
FUR9YE00.40Z; Thu, 18 May 2000 14:42:14 +0200
Message-ID: <[log in to unmask]>
Date: Sat, 20 May 2000 05:46:28 +0300
Reply-To: [log in to unmask]
Organization: Physics Department, MSU
X-Mailer: Mozilla 4.03 [en] (Win95; I)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: Cache optimization in stencil-like operations
From: Aleksandar Donev <[log in to unmask]>
To: Fortran <[log in to unmask]>,
Lahey Fortran <[log in to unmask]>
Cc: Aleksandar Donev <[log in to unmask]>
X-List: [log in to unmask]
X-Unsub: To leave, send text 'leave comp-fortran-90' to [log in to unmask]
X-List-Unsubscribe: <mailto:[log in to unmask]>
Sender: [log in to unmask]
Errors-To: [log in to unmask]
Precedence: list
Content-Transfer-Encoding: 7bit
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|