On 1/4/11 1:28 PM, Greenberg, Naomi wrote:
> Bill,
> I inherited some code that was optimized for a Cray vector machine. There
> originally were very large computational loops that worked over large
> arrays. The person who vectorized it (very successfully) broke up the large
It sounds like the code was already vectorized. The Cray vector machines
did allow vector loads from memory that were strided or gathers (scatters
for stores), and hardware vectorization of conditional expressions, so the
range of code that vectorized was a lot wider than with an x86-64-style
processor. However, if the original array references were stride-1
(contiguous in memory), then the original code might not be that bad for
the x86-64. It is much easier to have the compiler internally break up
the loop than to recode it by hand, even if you have to play around with
compiler flags. x86-64 compilers have gotten a lot better at
vectorization in recent years.
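
As a rough sketch of what "let the compiler break up the loop" looks like
in practice: one long stride-1 loop, written the way the algorithm reads.
The array names, the size n, and the loop body are made up for
illustration and are not from the original code.

```fortran
program simple_loop
  implicit none
  integer, parameter :: n = 100000   ! hypothetical problem size
  real :: a(n), b(n), c(n)
  integer :: i

  ! Fill the inputs with some values.
  do i = 1, n
     b(i) = real(i)
     c(i) = 2.0
  end do

  ! One long loop over contiguous (stride-1) data. The compiler is
  ! free to split it into hardware-vector-sized chunks internally.
  do i = 1, n
     a(i) = b(i) + 3.0 * c(i)
  end do

  print *, a(1), a(n)
end program simple_loop
```

Whether a given loop actually vectorizes is best checked with the
vectorization report option of whatever compiler you are using.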
> loop into a series of small simple clear vectorizable loops that computed
> partials and then combined them later. He sized the partial arrays based on
> a vector size (R(nvec,2,3), with loops going in nvec chunks). My questions
Actually, I suspect he sized the temp arrays based mainly on the cache
size. If you fail to keep these temps in cache, then most of the
benefit of doing this sort of hand optimization is lost. The value of
nvec should be a multiple of the hardware vector length, but what
multiple depends on the cache sizes and the number of temps. If you
pick nvec = n*16, the value of nvec will be a multiple of the vector
length for all of the "small vector" architectures. Play around with
different values of n to find the best performance on each machine.
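
A minimal sketch of the chunked scheme, assuming two temporaries and a
made-up computation. Here nmult plays the role of "n" above, and all of
the names (ntotal, nmult, t1, t2, x, y, res) are hypothetical. The temps
take 2*nvec elements in total, so the footprint to compare against the
cache size is 2*nvec times the size of a real.

```fortran
program chunked_loop
  implicit none
  integer, parameter :: ntotal = 100000      ! hypothetical problem size
  integer, parameter :: nmult  = 8           ! the "n" to tune by experiment
  integer, parameter :: nvec   = nmult * 16  ! chunk length, a multiple of 16
  real :: x(ntotal), y(ntotal), res(ntotal)
  real :: t1(nvec), t2(nvec)                 ! temps intended to stay in cache
  integer :: i, istart, chunk

  do i = 1, ntotal
     x(i) = real(i)
     y(i) = 0.5
  end do

  ! Walk the long arrays in pieces of at most nvec elements.
  do istart = 1, ntotal, nvec
     chunk = min(nvec, ntotal - istart + 1)  ! the last piece may be short

     ! Small, simple loops that compute partial results into the temps.
     do i = 1, chunk
        t1(i) = x(istart+i-1) * y(istart+i-1)
     end do
     do i = 1, chunk
        t2(i) = x(istart+i-1) + y(istart+i-1)
     end do

     ! Combine the partials while the temps are still in cache.
     do i = 1, chunk
        res(istart+i-1) = t1(i) + t2(i)
     end do
  end do

  print *, res(1), res(ntotal)
end program chunked_loop
```

Changing nmult and timing the result is the "play around" step; there
is no single value that is right for every machine.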
> now are 1) what should nvec be set for an Itanium-based linux system or
> other 32 bit system (Intel compiler)? Is there a way to compute this
> automatically? 2) Is this the best way to do this still? Can I be hurt by it
> on other machines? Again, it's not just the loop counters that are sized,
If you try to migrate this code to a GPU accelerated system, you will
really want the original long vectors back again. The games involving
R(nvec,2,3) style optimizations are specific to a given architecture.
This is not a general scheme in which nvec is a parameter that can be
varied to cover all architectures.
Cheers,
Bill
> it's the actual data structure sizes also.
>
> Thanks,
> Naomi
>
> -----Original Message-----
> From: Fortran 90 List [mailto:[log in to unmask]] On Behalf Of
> Bill Long
> Sent: Tuesday, January 04, 2011 11:48 AM
> To: [log in to unmask]
> Subject: Re: Finding vector size
>
> The optimal vector size for each machine should be known internally by
> the compiler. Write the loop as the algorithm dictates. The compiler
> will divide it up into vector "chunks" automatically if the body of the
> loop can be executed by vector hardware. User attempts to manually
> reform loops for presumed vector lengths, pipelining, or cache
> blocking are generally counter-productive. The result is code that is
> unclear to read, difficult to maintain, and confusing to the compiler.
> Compiler optimizers work best on simple, clean loops.
>
> Cheers,
> Bill
>
>
> On 1/4/11 9:37 AM, Greenberg, Naomi wrote:
>> I am trying to find a way to configure code before compile time to set
>> the optimal loop vectorization size for the user's machine and then
>> (using the Fortran preprocessor) get that value and set the loop size to
>> this value. For example, on Machine1, nvec might be 64, on machine2, it
>> might be 1024, and the code would "do i=1,nvec" (obviously not quite
>> that way). The question is whether there's a way to automatically get
>> the optimal vector size from each machine (using Linux) or whether
>> there's a better way to get the same result? Any suggestions are welcome!
>>
>> Naomi Greenberg
>>
>> /Member of the Research Staff/
>>
>> Riverside Research Institute
>>
>> (212) 502-1718 (ph)
>>
>> (212) 502-1729 (fax)
>>
>> [log in to unmask]
>>
>
--
Bill Long [log in to unmask]
Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9142
Cray Inc./Cray Plaza, Suite 210/380 Jackson St./St. Paul, MN 55101