At 03:49 PM 9/16/2004, Aleksandar Donev wrote:

>Hello,
>
>Does someone have references or personal experience with scientific
>codes and the (relatively) new sse2 instructions on P4 and higher
>versus traditional FPU instructions?
>
>Tests on my codes indicate that using sse2 instead of fpu for my
>application caused a slowdown of about 50%. Obviously the wrong sign
>there. I will need more detailed study to see if this is related to the
>precision issue (my codes have a lot of precision tolerances which may
>need to be loosened up since calculations are going 80->64 bits), but I
>would appreciate any experiences, general guidelines, etc.
>
>Thanks,
>Aleksandar

If, in fact, your tolerances depend on the extra few bits of precision you
are likely to get by combining 80-bit intermediates with double precision,
you may have to loosen up tolerances, normally by only 1 significant
decimal digit.
Most of the additional performance potential of SSE2 is in the use of
parallel instructions by vectorizing compilers, and the much improved
facility for conversions between real and integer.  This vectorization
works only for stride 1 vectors, and is helped out by observing other
conditions, such as 128-bit data alignment.
Having spent much of the last 3 years on this topic, I suspect the majority
of what I might say would be off target.


Tim Prince