At 03:49 PM 9/16/2004, Aleksandar Donev wrote:
>Hello,
>
>Does someone have references or personal experience with scientific
>codes and the (relatively) new sse2 instructions on P4 and higher
>versus traditional FPU instructions?
>
>Tests on my codes indicate that using sse2 instead of fpu for my
>application caused a slowdown of about 50%. Obviously the wrong sign
>there. I will need more detailed study to see if this is related to the
>precision issue (my codes have a lot of precision tolerances which may
>need to be loosened up since calculations are going 80->64 bits), but I
>would appreciate any experiences, general guidelines, etc.
>
>Thanks,
>Aleksandar

If, in fact, your tolerances depend on the extra few bits of precision you are likely to get by combining 80-bit intermediates with double precision, you may have to loosen up tolerances, normally by only 1 significant decimal digit.

Most of the additional performance potential of SSE2 is in the use of parallel instructions by vectorizing compilers, and in the much improved facility for conversions between real and integer. This vectorization works only for stride 1 vectors, and is helped by observing other conditions, such as 128-bit data alignment.

Having spent much of the last 3 years on this topic, I suspect the majority of what I might say would be off target.

Tim Prince
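
To make the stride 1 and alignment conditions in the reply concrete, here is a minimal C sketch, not taken from either poster's code. The function names (`is_aligned16`, `axpy_unit_stride`, `axpy_strided`) are hypothetical and chosen only for illustration. The first loop touches consecutive doubles, which is the pattern a vectorizing compiler can map onto packed SSE2 instructions; the second does the same arithmetic at a non-unit stride, which would typically be left as scalar code. Whether a particular compiler actually vectorizes either loop is an assumption to check against its vectorization report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p lies on a 16-byte (128-bit) boundary,
   the alignment that packed SSE2 loads and stores prefer. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Stride-1 update y[i] += a * x[i].
   Consecutive elements and non-aliasing (restrict) pointers make this
   loop a candidate for packed double-precision SSE2 code. */
void axpy_unit_stride(size_t n, double a,
                      const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Same arithmetic on every stride-th element.
   With stride != 1 the operands are not contiguous in memory, so the
   compiler would usually keep this as scalar code. */
void axpy_strided(size_t n, double a,
                  const double *restrict x, double *restrict y,
                  size_t stride)
{
    for (size_t i = 0; i < n; ++i)
        y[i * stride] += a * x[i * stride];
}
```

A caller could declare its arrays with C11 `_Alignas(16)` and check them with `is_aligned16` before calling `axpy_unit_stride`. With GCC, options such as `-O3 -msse2 -mfpmath=sse` request SSE2 arithmetic and enable auto-vectorization; other compilers use different flags.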