On 11/09/2011 07:21 PM, Pascal wrote:
> On Tue, 8 Nov 2011 16:25:22 -0800,
> Nat Echols wrote:
>
>> On Tue, Nov 8, 2011 at 4:22 PM, Francois Berenger wrote:
>>> In the past I have been quite badly surprised by
>>> the lack of speedup I got when using OpenMP
>>> with some of my programs... :(
>
> You need big parallel jobs, and you need to avoid synchronisations,
> barriers and that kind of thing. Using data reduction is much more
> efficient. It works very well for structure factor calculations, for
> example.
>
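A minimal sketch of the reduction approach described above, in C with
OpenMP. The function name sum_of_squares and the sum-of-squares workload
are illustrative assumptions, not code from this thread. Each thread
accumulates into its own private copy of the sum, and the copies are
combined once at the end, so there is no lock or explicit barrier inside
the loop.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): sum of x[i]^2 over an array.
   The reduction clause gives every thread a private partial sum and
   adds the partial sums together when the parallel loop finishes. */
double sum_of_squares(const double *x, size_t n)
{
    double sum = 0.0;
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        sum += x[i] * x[i];
    }
    return sum;
}
```

Built with an OpenMP flag such as gcc -fopenmp, the loop is split across
threads; without the flag the pragma is ignored and the same code runs
serially.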
>>
>> Amdahl's law is cruel:
>>
>> http://en.wikipedia.org/wiki/Amdahl's_law
>
> You can have much less than 5% of serial code.
>
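To put numbers on the 5% figure, here is a small sketch of Amdahl's law
in C. The function name amdahl_speedup is a hypothetical helper; the
formula is the standard one, speedup = 1 / (s + (1 - s) / n), where s is
the serial fraction and n the number of cores.

```c
/* Hypothetical helper: ideal speedup predicted by Amdahl's law.
   serial_fraction is the share of run time that cannot be parallelised
   (0.0 to 1.0); n_cores is the number of cores used. */
double amdahl_speedup(double serial_fraction, int n_cores)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores);
}
```

With 5% serial code on a quad-core, amdahl_speedup(0.05, 4) gives about
3.48 rather than 4, and this bound ignores memory bandwidth and cache
effects entirely.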
> I have more problems with L2 cache miss events and memory bandwidth. A
> quad-core means 4 times the bandwidth needed by a single process...
> If your code is already a bit greedy, the scale-up is not good.
I never went down to this level of optimization.
Are you using valgrind to detect cache miss events?
After gprof, usually I am done with optimization.
I would prefer to change my algorithm and would be afraid
of introducing optimizations that are architecture-dependent
into my software.
Regards,
F.