2009/8/7 RAUL H C LOPES <[log in to unmask]>:
> John Bland wrote:
>>
>> Liverpool has only a small number of multi-core nodes at the moment, so
>> we've gone about things a little more carefully. We have restricted ATLAS
>> pilot jobs to run only on our multi-core nodes, beginning with 1 job per
>> node (a Maui rule for each individual node).
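>>
>> The per-node cap is just a fragment of maui.cfg along these lines (a
>> sketch - node names are placeholders; we raise MAXJOB by one at each
>> step):
>>
>>   NODECFG[node001] MAXJOB=1
>>   NODECFG[node002] MAXJOB=1
>>   # ...one NODECFG line per multi-core node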
>>
>> We've then increased the limit by one job per node every 4 hours or so (to
>> get good statistics and little overlap between steps). Nothing else is
>> allowed to run on the nodes at the same time.
>>
>> This gives the efficiency/throughput purely for the Hammer Cloud jobs, as
>> shown in the plot below. Nodes are pretty standard 8-core Xeons with 16GB
>> of RAM.
>>
>> http://hep.ph.liv.ac.uk/~jbland/untuned-atlas-throughput.png
>>
>> This is for this week's run. Network bandwidth was always well below
>> saturation. Throughput starts to decrease around 4 jobs/node, but the rate
>> of decrease gets worse from 6 jobs/node.
>>
>> We're setting up some tuned nodes, which should hopefully be ready for
>> next week's run, for comparison.
>>
>> John
>>
>> Duncan Rand wrote:
>>>
>>> Results for RHUL here:
>>>
>>> http://www.pp.rhul.ac.uk/~dtrand/test-548-thrpt.png
>>> http://www.pp.rhul.ac.uk/~dtrand/test-548-eff.png
>>>
>>> Throughput levels off at about 200 running jobs - an average of 4 jobs
>>> per worker-node disk.
>>>
>>> Duncan
>>>
>>> Alastair Dewhurst wrote:
>>>>
>>>> Hi all
>>>>
>>>> The Hammer Cloud test that started today (4/8/09) had to be stopped
>>>> because there was an issue with the test definition. A dataset name had
>>>> been used that no longer existed. This was responsible for the majority of
>>>> the errors that were seen. A new test has been scheduled to start tomorrow
>>>> (5/8/09) at 10 am UK time. Details can be found at:
>>>> http://gangarobot.cern.ch/hc/548/test/
>>>>
>>>>
>>>> Alastair
>>
>>
> Interesting - the decrease in efficiency even when there were cores
> available.
>
Well, we knew this was the case anyway - Glasgow's initial data
suggested the knee was somewhere around the 3+ cores mark.
Remember, it's I/O contention against the (single) hard disk in the
WN which is the issue here - it looks like the large seeks each job
makes within its data file interact very badly, leaving the disk
spending most of its time seeking (and each process spending most of
its time waiting on I/O). Ewan had some observations at Oxford that
backed this interpretation up, I believe.
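
If anyone wants to see the seek effect in isolation, a minimal sketch
along these lines should reproduce it (all paths and sizes here are
placeholders; pre-create one file per job, each much larger than RAM so
the page cache can't hide the seeks):

    import multiprocessing, random, time

    FILE_SIZE = 2 * 1024**3   # assumed 2 GB per file, well above RAM per job
    BLOCK = 1024 * 1024       # 1 MB read per seek, mimicking sparse event access
    SEEKS = 200               # random seeks per simulated job

    def reader(path, results):
        # One simulated analysis job: large random seeks within its own file.
        start = time.time()
        f = open(path, 'rb')
        for _ in range(SEEKS):
            f.seek(random.randrange(0, FILE_SIZE - BLOCK))
            f.read(BLOCK)
        f.close()
        results.put(SEEKS * BLOCK / (time.time() - start))

    if __name__ == '__main__':
        for njobs in range(1, 9):   # 1..8 concurrent jobs, as on an 8-core WN
            results = multiprocessing.Queue()
            procs = [multiprocessing.Process(target=reader,
                     args=('/data/seektest%d.dat' % i, results))
                     for i in range(njobs)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()
            total = sum(results.get() for _ in procs)
            print('%d jobs: %.1f MB/s aggregate' % (njobs, total / 1e6))

If the aggregate MB/s falls as the job count rises, seeking dominates -
sequential reads over the same files should scale far better.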
Sam
>>> Throughput starts to decrease around 4 jobs/node, but the rate of
>>> decrease gets worse from 6 jobs/node.
>
> That's very interesting... stats for time waiting in I/O might help. In the
> meantime... I had noticed before that 8-core machines running parallel
> programs would show under Linux a speedup much worse than the one obtained
> under either FreeBSD or Mac OS; indeed, the Linux speedup would only
> improve with up to 4 cores. BTW, the tests were performed using Haskell,
> and that means GC was involved...
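>
> For the I/O-wait stats, a quick Python sketch would do (this assumes
> Linux's /proc/stat layout, where iowait is the fifth cpu counter; run it
> on a WN while the jobs are active):
>
>     import time
>
>     def cpu_iowait_fraction(interval=5):
>         # /proc/stat cpu line: user nice system idle iowait irq softirq ...
>         def snap():
>             with open('/proc/stat') as f:
>                 return [int(x) for x in f.readline().split()[1:]]
>         a = snap()
>         time.sleep(interval)
>         b = snap()
>         deltas = [y - x for x, y in zip(a, b)]
>         return deltas[4] / float(sum(deltas))  # iowait share of CPU time
>
>     print('%.1f%% of CPU time in iowait' % (100 * cpu_iowait_fraction()))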
>
> raul
>