On 09/03/12 16:58, Rob Fay wrote:
> On 09/03/2012 16:44, Alessandra Forti wrote:
>> On 09/03/2012 16:28, Rob Fay wrote:
>>> That's an option, but I don't think we'd really want to do that on
>>> our nodes.
>>> Aside from the increased contention for I/O, our benchmarking shows
>>> that total HS06 not only virtually flatlines but actually goes down as
>>> you reach full hyperthreaded capacity on these boxes - the total HS06
>>> for 24 runs is less than the total HS06 for 18 runs.
>>>
>> On our Viglen X5650s we've never tried with 18 jobs. We tried HT on/off
>> with 12 jobs and the results were practically identical. With 24 jobs
>> the total is 211.2 and with 12 it is 164.4, which is 28.5% more. We'll
>> try with 18 too this time. If what you say is true we might follow suit.
>>
> I ran it for all values between 1 and 24. The total HS06 only decreases
> going from 23 to 24 runs, but the increase is marginal from 18 runs
> upwards,
That's interesting - perhaps you could publish the results. HEPiX in
Berkeley had a couple of benchmarking presentations. The following
presentation on 5520 chips (which were current at the time) is
particularly comprehensive:
http://indico.cern.ch/materialDisplay.py?contribId=34&sessionId=9&materialId=slides&confId=61917
The figures presented there do imply that performance increases up to 24
threads - but that there's an interaction between hyperthreading and
turbo boost.
I believe I got higher HEPSPEC scores from nodes with better cooling
(though the difference was possibly not statistically significant). I
therefore put our C6100s at the bottom of the rack and the storage nodes
at the top.
> and possibly not worth it if the increased contention produces
> a drop in efficiency elsewhere (e.g. bottlenecking I/O).
>
> I take Sam's point that these are HS06 figures, not real-life
> performance, but assuming HS06 isn't completely inadequate as a
> benchmark for these porpoises, it would suggest that, even if I/O isn't
> a factor, leaving at least one hyperthread free may be more efficient
> than completely filling the system, and if I/O is a factor something
> between 12 and 18 runs may be optimal. YMMV.
At QMUL we have a separate queue for ATLAS analysis jobs, which gives us
some control over this.
>
> If there was an easy way to measure actual job throughput with varying
> numbers of hyperthreads in use, that could be an interesting comparison.
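One crude approach might be to drive a fixed batch of identical jobs at
several concurrency levels and compare the wall-clock throughput. A rough
Python sketch of that idea (run_one_job.sh, the slot counts and the batch
size below are just placeholders, not something we actually run):

#!/usr/bin/env python
# Run a fixed batch of identical jobs at several concurrency levels and
# report wall-clock throughput (jobs/hour) at each level.
# run_one_job.sh is a placeholder for a representative job wrapper.

import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

JOB_CMD = ["./run_one_job.sh"]     # placeholder job wrapper
LEVELS = [12, 16, 18, 20, 22, 24]  # concurrency levels to test
JOBS_PER_LEVEL = 48                # total jobs run at each level

def run_job(_):
    # Each worker launches one copy of the job and waits for it to finish.
    subprocess.call(JOB_CMD)

for n in LEVELS:
    start = time.time()
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(run_job, range(JOBS_PER_LEVEL)))
    elapsed = time.time() - start
    print("%2d slots: %.1f jobs/hour" % (n, JOBS_PER_LEVEL * 3600.0 / elapsed))

Comparing the jobs/hour figures at, say, 18 and 24 slots would show whether
filling all the hyperthreads actually buys anything for real jobs.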
Chris