> -----Original Message-----
> From: Alessandra Forti [mailto:[log in to unmask]]
> Sent: Monday, June 06, 2011 9:08 AM
> To: Testbed Support for GridPP member institutes
> Cc: Bly, Martin (STFC,RAL,ESC)
> Subject: Re: Accounting continued
>
> On 06/06/2011 08:38, Martin Bly wrote:
> > Does this not just bear out that the 'average' HS06 assigned by the sites to
> > their farms is poor and/or that the spread of performance per core is wide on
> > a given farm?
> >
> > Martin.
> I think the second. This is an attempt to assign a single number to a
> site when that is not possible, due to the heterogeneity of the hardware
> and the way sys admins configure the fair share. Manchester hepspecs are
> in line with other sites on hepix, so they are not poorly assigned; still,
> we get the lowest of the two numbers because it is possible that steve
> jobs run mostly on the slowest nodes, where the majority of users' jobs
> are redirected. This is assuming the HS06/event = const assumption is even a
> correct one.
>
> cheers
> alessandra
OK, that makes some sense.
I was on the group that came up with the HS06 benchmark. We did a fairly careful set of tests to see which standard benchmark we could use to match the real-world performance of systems when running HEP codes. At the time I think we realised that although performance scaled linearly with the benchmark score for a given real-world app, the slopes differ for each app, so experiments would need to calibrate their expectations. However, I suspect that several things lead to distortions: the prevalence of 64-bit over 32-bit since we did the original tests, the I/O regime in which the tests are performed, and changes to the code bases, to name some. I suspect that I/O regimes will make the greatest difference to events/HS06 for two otherwise identical nodes.
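To illustrate the per-app calibration mentioned above, here is a minimal sketch. It assumes each app's event rate is proportional to the HS06 per core of the node, and fits the slope by least squares through the origin. All numbers, node classes and app names are made up for illustration; they are not measurements from any site or experiment.

```python
def fit_slope(hs06_per_core, events_per_sec):
    """Least-squares slope through the origin: events/s ~= slope * HS06."""
    num = sum(h * r for h, r in zip(hs06_per_core, events_per_sec))
    den = sum(h * h for h in hs06_per_core)
    if den == 0:
        raise ValueError("need at least one non-zero HS06 value")
    return num / den


# Hypothetical HS06 per core for three node classes on one farm.
hs06 = [8.0, 10.0, 12.0]

# Hypothetical event rates (events/s per core) for two apps on those nodes.
rates = {
    "app_a": [4.0, 5.0, 6.0],
    "app_b": [2.0, 2.5, 3.0],
}

# One slope (events per second per HS06) for each app.
slopes = {app: fit_slope(hs06, r) for app, r in rates.items()}

for app, slope in slopes.items():
    print(f"{app}: {slope:.3f} events/s per HS06")
```

Both apps scale linearly here, but with different slopes, so one site-wide HS06 figure predicts a different throughput for each. A change in I/O regime would show up as a change in the fitted slope on otherwise identical nodes.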
It is perhaps time to revisit the benchmarks and see if our assumptions have held good, but also to see if there is a better way of letting the jobs know what the hepspec per core they are running on actually is. I think the latter may be addressed by the requirement of the Virtualisation Working Group for this information to be 'available' at run time (job instantiation time). (Got me fingers in that pie too...)
Martin.
--
Martin Bly
RAL Tier1 Fabric Manager