LHC Computer Grid - Rollout 
> [mailto:[log in to unmask]] On Behalf Of Burke, S (Stephen) said:
> To start with, there's the new GlueHostProcessorOtherDescription
> attribute,

OK, second thing. There should now be a CECapability published as e.g.

GlueCECapability: CPUScalingReferenceSI00=1250

One thing to note is that there's a typo in the release notes: in some
places it is called CPUScalingFactorSI00, and some sites are publishing
that. CPUScalingReferenceSI00 is correct.
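
For anyone reading the published value, here is a rough Python sketch
of pulling the reference out of the GlueCECapability lines while also
noticing the misspelt name. The function name and the idea of falling
back to the misspelt attribute are my own assumptions for illustration,
not part of any released tool.

```python
CORRECT_KEY = "CPUScalingReferenceSI00"
TYPO_KEY = "CPUScalingFactorSI00"  # misspelling from the release notes


def find_scaling_reference(lines):
    """Return (value, from_typo), or (None, False) if nothing is published.

    `lines` are strings such as
    "GlueCECapability: CPUScalingReferenceSI00=1250".
    """
    typo_value = None
    for line in lines:
        attr, sep, rest = line.partition(":")
        if not sep or attr.strip() != "GlueCECapability":
            continue  # not a capability line
        key, eq, raw = rest.strip().partition("=")
        if not eq:
            continue  # capability without a key=value form
        if key == CORRECT_KEY:
            return int(raw), False
        if key == TYPO_KEY and typo_value is None:
            typo_value = int(raw)  # remember it, but prefer the correct name
    if typo_value is not None:
        return typo_value, True
    return None, False
```

A True second element means the site is publishing the misspelt name
and should fix its configuration.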

  Secondly, there seems to be a bit of confusion about exactly what this
means, so I'll try to explain further. The existing SI00 attribute
(GlueHostBenchmarkSI00) is the average benchmark rating for the CPUs in
the SubCluster. However, some sites use the PBS facility to normalise
the cpu time limits for the queues to a standardised SI00 reference,
which may be quite different to the actual power of the CPUs, e.g. maybe
your reference CPU is for an SI00 of 100 whereas your actual CPUs are
much more powerful. Since we now want to measure the real installed CPU
power at sites, using the reference value could underestimate the
capacity at such a site by a large factor. The new scheme splits the
attribute in two. The old
BenchmarkSI00 should become the actual (average) rating, so it can be
used to measure the installed capacity. The CPUScalingReferenceSI00 is
the value used in the batch system, which may or may not be the same.
APEL will be changed to use the reference value, since the CPU time for
jobs as reported by PBS is also scaled to the reference CPU power.

  As a further point, if sites don't use the batch system scaling and
just apply the time limits directly to whatever CPU the job lands on,
the ScalingReference should be the *least* powerful CPU in the system.
That means that when you estimate how long the job will run, the actual
run time may be shorter but not longer, so the job should always fit
inside the time limit.
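
To make the two cases explicit, here is a small Python sketch of how a
site might work out the two numbers. The function name, the
(si00, cpus) input layout and the choice to weight the average by CPU
count are assumptions of mine for illustration; check how your own
site computes its average.

```python
def values_to_publish(nodes, batch_reference_si00=None):
    """Return (benchmark_si00, scaling_reference_si00) for a SubCluster.

    nodes: list of (si00, cpus) pairs, one per node type (hypothetical
           input layout).
    batch_reference_si00: the reference used by the batch system when it
           normalises CPU time, or None if no scaling is applied.
    """
    total_cpus = sum(cpus for _, cpus in nodes)
    # Actual (average) rating, here weighted by number of CPUs.
    benchmark = sum(si00 * cpus for si00, cpus in nodes) / total_cpus
    if batch_reference_si00 is not None:
        # Batch system scales CPU time: publish its reference value.
        reference = batch_reference_si00
    else:
        # No scaling: publish the *least* powerful CPU.
        reference = min(si00 for si00, _ in nodes)
    return benchmark, reference
```

For example, 10 CPUs at 1000 and 30 CPUs at 2000 with no batch scaling
would give an average of 1750 and a reference of 1000.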

  As a concrete example, say I know that with an SI00 of 2000 my job
will take 10 hours. Then I should be able to estimate my CPU time in
minutes on any CE as 10*60*2000/CPUScalingReferenceSI00, ask for a CE
with a MaxCPUTime bigger than that (plus some safety margin), and be
confident that my job won't be killed for exceeding the CPU time limit.
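
The same arithmetic as a short Python sketch, in case it helps. The
function names and the 20% default safety margin are arbitrary choices
of mine, and I'm assuming MaxCPUTime is compared in minutes, which is
what the factor of 60 implies.

```python
def estimated_cpu_minutes(hours, known_si00, scaling_reference_si00):
    """Scale a run time measured at known_si00 to the CE's reference CPU."""
    return hours * 60 * known_si00 / scaling_reference_si00


def fits(max_cpu_time_minutes, estimate_minutes, safety_margin=1.2):
    """True if the CE's limit exceeds the estimate plus a safety margin."""
    return max_cpu_time_minutes > estimate_minutes * safety_margin


# 10 hours at SI00 2000, on a CE publishing CPUScalingReferenceSI00=1250
estimate = estimated_cpu_minutes(10, 2000, 1250)
print(estimate)              # 960.0 minutes, i.e. 16 hours
print(fits(1440, estimate))  # True: a 24-hour limit is enough
print(fits(1000, estimate))  # False: too close to the estimate
```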

Stephen