Yo,
AFAIK we publish, per job, the real wall and CPU times used, the real
wall and CPU power (I'm not completely sure whether both are sent),
and the number of cores used. Using the right combination of these
will yield the actual HS06 value for the node used.
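A toy sketch of that combination (the field names, units, and the exact formula here are my assumptions for illustration, not the real record schema):

```python
# Hypothetical sketch: deriving HS06 work from the per-job fields
# mentioned above. Field names are invented; 'cpu_power' is assumed
# to be the per-core HS06 rating of the node, times in seconds.

def hs06_work(record):
    """Return (CPU, wall) work in HS06-hours for one job record."""
    cpu_hs06_hours = record["cpu_time"] / 3600.0 * record["cpu_power"]
    wall_hs06_hours = (record["wall_time"] / 3600.0
                       * record["cpu_power"] * record["cores"])
    return cpu_hs06_hours, wall_hs06_hours

# A single-core job: 2 h wall, 1.8 h CPU, on a 10 HS06/core node.
job = {"wall_time": 7200, "cpu_time": 6480, "cpu_power": 10.0, "cores": 1}
print(hs06_work(job))  # (18.0, 20.0)
```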
I think the limitation most sites run up against is using the APEL
parser to send the data. We have our own script that knows the CPU
power of each WN; it populates a local job database, and periodically
another script pushes recent records on to the EGI accounting
database.
The schema makes it possible to Do The Right Thing, if desired.
JT
On 3 Feb 2020, at 14:11, Stephen Jones wrote:
> Hi David,
>
> On 03/02/2020 12:07, David Rebatto wrote:
>> How do you guarantee data consistency if every site performs its own
>> scaling, possibly with home made recipes?
>
> I don't think you can guarantee data consistency. Sites could either
> follow the standard that has emerged (please see links in last
> message) or "something else". AFAIK there is no guarantee in place
> today, other than to stick to the "standard gauge".
>
>>
>> Storing raw data (and performing any manipulation in the accounting
>> system itself) allows you to compare different sites, or different
>> time periods at the same site (as in Jeff's example) with more
>> confidence.
>
>
> But the raw data, in the system we use, would not contain any
> information about the actual relative power of a slot on the node on
> which the job ran. Hence you could not accurately calculate the work
> done from the raw data unless extra data about node power is
> supplied (which it isn't). Aside: you could change the system, for
> example you could have the node tell the batch system about its own
> power, and let the batch system tell the CE about its own power and
> have the CE create job accounting records holding individual node
> power values. That would work, and there may be other spins on that.
> But the current architecture requires the use of scaled times. It's
> actually a rough kludge that makes a heterogeneous cluster appear to
> be a homogeneous cluster, but it happens to give correct values, so
> that works in its favour.
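A minimal sketch of that kludge (the reference power, function names, and all numbers are invented for illustration): raw times are multiplied by the node's power relative to an agreed reference, so every record looks as if it came from a reference-power node.

```python
# Illustrative sketch of the scaling kludge: scale raw job times by the
# node's power relative to a reference, so a heterogeneous cluster
# reports as if it were homogeneous. Numbers and names are made up.

REFERENCE_HS06_PER_CORE = 10.0  # assumed "standard gauge" reference

def scale_times(raw_wall, raw_cpu, node_hs06_per_core):
    """Scale raw wall/CPU times to reference-node-equivalent times."""
    factor = node_hs06_per_core / REFERENCE_HS06_PER_CORE
    return raw_wall * factor, raw_cpu * factor

# A job on a faster node (15 HS06/core) is reported with inflated
# times, as if it had run longer on a reference node:
print(scale_times(3600, 3000, 15.0))  # (5400.0, 4500.0)
```

Multiplying the scaled time by the reference power then gives the same HS06 work as multiplying the raw time by the node's true power, which is why the trick "happens to give correct values".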
>
>> Moreover you can come up with better scaling recipes, or even whole
>> new statistics, as an afterthought, and still be able to apply them
>> to the historical data.
>
> Yes. I'm just telling it like it is, not how it could or should be.
>
>
>> I don't see the connection with the scaling issue. Isn't the wall
>> clock time vs. CPU time just a measure of jobs' efficiency (i.e. how
>> much time do they "waste", sitting idle in a job slot)? How would you
>> compute that with only one of the two values?
>
> The connection with scaling comes from what Jeff mentioned. Wallclock
> time is ordinarily understood to be the time you would get with a
> stopwatch if you actually stood there and timed the job. But the
> scheme used to make a heterogeneous cluster appear to be a
> homogeneous cluster requires that the times are scaled, and hence
> they are not the time you would get with a stopwatch, which is
> confusing to say the least. With respect to the other point,
> efficiency: if both times are scaled by the same factor, the same
> equation still yields the efficiency, i.e.
> efficiency = cputime / wallclocktime.
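A quick check of that last point, with toy numbers of my own choosing: the common scale factor cancels in the ratio, so efficiency is the same whether computed from raw or scaled times.

```python
# Toy check: scaling both times by the same factor leaves the
# efficiency ratio unchanged. All numbers are invented for illustration.

def efficiency(cpu_time, wall_time):
    return cpu_time / wall_time

raw_cpu, raw_wall = 3000.0, 3600.0
factor = 1.5  # some node's power relative to the reference

assert efficiency(raw_cpu, raw_wall) == \
       efficiency(raw_cpu * factor, raw_wall * factor)
print(efficiency(raw_cpu, raw_wall))  # ~0.83
```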
>
> Cheers,
>
> Ste
>
>
> --
> Steve Jones [log in to unmask]
> Grid System Administrator office: 220
> High Energy Physics Division tel (int): 43396
> Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
> University of Liverpool
> http://www.liv.ac.uk/physics/hep/
>
> ########################################################################
>
> To unsubscribe from the LCG-ROLLOUT list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=LCG-ROLLOUT&A=1