JISCMail - TB-SUPPORT Archives

Hi Steve,

On 31/10/2016 14:47, Stephen Jones wrote:
> Hi Alessandra,
>
> I have a couple of questions before I do the numbers for this.
>
>>>
>>> JSON file with the numbers for the month contained in the Status
>>> field. The elements of the array of numbers are: HS06 on the
>>> ATLAS dashboard, HS06 in APEL, ratio, wallclock in ATLAS,
>>> wallclock in APEL, wallclock ratio.
>
> Would you please expand on the meaning of these fields (HS06 measures 
> Power, not Work; pls. see other (long) thread!?!)
>
> * HS06 on the ATLAS dashboard - I assume we mean "CPU Work" (in HS06 
> Hours). And how was this measured by ATLAS?
>
No, we measure everything in wallclock now, so in the new terminology it 
would be "wallclock work". The numbers in the SSB are obtained by 
multiplying the "Delivered power" by the number of hours in the month, 
for each month. The delivered power is in the ATLAS dashboard and is 
calculated the opposite way, by dividing the wallclock work measured in 
the dashboard by the number of hours in the month (I know it's confusing).
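
As a minimal sketch of the two conversions described above (the function names are illustrative, not taken from the SSB or the dashboard, and the month is assumed to be a whole number of 24-hour days, ignoring clock changes):

```python
import calendar


def hours_in_month(year, month):
    """Number of hours in a calendar month, assuming 24-hour days."""
    days = calendar.monthrange(year, month)[1]
    return days * 24


def wallclock_work(delivered_power_hs06, year, month):
    """Wallclock work (HS06 hours) = delivered power (HS06) * hours in the month."""
    return delivered_power_hs06 * hours_in_month(year, month)


def delivered_power(wallclock_work_hs06_hours, year, month):
    """Delivered power (HS06) = wallclock work (HS06 hours) / hours in the month."""
    return wallclock_work_hs06_hours / hours_in_month(year, month)
```

For October 2016 this gives 744 hours, so a delivered power of 100 HS06 would correspond to 74400 HS06 hours of wallclock work.
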
> * HS06 in APEL - I assume we mean "CPU Work" (in HS06 Hours) as 
> measured at the site. Please confirm.
>
Again it is the wallclock work as measured at the site.
> * wallclock in ATLAS - Is this really just wallclock time ? Or do we 
> mean wall clock _work_ (in HS06 Hours)?
>
This is the raw wallclock duration.
> * wallclock in APEL - Again, surely we mean wall clock work (in HS06 
> Hours) as measured at the site?
>
In this case it is also just wallclock, which unfortunately may be scaled, 
so this is not a good comparison with the ATLAS wallclock for sites that 
scale, though it is still useful at sites that don't scale and in a few 
other situations.
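
To illustrate why scaling matters for this comparison, here is a rough sketch. It assumes the batch system multiplies the raw duration by a single known factor per site; the factor and the function names are hypothetical, and real sites may scale differently per node type:

```python
def unscale_wallclock(scaled_hours, scale_factor):
    """Recover raw wallclock hours, assuming scaled = raw * scale_factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return scaled_hours / scale_factor


def wallclock_ratio_percent(apel_hours, atlas_hours, scale_factor=1.0):
    """APEL wallclock as a percentage of ATLAS wallclock, after unscaling APEL."""
    raw_apel = unscale_wallclock(apel_hours, scale_factor)
    return round(100 * raw_apel / atlas_hours)
```

With the default factor of 1.0 (a site that does not scale) the ratio is the plain APEL/ATLAS percentage; for a site that scales, the APEL value would need to be divided by the site's factor first.
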

cheers
alessandra
> Also, how do you compensate for batch system scaling of "wall clock 
> duration"? The batch system often changes the job duration by a 
> scaling factor to account for differences in hardware on heterogeneous 
> clusters. Unless both "wallclock in ATLAS" and "wallclock in 
> APEL" are scaled (or both not scaled) they cannot be compared to each 
> other and the ratio would be meaningless. And since (a) most clusters 
> only expose scaled wall clock durations, and (b) ATLAS only has its 
> own raw wall clock durations, then I suspect you will need to 
> compensate for this action somehow, or the numbers will be foul.
>
> Cheers,
>
> Ste
>
>
>>> "Status": "123600,55882,45,11771,4299,37"
>>>
>>> The APEL HS06 value 55882 is 45% of the ATLAS dashboard value 123600
>>> The APEL wallclock 4299 is 37% of the ATLAS dashboard value 11771
>>>
>>> Sometimes the discrepancies are the other way, i.e. APEL much bigger 
>>> than ATLAS. ATLAS gets the numbers only for the payload, while APEL 
>>> gets the numbers from the batch system including all the pilot time, 
>>> so in theory APEL should always be slightly bigger. Whatever the 
>>> numbers, they should be within 15-20% difference one way or the other.
>>>
>>> I've put these pages in the links page
>>>
>>> https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages#Accounting
>>>
>>> Let me know if you have any comment. Can UCL, RHUL, ECDF and Brunel 
>>> help me understand their discrepancies please?
>>>
>>> cheers
>>> alessandra
>
>
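
For anyone checking their own site's numbers, the Status field quoted above can be unpacked and the percentages re-derived along these lines. This is only a sketch: the field names and the 20% threshold are assumptions based on the description in this thread, not part of the JSON file itself.

```python
# Field order as described in the thread; the names themselves are illustrative.
FIELD_NAMES = (
    "hs06_atlas", "hs06_apel", "hs06_ratio_pct",
    "wallclock_atlas", "wallclock_apel", "wallclock_ratio_pct",
)


def parse_status(status):
    """Split the comma-separated Status string into named integer fields."""
    values = [int(v) for v in status.split(",")]
    if len(values) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d" % (len(FIELD_NAMES), len(values)))
    return dict(zip(FIELD_NAMES, values))


def ratio_percent(apel, atlas):
    """APEL value as a rounded percentage of the ATLAS value."""
    return round(100 * apel / atlas)


def within_tolerance(ratio_pct, tolerance_pct=20):
    """True if the ratio is within tolerance_pct of 100%, one way or the other."""
    return abs(ratio_pct - 100) <= tolerance_pct


fields = parse_status("123600,55882,45,11771,4299,37")
print(ratio_percent(fields["hs06_apel"], fields["hs06_atlas"]))            # 45
print(ratio_percent(fields["wallclock_apel"], fields["wallclock_atlas"]))  # 37
print(within_tolerance(fields["hs06_ratio_pct"]))                          # False
```
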

-- 
Respect is a rational process. \\//
Fatti non foste a viver come bruti (Dante)