Site availability doesn't work like that. If 50% of the jobs fail, that is
accounted for.
What you say is true only in the case of downtimes, i.e. the downtime of one
CE doesn't affect the availability, but if the CE is up and it fails, it is
accounted as failed.
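
To make the two readings concrete, here is a rough Python sketch using the
ce01/ce02 example from Chris's message quoted below. The function names and
the one-boolean-per-CE input are illustrative assumptions, not the actual
availability algorithm: one function takes the "OR" over the CEs, the other
gives the success rate a user would see if jobs hit the CEs equally.

```python
def or_availability(ce_ok):
    """Site counts as available if at least one CE passes (the "OR" reading)."""
    return 1.0 if any(ce_ok.values()) else 0.0


def job_success_rate(ce_ok):
    """Fraction of jobs that succeed, assuming jobs are spread equally over CEs."""
    return sum(ce_ok.values()) / len(ce_ok)


# Hypothetical test outcome: ce01 passes every job, ce02 fails every job.
ce_ok = {"ce01": True, "ce02": False}

print(or_availability(ce_ok))   # 1.0
print(job_success_rate(ce_ok))  # 0.5
```

Whether a CE that is up but failing is really ORed away like this is exactly
the point in dispute; the sketch only shows how far apart the two numbers can
be.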
cheers
alessandra
On 22/01/2013 16:00, Christopher J. Walker wrote:
> On 22/01/13 15:36, Alessandra Forti wrote:
>> The only explanation Chris has given about how representative they are
>> is that these jobs use the WMS. My reply is: so do ops, atlas, lhcb and
>> CMS nagios tests.
> Consider
> ce01: passes 100% of jobs
> ce02: fails 100% of jobs.
>
> Ops test availability is the "OR" of CE availability. If you have one CE
> that fails every job, and one that passes, ops availability is 100%, even
> though a user experiences 50% of jobs failing (assuming they hit the CEs
> equally). Steve's test represents that experience.
>
> And yes, QMUL is now using the Imperial BDII (and causing it some
> stress, I'm afraid), to avoid the BDII issues mentioned below.
>
> Chris
>
>> cheers
>> alessandra
>>
>> On 22/01/2013 10:58, Peter Gronbech wrote:
>>> Hi Alessandra et al,
>>> These tests have been a little unreliable in the past but I think
>>> Chris's explanation of how they represent jobs from smaller VOs points
>>> out that they can provide useful data.
>>> I am curious as to why some sites attract more of these jobs than others.
>>>
>>> RALPP, Manchester, Oxford account for 85% of the jobs.
>>>
>>> Is this because we have more CEs? (Oxford has 3, RALPP has 3,
>>> Manchester has 3 but only 2 in production.)
>>>
>>> I then wondered why, although Oxford and RALPP are doing well (97%
>>> success), we still fail sometimes.
>>> All the errors from our sites are down to a failed lcg-cp, which Kashif
>>> believes is caused by a timeout from the top-level BDII at RAL. Chris W
>>> has already opened a ticket about this.
>>>
>>> However, the errors at Manchester seem to be down to a CVMFS issue on
>>> wn2206180, as the error message is:
>>> Trying to source:
>>> /cvmfs/atlas.cern.ch/repo/sw/software/x86_64-slc5-gcc43-opt/17.6.0/cmtsite/asetup.sh
>>> AtlasOffline 17.6.0
>>> Failed to find asetup.sh
>>>
>>> Alessandra, can you check if this is the case? If so, your score would
>>> probably go to 100%. The question is why you don't see the lcg-cp
>>> error. Are you using your own top BDII?
>>>
>>> Thanks Pete
>>>
>>
--
Facts aren't facts if they come from the wrong people. (Paul Krugman)