Marcin Radecki wrote:
> Burke, S (Stephen) wrote:
>> LHC Computer Grid - Rollout
>>> [mailto:[log in to unmask]] On Behalf Of Marcin Radecki said:
>>> Regarding the opposite case, i.e. when a real CS failure causes sites
>>> to be marked as bad, the consequences seem more severe.
>>> Normally a CS failure affects several sites,
>>
>> Can't you use that as an indicator in itself, i.e. if many sites see the
>> same error it's probably CS, if only one site sees it it probably isn't?
>
> Good direction. However, it is probably not suitable for improving an
> individual SAM test, as it would require querying the SAM database for
> other sites' results (a very high load on the DB). As an option, such
> analysis could be done centrally, and then other tools such as the weekly
> report generator could remove failures that happened during CS failure
> periods.
>
> Marcin
Hi Marcin and all,
in the case of replication tests done through CE submission, the TOP
BDII used in the farm is *known*, and the SAM test itself could query
the SAM results for that particular Core Service:
- if the CS test FAILED: try anyway, and if Rep fails, write CSFAIL
- if the CS test was GOOD: proceed as usual (OK or FAIL)
- if the CS test result is Not Available: shout it very loud, because the
CS *must* be controlled, registered in GOCDB, etc.
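
The three cases above could be sketched roughly as follows in Python. This is only an illustration, not existing SAM code: the function name, the exception class, and the string values "GOOD" and "FAILED" for the CS result are assumptions, and the lookup of the CS result itself is left out (the caller passes None when no result is available).

```python
from typing import Optional


class CoreServiceNotMonitored(Exception):
    """Hypothetical error: no SAM result exists for the Core Service."""


def classify_replication(cs_status: Optional[str], rep_ok: bool) -> str:
    """Combine the Core Service (CS) test status with the replication outcome.

    cs_status: "GOOD", "FAILED", or None when no CS result is available
               (assumed encoding, not taken from SAM).
    rep_ok:    True if the replication test itself succeeded.
    """
    if cs_status is None:
        # "Shout it very loud": the CS should be monitored and in GOCDB.
        raise CoreServiceNotMonitored("no SAM result for the Core Service")
    if rep_ok:
        # The replication test was tried anyway and worked.
        return "OK"
    if cs_status == "FAILED":
        # Replication failed while the CS was known to be failing.
        return "CSFAIL"
    # CS was GOOD, so the failure is attributed to the site.
    return "FAIL"
```

For example, classify_replication("FAILED", False) gives "CSFAIL", while classify_replication("GOOD", False) gives a plain "FAIL".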
Cheers,
Alessandro
--
Alessandro Cavalli
INFN - CNAF
Viale Berti Pichat 6/2
40127 Bologna
Italy
tel: +39 051 6092849
fax: +39 051 6092746
ICQ: 12771368
MSN: [log in to unmask]
Skype name: alessandro.cavalli