Jeremy, I do not believe that UK ops people can submit alarm tickets.
Technically it just needs DNs adding to a list, but whether they should
or not is another matter. Keeping the experiment data flows going from
CERN to a T1 is the justification for getting someone at the T1 out of
bed. Is a UK site failing a SAM test of the same magnitude?
As we have heard from Catalin, the T1 was called out, so the T1
monitoring seems to be working.
Regards,
John
-----Original Message-----
From: Testbed Support for GridPP member institutes
[mailto:[log in to unmask]] On Behalf Of J Coles
Sent: 12 June 2010 17:11
To: [log in to unmask]
Subject: Re: lcg-bdii.gridpp.ac.uk problem?
Hi Elena
That supports the suspicion about lcg-bdii.gridpp.ac.uk. I suspect IC and
RALPP (and probably Bristol) were already set to look at alternate
BDIIs. The Glasgow BDII has recovered from whatever problem it
suffered earlier, and now the Glasgow site is also passing again. Since
the LHC VO results are still fine, I only created a top-priority GGUS
ticket (https://gus.fzk.de/ws/ticket_info.php?ticket=58990). It raises
the question of whether our ops people can submit alarm tickets to the
T1 at all, as the experiment ops people can. I thought the T1 triggered
a call-out after two successive ops VO failures anyway, and since they
are affected too... Something to discuss next week.
Cheers,
Jeremy
On 12 Jun 2010, at 16:27, Elena Korolkova wrote:
> Hi Jeremy
>
> I just changed LCG_GFAL_INFOSYS to "bdii.ce-egee.org" and we passed
> the last SAM test.
>
> Elena
>
>
> ____________________________________________________________________________
> Dr Elena Korolkova
> Email: [log in to unmask]
> Tel.: +44 (0)114 2223553
> Fax: +44 (0)114 2223555
> Department of Physics and Astronomy
> University of Sheffield
> Sheffield, S3 7RH, United Kingdom
>
> On Sat, 12 Jun 2010, J Coles wrote:
>
>> Hi Wahid
>>
>> The history here shows problems for the Glasgow BDII but not
>> lcg-bdii.gridpp.ac.uk: http://pprc.qmul.ac.uk/~lloyd/gridpp/bdiitest.html.
>>
>> This view (from gstat2, which everyone at HEPSYSMAN yesterday will
>> know about):
>> http://gstat-prod.cern.ch/gstat/service/bdii_top/treeview/lcg-bdii.gridpp.ac.uk/
>> also shows things to be okay (for now at least).
>>
>> There are some sites passing:
>> http://pprc.qmul.ac.uk/~lloyd/gridpp/samtest.html
>> (i.e. IC ... RALPP). All others fail with ERROR: CE-sft-lcg-rm-rep
>> with
>>
>> CRITICAL: METRIC FAILED [org.sam.WN-RepRep-/ops/Role=lcgadmin]:
>> CRITICAL: File was NOT replicated to SE samdpm002.cern.ch. [ErrDB:
>> [('lcg_util_wn', 'server', 'CRITICAL')]]
>> org.sam.WN-RepCr-/ops/Role=lcgadmin
>>
>> Since other countries do not see the problem, I tend to agree that
>> it suggests a core UK problem, but the monitoring results are not
>> clear (to me at least). How come RALPP and IC continue to pass the
>> org.sam.WN-Rep-/ops/Role=lcgadmin service test? Perhaps one of the
>> on-duty people can comment, as I must be missing something.
>>
>> Jeremy
>>
>> On 12 Jun 2010, at 09:42, Wahid Bhimji wrote:
>>
>>> Hi
>>> Looks like a number of sites are failing sam tests due to a
>>> problem with lcg-bdii.gridpp.ac.uk.
>>> Could someone take a look?
>>> Ta
>>> Wahid
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.