I'm doing an ATLAS shift. I haven't seen many failures in the UK related to the BDII.
Dr Elena Korolkova
Email: [log in to unmask]
Tel.: +44 (0)114 2223553
Fax: +44 (0)114 2223555
Department of Physics and Astronomy
University of Sheffield
Sheffield, S3 7RH, United Kingdom
On Sat, 12 Jun 2010, John Gordon wrote:
> Jeremy, I do not believe that UK ops people can submit alarm tickets.
> Technically it just needs DNs adding to a list but whether they should
> or not is another matter. Keeping the experiment data flows going from
> CERN to a T1 is the justification for getting someone at the T1 out of
> bed. Is a UK site failing a SAM test of the same magnitude?
> As we have heard from Catalin, the T1 was called out, so the T1
> monitoring seems to be working.
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of J Coles
> Sent: 12 June 2010 17:11
> To: [log in to unmask]
> Subject: Re: lcg-bdii.gridpp.ac.uk problem?
> Hi Elena
> That supports the suspicion of lcg-bdii.gridpp.ac.uk. I suspect IC and
> RALPP (and probably Bristol) were already set to look at alternate
> BDIIs. The Glasgow BDII has recovered from whatever problem it
> suffered earlier and now the Glasgow site is also passing again. Since
> the LHC VO results are still fine, I only created a GGUS ticket with
> top priority (https://gus.fzk.de/ws/ticket_info.php?ticket=58990). It
> raises the question of whether our ops people can submit alarm
> tickets to the T1 like the experiment ops people. I thought the T1
> triggered a call-out after 2 successive ops VO failures anyway, and
> since they are affected too... Something to discuss next week.
> On 12 Jun 2010, at 16:27, Elena Korolkova wrote:
>> Hi Jeremy
>> I just changed LCG_GFAL_INFOSYS to "bdii.ce-egee.org" and we passed
>> the last SAM test.
>> On Sat, 12 Jun 2010, J Coles wrote:
>>> Hi Wahid
>>> The history here shows problems for the Glasgow BDII but not
>>> : http://pprc.qmul.ac.uk/~lloyd/gridpp/bdiitest.html.
>>> This view (from gstat2 that everyone at HEPSYSMAN yesterday will
>>> know about):
>>> also shows things to be okay (for now at least).
>>> There are some sites passing:
>>> (i.e. IC ... RALPP). All others fail with ERROR: CE-sft-lcg-rm-rep
>>> with
>>> CRITICAL: METRIC FAILED [org.sam.WN-RepRep-/ops/Role=lcgadmin]:
>>> CRITICAL: File was NOT replicated to SE samdpm002.cern.ch. [ErrDB:
>>> [('lcg_util_wn', 'server', 'CRITICAL')]]
>>> Since other countries do not see the problem I tend to agree that
>>> it suggests a core UK problem, but the monitoring results are not
>>> clear (for me at least). How come RALPP and IC continue to pass the
>>> org.sam.WN-Rep-/ops/Role=lcgadmin service test? Perhaps one of the
>>> on-duty people can comment as I must be missing something.
>>> On 12 Jun 2010, at 09:42, Wahid Bhimji wrote:
>>>> Looks like a number of sites are failing SAM tests due to a
>>>> problem with lcg-bdii.gridpp.ac.uk.
>>>> Could someone take a look?