Steve,
For an insight into ATLAS, please see here and click on the 'fault' item:
http://apfmon.lancs.ac.uk/q/UKI-NORTHGRID-LIV-HEP_SL6
026 (969175.000.000) 07/13 07:23:09 Detected Down Grid Resource
GridResource: nordugrid hepgrid2.ph.liv.ac.uk
But also for your CREAM CE:
009 (13085502.000.000) 07/13 12:36:23 Job was aborted by the user.
CREAM error: CREAM_JOB_REGISTER timed out
Network issue?
Cheers,
Peter
> On 13 Jul 2016, at 11:54, Gordon Stewart <[log in to unmask]> wrote:
>
> Hi Steve,
>
> I'm on duty this week, and I don't recall seeing any alarms for Liverpool yesterday, and certainly nothing which persisted long enough for me to think about notifying / ticketing. The vast majority of current alarms are related to the GridPP Nagios instances going away.
>
>
> Gordon
>
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:[log in to unmask]] On Behalf Of Daniela Bauer
> Sent: 13 July 2016 11:40
> To: [log in to unmask]
> Subject: Re: Arc Outage
>
> Hi Steve,
>
> There has been a problem with alarms not going to the dashboard, and even though the dashboard people now claim it's fixed, I get the impression that alarms that didn't reach the dashboard while it was broken are not picked up now either; only new alarms are. You (plural you) might just want to check your site on http://argo.egi.eu to see if there are any residual issues you are not aware of.
>
> Cheers,
> Daniela
>
>
> On 13 July 2016 at 11:28, Stephen Jones <[log in to unmask]> wrote:
>> Hi Kashif,
>>
>> (note to Raj and Alessandra below)
>>
>> I was sitting here wondering why there are so few ATLAS or LHCB jobs
>> coming to our ARC/Condor cluster at Liverpool.
>>
>> So I had a dig about, and restarted the services on the ARC/Condor
>> headnode (for no real reason) on 12th July at 16:30.
>>
>> Today, there are still no jobs from those (although SNOPLUS and NA62
>> are getting plenty of run time!)
>>
>> So I had a look at this new website you mentioned, http://argo.egi.eu
>>
>> I can see from the plot (which is attached) that our ARC Server was
>> PURPLE until 16:30 yesterday, then it went GREEN.
>>
>> This happened at the time I restarted the services, so I assume
>> something must have been stuck.
>>
>> Anyway, was this problem seen on the Dashboard? Could I have been notified?
>>
>> Also, for Raj and Alessandra: are ATLAS and LHCB hooked up to this
>> alarm system? Have they stopped sending jobs here? When will they resume?
>>
>> Cheers for all your help,
>>
>> Ste
>>
>> Liverpool
>>
>>
>>
>>
>>
>>
>> --
>> Steve Jones [log in to unmask]
>> Grid System Administrator office: 220
>> High Energy Physics Division tel (int): 43396
>> Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
>> University of Liverpool http://www.liv.ac.uk/physics/hep/
>>
>
>
>
> --
> Sent from the pit of despair
>
> -----------------------------------------------------------
> [log in to unmask]
> HEP Group/Physics Dep
> Imperial College
> London, SW7 2BW
> Tel: +44-(0)20-75947810
> http://www.hep.ph.ic.ac.uk/~dbauer/