Ah, I'd noticed that you were only sending SAM tests to
heplnv146.pp.rl.ac.uk, which is playing up a little at the moment (hence
the downtime). Could you add heplnv147.pp.rl.ac.uk to the mix as well?
Thanks,
Chris.
On 18/06/2014 09:46, "Elena Korolkova" <[log in to unmask]> wrote:
>Hi Chris,
>
>No. This is a different issue. It's because of the DT:
>
>
>ATLAS Distributed Computing Site Status Board 2 <[log in to unmask]>,
>[log in to unmask]
>[Switcher2 AutoExclusion] Summary for UK cloud at 2014-06-16 09:00 UTC
>
>Dear UK Cloud Support,
> please note that following PanDA Site IDs have been excluded/recovered
>by the AutoExclusion tool:
>
>
>
> Site UKI-SOUTHGRID-RALPP:
>
> SiteID: ANALY_RALPP_SL6
> Status changed to 'setoffline'
> Reason:
> UNSCHEDULED downtime
> Endpoint:heplnv146.pp.rl.ac.uk:2811
> Description:Investigating problems with htcondor on the node
> From:2014-06-16 09:00:00
> To:2014-06-20 07:00:00
> https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=14628
>
>
> SiteID: UKI-SOUTHGRID-RALPP_SL6
> Status changed to 'setoffline'
> Reason:
> UNSCHEDULED downtime
> Endpoint:heplnv146.pp.rl.ac.uk:2811
> Description:Investigating problems with htcondor on the node
> From:2014-06-16 09:00:00
> To:2014-06-20 07:00:00
> https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=14628
>
>
>
> PanDA Site ID recovery information:
> Recovery will be performed by the HammerCloud PFT/AFT. Once recovery
>notification is sent you may check progress of test jobs at [1] (PFT) or
>[2] (AFT). Summary of HC exclusion/recovery actions is at [3].
>
> [1]
>http://panda.cern.ch/server/pandamon/query?jobsummary=cloud&select=ptest&h
>ours=4&processingType=gangarobot-pft
> [2]
>http://panda.cern.ch/server/pandamon/query?jobsummary=cloud&select=ptest&h
>ours=3&processingType=gangarobot&dash=analysis
> [3] http://hammercloud.cern.ch/hc/app/atlas/robot/incidents/
>
>
>Somehow it's the only CE in the RALPP queue configuration.
>
>I can add additional CEs. Let me know which.
>
>Thanks
>Elena
>
>On 18 Jun 2014, at 09:25, Chris Brew <[log in to unmask]> wrote:
>
>> Hi Elena,
>>
>> The RALPP queues appear to be offline ("set.offline.by.Switcher"). Is that
>> part of the same issue, or is there another problem I'm not aware of?
>>
>> Thanks,
>> Chris.
>>
>> On 18/06/2014 09:01, "Elena Korolkova" <[log in to unmask]>
>>wrote:
>>
>>> Morning,
>>>
>>> the problem which I reported at yesterday's meeting is still there.
>>>
>>> There is a low number of ATLAS jobs running in the UK (and other) clouds.
>>> This is related to a DDM problem (input files for jobs can't be
>>> transferred). ATLAS experts are working to solve it.
>>>
>>>
>>> Elena
>>
>> --
>> Scanned by iCritical.