On 23/08/11 09:58, Santanu Das wrote:
> Apologies for not being able to make it today.
> Just a brief update: I have upgraded 5 of the disk servers to SL5 and am having some post-update issues. Work is in progress.
>
"QMUL, RHUL and ECDF had a few Nagios failures. In RHUL, it was due to
reinstallation of a few WNs and in QMUL it was due to black holing of one
of the WNs."
In QMUL some of the worker nodes blackholed. Once they were fixed, our
CREAM CE sulked for a while. I rebooted the machine several times, re-ran
YAIM on it, and finally gave up and went to bed. Overnight, it fixed
itself. What the problem is, I don't know, but there's clearly a problem
with CREAM here too. In addition, we currently keep old jobs around for
5.7 days rather than the default 57, as otherwise we get failures.
I have wondered about installing an EMI CREAM, but
http://nationalgridservice.blogspot.com/2011/08/good-enough-impression.html
implies that the EMI CREAM doesn't support SGE...
Chris
> Cheers,
> Santanu
>
>
>
> On 23 Aug 2011, at 09:43, Jeremy Coles<[log in to unmask]> wrote:
>
>> Dear All
>>
>> The outline agenda for today's meeting is here: http://indico.cern.ch/conferenceDisplay.py?confId=151452. There are no special topics for discussion this week so we can have a more open discussion to get the latest news and issues from each site.
>>
>> For minutes we have: Ewan=1 Alessandra=1 Stephen=1 Catalin=1 Chris=1 Matt=2 Daniela=2 Mark=2 Sam=2 Wahid=2 David=2 Duncan=2 Stuart=2
>>
>> Thanks,
>> Jeremy