JISCMail - LCG-ROLLOUT Archives

Hi Martin,

I cut and pasted your suggestion into the wiki. I hope you don't mind.

cheers
alessandra

On Tue, 28 Jun 2005, Bly, MJ (Martin) wrote:

> The drain time for a farm is too long to be considered for anything but
> the most dire interventions.
>
> You don't need to drain the queues, just temporarily stop the jobs.  For
> PBS/torque:
>
> 	qsig -s STOP `qselect -q whatever -s R`
>
> reboot, and restart the jobs
>
> 	qsig -s CONT `qselect -q whatever -s R`
>
>
> Martin.
>
>
>> -----Original Message-----
>> From: LHC Computer Grid - Rollout
>> [mailto:[log in to unmask]] On Behalf Of
>> Maarten Litmaath
>> Sent: 27 June 2005 16:17
>> To: [log in to unmask]
>> Subject: Re: [LCG-ROLLOUT] rebooting a CE
>>
>>
>> Jeff Templon wrote:
>>
>>> Hi *,
>>>
>>> We need to reboot our CE soon (kernel upgrade).  Used to be if you
>>> rebooted a CE machine, condor-G on the WMS would decide that your jobs
>>> must all be dead, and restart them elsewhere.
>>
>> That was a long time ago.  These days jobs in steady state are not
>> affected by a reboot of the CE or the RB.  Jobs in transit (e.g.
>> just finishing) will fail.
>>
>>> What is the situation now?  Do we need to drain queues before rebooting?
>>
>> Draining the queues is always a good idea.
>>
>

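For the wiki, the stop/reboot/continue sequence Martin describes above could be wrapped roughly as follows. This is only a sketch: the function names and the job-list file are hypothetical, and saving the job IDs to a file (so that exactly the jobs that were stopped get continued) is an addition to Martin's recipe, which simply runs qselect a second time.

```shell
#!/bin/sh
# Sketch: pause and resume the running jobs of one PBS/torque queue around a reboot.
# suspend_queue, resume_queue and the job-list file are illustrative names only.

# Send STOP to every running job in queue $1 and record the job IDs in file $2.
suspend_queue() {
    queue=$1
    joblist=$2
    qselect -q "$queue" -s R > "$joblist" || return 1
    # No running jobs: nothing to signal.
    [ -s "$joblist" ] || return 0
    # Job IDs contain no whitespace, so word splitting is intended here.
    qsig -s STOP $(cat "$joblist")
}

# Send CONT to the jobs recorded in file $1.
resume_queue() {
    joblist=$1
    [ -s "$joblist" ] || return 0
    qsig -s CONT $(cat "$joblist")
}
```

Usage would be along the lines of `suspend_queue whatever /var/tmp/stopped-jobs.txt`, then the reboot, then `resume_queue /var/tmp/stopped-jobs.txt`. The file path is an assumption; any location that survives the reboot should do.
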
-- 
********************************************
* Dr Alessandra Forti			   *
* Technical Coordinator - NorthGrid Tier2  *
* http://www.hep.man.ac.uk/u/aforti	   *
********************************************