On 09/05/14 14:16, John Hill wrote:
> Hi Winnie,
> You might also want to look at
> https://www.gridpp.ac.uk/wiki/Scheduled_Downtimes which Steve Jones
> put together to give guidance on this topic.
Too late, I suspect, but will an EMI-2 -> EMI-3 upgrade and reyaim
actually kill queued jobs?
I do note you plan to rebuild the nodes, so this may not be possible for
you.
Chris
> Cheers,
> John
>
> On 09/05/2014 14:12, Daniela Bauer wrote:
>> Hi Winnie,
>>
>> there's a file on the CEs called cluster.state and at least for our
>> setup (i.e. with SGE) setting this to Draining will do the job. At the
>> same time, you need to declare a downtime for the CEs in the GOCDB.
>>
>> After you have done both (and the downtime has started), you kill all
>> jobs in the queue. The LHC experiments will automatically resubmit
>> elsewhere; for ILC you might want to consider asking.
>>
>> Then just wait until the rest of the jobs finish, and you are home free.
>>
>> As for nagios testing of the site-bdii, that's a bit tricky, as by
>> construction there's only one site bdii, so a spare one doesn't quite
>> make sense, plus at least some of the security tests only run once or
>> twice a day, so you might be waiting for a long time. Just stick it in
>> and hope for the best :-D
>>
>> Cheers,
>> Daniela
>>
>>
>> On 9 May 2014 14:05, Winnie Lacesso wrote:
>>
>> Happy Friday!
>>
>> Seeking more advice.
>>
>> We have 2 CREAM-CEs that need to be drained of jobs & rebuilt as EMI-3
>> on SL6 (yes, late, sorry sorry)
>>
>> Both have hundreds of jobs queued on them in long/med queues.
>>
>> 1. So first is to disable long/med submission & see if the queued jobs
>> will run, finish, thus drain the CE of long/med jobs (hopefully) "fast
>> enough".
>>
>> Is disabling long/med job submission advertised in the GOC-DB by changing
>> CE status from PRODUCTION to, erm, NOT? (appears to be YES/NO only!)
>>
>> Or, yaim-conf/services/glite-creamce on each contains
>> CREAM_CE_STATE=Production
>> Without having to rerun yaim, that value appears to be in
>> /etc/glite-ce-glue2/glite-ce-glue2.conf
>> and /var/lib/bdii/gip/ldif/static-file-CE.ldif (once per queue).
>> Happy to change these by hand, the files are not dynamically generated.
>>
>> So, is a valid value "DRAINING"? - think so, I recall lcg-rollout Jan 2014
>> mentioned this:
>> > Hi Lukasz,
>> > I see that the queues are publishing the following value:
>> > GlueCEStateStatus: Draining
>> Question is, will CMS & LHCb job submission frameworks automatically note
>> that & not try, or will they try, complain / ticket us, till it's pointed
>> out... (maybe should contact them...)
>>
>>
>> 2. It may take too long to allow the queued jobs to finish (want to get
>> the upgrade done ASAP). Is it better / acceptable to contact the VOs with
>> many queued jobs (ILC, CMS & LHCb) & ask them to cancel the hundreds of
>> queued jobs?
>>
>> Always grateful for advice!
>>
>> Winnie Lacesso / Linux & Solaris Systems Administrator
>> HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
>> University of Bristol
>>
>>
>>
>>
>> --
>> Sent from the pit of despair
>>
>> -----------------------------------------------------------
>> HEP Group/Physics Dep
>> Imperial College
>> London, SW7 2BW
>> Tel: +44-(0)20-75947810
>> http://www.hep.ph.ic.ac.uk/~dbauer/
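
For the by-hand edit raised in the original question (changing the published status in a static LDIF file, once per queue), a minimal sketch follows. It only rewrites text held in a string. The attribute name GlueCEStateStatus and the value Draining are taken from the thread; the function name set_ce_status, the sample DNs and the host ce.example.org are hypothetical and purely illustrative. It does not touch the BDII or rerun yaim, and it says nothing about whether the VO submission frameworks honour the published value.

```python
import re
from typing import Tuple

# Attribute name as quoted in the thread; everything else is illustrative.
STATUS_ATTR = "GlueCEStateStatus"


def set_ce_status(ldif_text: str, new_status: str = "Draining") -> Tuple[str, int]:
    """Rewrite every GlueCEStateStatus line in an LDIF string.

    Returns the updated text and the number of lines changed, so the
    caller can check that one line per queue was actually touched.
    """
    # Match the attribute at the start of a line, keep the "Name: " part,
    # and replace only the value that follows it.
    pattern = re.compile(rf"^({STATUS_ATTR}:[ \t]*)\S+[ \t]*$", re.MULTILINE)
    return pattern.subn(lambda m: m.group(1) + new_status, ldif_text)


# Hypothetical two-queue excerpt standing in for static-file-CE.ldif.
sample = (
    "dn: GlueCEUniqueID=ce.example.org/long,o=grid\n"
    "GlueCEStateStatus: Production\n"
    "\n"
    "dn: GlueCEUniqueID=ce.example.org/medium,o=grid\n"
    "GlueCEStateStatus: Production\n"
)

updated, changed = set_ce_status(sample)
print(f"{changed} status line(s) rewritten")
print(updated)
```

Working on a copy of the file and comparing the change count against the number of queues is a cheap sanity check before putting the edited file back in place.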