Hi Winnie,
You might also want to look at
https://www.gridpp.ac.uk/wiki/Scheduled_Downtimes which Steve Jones put
together to give guidance on this topic.
Cheers,
John
On 09/05/2014 14:12, Daniela Bauer wrote:
> Hi Winnie,
>
> there's a file on the CEs called cluster.state and, at least for our
> setup (i.e. with SGE), setting this to Draining will do the job. At the
> same time, you need to declare a downtime for the CEs in the GOCDB.
>
> After you have done both (and the downtime has started), you kill all
> jobs in the queue. The LHC experiments will automatically resubmit
> elsewhere; for ILC you might want to consider asking them.
>
> Then just wait until the rest of the jobs finish, and you are home free.
>
> As for nagios testing of the site-bdii, that's a bit tricky, as by
> construction there's only one site bdii, so a spare one doesn't quite
> make sense, plus at least some of the security tests only run once or
> twice a day, so you might be waiting for a long time. Just stick it in
> and hope for the best :-D
>
> Cheers,
> Daniela
>
>
> On 9 May 2014 14:05, Winnie Lacesso <[log in to unmask]
> <mailto:[log in to unmask]>> wrote:
>
> Happy Friday!
>
> Seeking more advice.
>
> We have 2 CREAM-CEs that need to be drained of jobs & rebuilt as emi-3
> on SL6 (yes, late, sorry sorry)
>
> Both have hundreds of jobs queued on them in long/med queues.
>
> 1. So first is to disable long/med submission & see if the queued jobs
> will run, finish, thus drain the CE of long/med jobs (hopefully) "fast
> enough".
>
> Is disabling long/med job submission advertised in the GOC-DB by changing
> CE status from PRODUCTION to, erm, NOT? (appears to be YES/NO only!)
>
> Or, yaim-conf/services/glite-creamce on each contains
> CREAM_CE_STATE=Production
> Without having to rerun yaim, that value appears to be in
> /etc/glite-ce-glue2/glite-ce-glue2.conf
> and /var/lib/bdii/gip/ldif/static-file-CE.ldif (once per queue).
> Happy to change these by hand, the files are not dynamically generated.
>
> So, is a valid value "DRAINING"? - think so, I recall lcg-rollout Jan 2014
> mentioned this:
> > Hi Lukasz,
> > I see that the queues are publishing the following value:
> > GlueCEStateStatus: Draining
> Question is, will CMS & LHCb job submission frameworks automatically note
> that & not try, or will they try, complain / ticket us, till it's pointed
> out... (maybe should contact them...)
>
>
> 2. It may take too long to allow the queued jobs to finish (want to get
> the upgrade done ASAP). Is it better / acceptable to contact the VOs with
> many queued jobs (ILC, CMS & LHCb) & ask them to cancel the hundreds of
> queued jobs?
>
> Always grateful for advice!
>
> Winnie Lacesso / Linux & Solaris Systems Administrator
> HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
> University of Bristol
>
>
>
>
> --
> Sent from the pit of despair
>
> -----------------------------------------------------------
> [log in to unmask] <mailto:[log in to unmask]>
> HEP Group/Physics Dep
> Imperial College
> London, SW7 2BW
> Tel: +44-(0)20-75947810
> http://www.hep.ph.ic.ac.uk/~dbauer/
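For the hand edit Winnie describes, here is a minimal sketch of switching the published state in a static LDIF file. It assumes the file contains lines of the form "GlueCEStateStatus: Production", as quoted from lcg-rollout in the thread. The function name set_ce_state and the sample hostname are made up for illustration, and the format of glite-ce-glue2.conf is not covered here. Whether the VOs' submission frameworks react to the new value is a separate question that this does not answer.

```python
import re

# Matches one "GlueCEStateStatus: <value>" line. The attribute name comes from
# the thread; only spaces/tabs are allowed around the value so that the match
# never swallows a newline.
STATUS_RE = re.compile(r"^(GlueCEStateStatus:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_ce_state(ldif_text, new_state="Draining"):
    """Return (new_text, count): every GlueCEStateStatus value set to new_state."""
    return STATUS_RE.subn(lambda m: m.group(1) + new_state, ldif_text)


# Hypothetical sample entry; a real static-file-CE.ldif has one per queue.
sample = (
    "dn: GlueCEUniqueID=ce.example.org:8443/cream-sge-long\n"
    "GlueCEStateStatus: Production\n"
    "GlueCEStateTotalJobs: 0\n"
)

updated, changed = set_ce_state(sample)
print(changed)
print(updated)
```

To apply it to a real file, read /var/lib/bdii/gip/ldif/static-file-CE.ldif, pass its contents through set_ce_state, keep a backup copy, and write the result back. Since the value appears once per queue, the returned count should equal the number of queues.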