Hi Winnie,
    You might also want to look at
https://www.gridpp.ac.uk/wiki/Scheduled_Downtimes, which Steve Jones put
together to give guidance on this topic.
Cheers,
John

On 09/05/2014 14:12, Daniela Bauer wrote:
> Hi Winnie,
>
> there's a file on the CEs called cluster.state, and at least for our
> setup (i.e. with SGE) setting this to Draining will do the job. At the
> same time, you need to declare a downtime for the CEs in the GOCDB.
>
> After you've done both (and the downtime has started), kill all jobs in
> the queue. The LHC experiments will automatically resubmit elsewhere;
> for ILC you might want to ask them first.
>
> Then just wait until the rest of the jobs finish, and you are home free.
>
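The ordering described above (CE set to Draining, GOCDB downtime started, then kill only the jobs still waiting in the queue and let the running ones finish) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Job` record, the "Draining" string read from cluster.state, and the SGE-style state codes ("qw" for queued/waiting, "r" for running) are hypothetical inputs you would have to gather yourself, for example from `qstat` output. The function only builds a `qdel` argument list; it does not run anything.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Job:
    """Hypothetical job record, e.g. parsed by hand from `qstat` output."""
    job_id: str
    state: str  # assumed SGE-style state code: "qw" = queued/waiting, "r" = running


def plan_queued_job_kills(
    jobs: list[Job], ce_state: str, downtime_start: datetime, now: datetime
) -> list[str]:
    """Return a qdel argument list for jobs still waiting in the queue.

    Returns an empty list unless the CE is already Draining and the
    declared downtime has started, so nothing is killed too early.
    Running jobs are never included: they are left to finish.
    """
    if ce_state != "Draining" or now < downtime_start:
        return []
    # "qw" also matches variants such as "hqw" (held) and "Eqw" (error).
    waiting = [job.job_id for job in jobs if "qw" in job.state]
    return ["qdel", *waiting] if waiting else []
```

Building the command rather than executing it leaves room to check the job list before handing it to something like `subprocess.run`.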
> As for Nagios testing of the site BDII, that's a bit tricky: by
> construction there's only one site BDII, so a spare one doesn't quite
> make sense. Plus, at least some of the security tests only run once or
> twice a day, so you might be waiting a long time. Just stick it in
> and hope for the best :-D
>
> Cheers,
> Daniela
>
>
> On 9 May 2014 14:05, Winnie Lacesso <[log in to unmask]
> <mailto:[log in to unmask]>> wrote:
>
>     Happy Friday!
>
>     Seeking more advice.
>
>     We have 2 CREAM-CEs that need to be drained of jobs & rebuilt as
>     EMI-3 on SL6 (yes, late, sorry sorry)
>
>     Both have hundreds of jobs queued on them in long/med queues.
>
>     1. So the first step is to disable long/med submission & see whether the
>     queued jobs run and finish, thus draining the CE of long/med jobs
>     (hopefully) "fast enough".
>
>     Is disabling long/med job submission advertised in the GOC-DB by
>     changing the CE status from PRODUCTION to, erm, NOT? (It appears to be
>     YES/NO only!)
>
>     Or, yaim-conf/services/glite-creamce on each contains
>     CREAM_CE_STATE=Production
>     Without having to rerun yaim, that value appears to be in
>     /etc/glite-ce-glue2/glite-ce-glue2.conf
>     and /var/lib/bdii/gip/ldif/static-file-CE.ldif (once per queue).
>     Happy to change these by hand, the files are not dynamically generated.
>
>     So, is "DRAINING" a valid value? I think so; I recall an lcg-rollout
>     post from Jan 2014 that mentioned this:
>      > Hi Lukasz,
>      > I see that the queues are publishing the following value:
>      > GlueCEStateStatus: Draining
>     The question is, will the CMS & LHCb job submission frameworks
>     automatically notice that & not try, or will they try, then complain /
>     ticket us until it's pointed out... (maybe I should contact them...)
>
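A minimal Python sketch of the hand edit described above: switching only the long/med queues in static-file-CE.ldif from Production to Draining. It assumes each queue is a separate LDIF entry (entries separated by a blank line) carrying unfolded `GlueCEName:` and `GlueCEStateStatus:` lines; the queue names "long" and "medium" used below are hypothetical placeholders for whatever the site actually calls them.

```python
from __future__ import annotations

import re

# Unfolded LDIF attribute lines, e.g. "GlueCEName: long" and
# "GlueCEStateStatus: Production". [ \t]* (not \s*) keeps newlines intact.
_NAME_LINE = re.compile(r"^GlueCEName:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
_STATE_LINE = re.compile(r"^(GlueCEStateStatus:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_queue_state(
    ldif_text: str, queues: set[str], new_state: str = "Draining"
) -> tuple[str, int]:
    """Set GlueCEStateStatus to new_state for entries whose GlueCEName is in queues.

    Returns the updated text and the number of state lines changed.
    Assumes entries are separated by a single blank line.
    """
    entries = ldif_text.split("\n\n")
    changed = 0
    for i, entry in enumerate(entries):
        name = _NAME_LINE.search(entry)
        if name and name.group(1) in queues:
            # A lambda replacement avoids backslash escaping in new_state.
            entries[i], n = _STATE_LINE.subn(lambda m: m.group(1) + new_state, entry)
            changed += n
    return "\n\n".join(entries), changed
```

For example, `set_queue_state(text, {"long", "medium"})` leaves every other queue untouched. Work on a copy of the original file first; the returned count is a quick check that exactly the expected number of queue entries was changed.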
>
>     2. It may take too long to let the queued jobs finish (we want to get
>     the upgrade done ASAP). Is it better / acceptable to contact the VOs
>     with many queued jobs (ILC, CMS & LHCb) & ask them to cancel their
>     hundreds of queued jobs?
>
>     Always grateful for advice!
>
>     Winnie Lacesso / Linux & Solaris Systems Administrator
>     HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
>     University of Bristol
>
>
>
>
> --
> Sent from the pit of despair
>
> -----------------------------------------------------------
> [log in to unmask] <mailto:[log in to unmask]>
> HEP Group/Physics Dep
> Imperial College
> London, SW7 2BW
> Tel: +44-(0)20-75947810
> http://www.hep.ph.ic.ac.uk/~dbauer/