Hi Winnie,

You might also want to look at
https://www.gridpp.ac.uk/wiki/Scheduled_Downtimes
which Steve Jones put together to give guidance on this topic.

Cheers,
John

On 09/05/2014 14:12, Daniela Bauer wrote:
> Hi Winnie,
>
> there's a file on the CEs called cluster.state and, at least for our
> setup (i.e. with SGE), setting this to Draining will do the job. At the
> same time, you need to declare a downtime for the CEs in the GOCDB.
>
> After you have done both (and the downtime has started), you kill all
> jobs in the queue. The LHC experiments will automatically resubmit
> elsewhere; for ILC you might want to consider asking.
>
> Then just wait until the rest of the jobs finish, and you are home free.
>
> As for nagios testing of the site-bdii, that's a bit tricky, as by
> construction there's only one site bdii, so a spare one doesn't quite
> make sense, plus at least some of the security tests only run once or
> twice a day, so you might be waiting for a long time. Just stick it in
> and hope for the best :-D
>
> Cheers,
> Daniela
>
>
> On 9 May 2014 14:05, Winnie Lacesso wrote:
>
> Happy Friday!
>
> Seeking more advice.
>
> We have 2 CREAM-CEs that need to drain of jobs & be emi-3 SL6 rebuilt
> (yes, late, sorry sorry)
>
> Both have hundreds of jobs queued on them in long/med queues.
>
> 1. So first is to disable long/med submission & see if the queued jobs
> will run, finish, thus drain the CE of long/med jobs (hopefully) "fast
> enough".
>
> Is disabling long/med job submission advertised in the GOC-DB by changing
> CE status from PRODUCTION to, erm, NOT? (appears to be YES/NO only!)
>
> Or, yaim-conf/services/glite-creamce on each contains
> CREAM_CE_STATE=Production
> Without having to rerun yaim, that value appears to be in
> /etc/glite-ce-glue2/glite-ce-glue2.conf
> and /var/lib/bdii/gip/ldif/static-file-CE.ldif (once per queue).
> Happy to change these by hand, the files are not dynamically generated.
>
> So, is a valid value "DRAINING"? - think so, I recall lcg-rollout Jan 2014
> mentioned this:
>
> Hi Lukasz,
>
> I see that the queues are publishing the following value:
>
> GlueCEStateStatus: Draining
> Question is, will CMS & LHCb job submission frameworks automatically note
> that & not try, or will they try, complain / ticket us, till it's pointed
> out... (maybe should contact them...)
>
>
> 2. It may take too long to allow the queued jobs to finish (want to get
> the upgrade done ASAP). Is it better / acceptable to contact the VOs with
> many queued jobs (ILC, CMS & LHCb) & ask them to cancel the hundreds of
> queued jobs?
>
> Always grateful for advice!
>
> Winnie Lacesso / Linux & Solaris Systems Administrator
> HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
> University of Bristol
>
>
>
>
> --
> Sent from the pit of despair
>
> -----------------------------------------------------------
> HEP Group/Physics Dep
> Imperial College
> London, SW7 2BW
> Tel: +44-(0)20-75947810
> http://www.hep.ph.ic.ac.uk/~dbauer/
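
A minimal sketch of the hand edit Winnie describes in the quoted message, assuming the static LDIF file carries one "GlueCEStateStatus: Production" line per queue. The attribute name is taken from the lcg-rollout excerpt in the thread; the sample DNs, the host name ce.example.org and the helper name set_ce_status are made up for illustration. It only rewrites text in memory, so reading and writing the real file (and keeping a backup) is left to the admin.

```python
import re
from typing import Tuple

# Matches one LDIF attribute line such as "GlueCEStateStatus: Production".
# [ \t]* is used instead of \s* so the match never swallows the newline or
# the blank lines that separate LDIF entries.
STATUS_LINE = re.compile(r"^(GlueCEStateStatus:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_ce_status(ldif_text: str, new_status: str = "Draining") -> Tuple[str, int]:
    """Return (updated_text, number_of_status_lines_changed)."""
    return STATUS_LINE.subn(lambda m: m.group(1) + new_status, ldif_text)


# Hypothetical two-queue excerpt; real DNs and attributes will differ.
sample = (
    "dn: GlueCEUniqueID=ce.example.org:8443/cream-sge-long,mds-vo-name=resource,o=grid\n"
    "GlueCEStateStatus: Production\n"
    "\n"
    "dn: GlueCEUniqueID=ce.example.org:8443/cream-sge-medium,mds-vo-name=resource,o=grid\n"
    "GlueCEStateStatus: Production\n"
)

updated, changed = set_ce_status(sample)
print(changed)
print(updated)
```

The count returned alongside the text gives a quick sanity check that every queue ("once per queue", as noted above) was actually touched before anything is written back.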