Hi,

A few days ago we faced the opposite problem: utilization was at 100% and ops jobs were not being executed, so we made some changes.

We edited the maui.cfg file (in /var/spool/maui/) to dedicate one worker node (e.g. wn0xx.sitename) to ops jobs, by adding the following to maui.cfg:

SRCFG[sftcpu]         PERIOD=INFINITY
SRCFG[sftcpu]         TASKCOUNT=1
SRCFG[sftcpu]         CLASSLIST=dteam-,ops-,seeops-
SRCFG[sftcpu]         GROUPLIST=seesgm-
SRCFG[sftcpu]         HOSTLIST=wn0XX.kallisto.hellasgrid.gr
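Each SRCFG line above sets one attribute (KEY=VALUE) of the standing reservation named in brackets. To double-check which classes and hosts a reservation covers, a small parser can group the attributes by reservation name. This is a minimal sketch, assuming the simple "SRCFG[name] KEY=VALUE ..." layout shown above with "#" comments; the helper names are hypothetical, and the trailing "-" on ACL entries is simply stripped, not interpreted (see the Maui documentation for what it means).

```python
import re
from collections import defaultdict

# Matches "SRCFG[<name>] <rest of line>"; the rest holds one or more KEY=VALUE pairs.
SRCFG_HEADER = re.compile(r"^\s*SRCFG\[([^\]]+)\]\s+(.*)$")


def parse_srcfg(text):
    """Group SRCFG attributes by standing-reservation name (hypothetical helper)."""
    reservations = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # drop maui.cfg-style comments
        match = SRCFG_HEADER.match(line)
        if not match:
            continue
        name, rest = match.groups()
        for pair in rest.split():
            key, sep, value = pair.partition("=")
            if sep:
                reservations[name][key] = value
    return dict(reservations)


def acl_names(value):
    """Split a comma-separated ACL list, stripping any trailing '-' modifier."""
    return [item.rstrip("-") for item in value.split(",") if item]


config = """
SRCFG[sftcpu]         PERIOD=INFINITY
SRCFG[sftcpu]         TASKCOUNT=1
SRCFG[sftcpu]         CLASSLIST=dteam-,ops-,seeops-
SRCFG[sftcpu]         GROUPLIST=seesgm-
SRCFG[sftcpu]         HOSTLIST=wn0XX.kallisto.hellasgrid.gr
"""

sftcpu = parse_srcfg(config)["sftcpu"]
print(sftcpu["HOSTLIST"])                       # wn0XX.kallisto.hellasgrid.gr
print("ops" in acl_names(sftcpu["CLASSLIST"]))  # True
```

The parser also accepts several KEY=VALUE pairs on one SRCFG line, so it still works if the attributes are written on a single line instead of one per line.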
 
Finally, we increased the number of job slots on that worker node by changing np=2 to np=4 in /var/spool/pbs/server_priv/nodes.
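Each line of the Torque nodes file names a host followed by attributes such as np=<slots>, so moving from np=2 to np=4 gives the node two extra job slots. A minimal sketch to confirm the slot count after editing, assuming the plain "hostname np=N [properties...]" layout; parse_nodes is a hypothetical helper, and treating a line without np= as one slot is an assumption about Torque's default.

```python
def parse_nodes(text):
    """Map each host in a Torque server_priv/nodes file to its np= slot count."""
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        host, *attributes = line.split()
        slots[host] = 1  # assumption: a node listed without np= provides one slot
        for attribute in attributes:
            key, sep, value = attribute.partition("=")
            if sep and key == "np":
                slots[host] = int(value)
    return slots


HOST = "wn0XX.kallisto.hellasgrid.gr"
before = f"{HOST} np=2\n"
after = f"{HOST} np=4\n"

gained = parse_nodes(after)[HOST] - parse_nodes(before)[HOST]
print(gained)  # 2
```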

In the end, the problem was the same for jobs from all VOs.

Thanks

Konstantinos Koumoutsos
HG04-CTI-CEID

-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On Behalf Of Arnau Bria
Sent: Tuesday, June 28, 2011 5:35 PM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Job not running although there are free CPUs

On Tue, 28 Jun 2011 14:16:37 +0000
Gkamas Vasilis wrote:

> Hi,
Hi,
 
If qrun runs the job, it looks like a Maui problem.

Have you changed any Torque/Maui configuration recently? Or added requirements to jobs via torque_submit_filter (e.g. ncpus or similar)?

> [root@ce01 ~]# checkjob 390370

Is there any problem with the ops queue? Is it started and active, with a valid configuration?
Do you see the same problem if the job is sent to another queue?

(qmgr -c "p q ops")

Any other clue from checkjob -v or the Maui logs?
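The checks suggested above can be collected in one go on the CE. This is a minimal sketch, not a tool from the thread: the function names are hypothetical, it only runs checkjob -v and qmgr -c "p q <queue>" when those commands are installed, and queue_flags assumes qmgr prints single-valued settings as "set queue <name> <attribute> = <value>" lines (the "enabled" and "started" attributes answer "is it started and active?").

```python
import shlex
import shutil
import subprocess


def diagnostic_commands(job_id, queue="ops"):
    """Commands suggested in the thread for a job that stays idle."""
    return [
        ["checkjob", "-v", str(job_id)],  # verbose Maui view of the job
        ["qmgr", "-c", f"p q {queue}"],   # print the Torque queue definition
    ]


def run_diagnostics(job_id, queue="ops"):
    """Run each command if it is installed; return {command line: output}."""
    results = {}
    for cmd in diagnostic_commands(job_id, queue):
        line = shlex.join(cmd)
        if shutil.which(cmd[0]) is None:
            results[line] = f"{cmd[0]} not found on this host"
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[line] = proc.stdout + proc.stderr
    return results


def queue_flags(qmgr_output):
    """Extract single-valued 'set queue <q> <attr> = <value>' settings."""
    flags = {}
    for line in qmgr_output.splitlines():
        parts = line.split()
        if len(parts) == 6 and parts[:2] == ["set", "queue"] and parts[4] == "=":
            flags[parts[3]] = parts[5]
    return flags


if __name__ == "__main__":
    for line, output in run_diagnostics(390370).items():
        print(f"$ {line}\n{output}")
```

Multi-valued settings written with "+=" (for example ACL lists) are skipped by queue_flags; inspect those in the raw qmgr output.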

> 
> 
> Thank you,
> Vasilis
Cheers,
Arnau 
> 
> 
> P.S. The previous jobs were run manually