Hi,
A few days ago we faced the reverse problem: the cluster was at 100% utilization and ops jobs were not being executed.
To fix this, we changed the maui.cfg file (in /var/spool/maui/) so that one
worker node (e.g. wn0xx.sitename) is dedicated to ops jobs. We added the
following to maui.cfg:
SRCFG[sftcpu] PERIOD=INFINITY
SRCFG[sftcpu] TASKCOUNT=1
SRCFG[sftcpu] CLASSLIST=dteam-,ops-,seeops-
SRCFG[sftcpu] GROUPLIST=seesgm-
SRCFG[sftcpu] HOSTLIST=wn0XX.kallisto.hellasgrid.gr
Finally, we added one more job slot to that worker node (changing np=2 to np=4 in /var/spool/pbs/server_priv/nodes).
Also, the problem is the same for all jobs, from all VOs.
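For anyone repeating the np change, here is a minimal sketch of how it could be scripted rather than edited by hand. The helper name set_np is hypothetical, and the sketch assumes the usual nodes file layout of one host per line followed by an np=N field. pbs_server generally needs a restart before it picks up changes to this file, so treat this as an illustration only.

```shell
# Hypothetical helper: set the np= value for one host in a Torque nodes file.
# Usage: set_np FILE HOST NEW_NP
# Example (assumed path and host, taken from this thread):
#   set_np /var/spool/pbs/server_priv/nodes wn0XX.kallisto.hellasgrid.gr 4
set_np() {
    file=$1
    host=$2
    np=$3
    tmp=$(mktemp) || return 1
    # Only the line whose first field is the host name is changed;
    # every other line is copied through unmodified.
    awk -v h="$host" -v n="$np" '
        $1 == h { for (i = 2; i <= NF; i++) if ($i ~ /^np=/) $i = "np=" n }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Working on a backup copy of the nodes file first is the safer choice, since a malformed nodes file can prevent pbs_server from starting.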
Thanks
Konstantinos Koumoutsos
HG04-CTI-CEID
-----Original Message-----
From: LHC Computer Grid - Rollout On Behalf Of Arnau Bria
Sent: Tuesday, June 28, 2011 5:35 PM
Subject: Re: [LCG-ROLLOUT] Job not running although there are free CPUs
On Tue, 28 Jun 2011 14:16:37 +0000
Gkamas Vasilis wrote:
> Hi,
Hi,
If qrun runs the job, it looks like a Maui problem.
Have you changed any Torque/Maui configuration recently? Or added some requirements to jobs, for example via torque_submit_filter (ncpus or similar)?
> [root@ce01 ~]# checkjob 390370
Is there any problem with the ops queue? Is it started and active, with a valid configuration?
Do you see the same problem if the job is sent to another queue?
(qmgr -c "p q ops")
Any other clue from checkjob -v or the Maui logs?
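The checks above could be bundled roughly as follows. This is only a sketch: the function name diag_job is made up, the job ID and queue name are the ones from this thread, and it assumes the standard Torque/Maui client tools (qstat, qmgr, checkjob) are in the PATH on the CE. Any tool that is missing is skipped rather than treated as an error.

```shell
# Sketch: run the queue and job checks for one job ID and one queue.
# Usage: diag_job JOBID QUEUE      e.g. diag_job 390370 ops
diag_job() {
    jobid=$1
    queue=$2
    for cmd in \
        "qstat -Q -f $queue" \
        "qmgr -c 'print queue $queue'" \
        "checkjob -v $jobid"
    do
        tool=${cmd%% *}          # first word of the command line
        echo "### $cmd"
        if command -v "$tool" >/dev/null 2>&1; then
            eval "$cmd"
        else
            echo "(skipped: $tool not found in PATH)"
        fi
    done
}
```

Running it once for the ops queue and once for another queue makes it easier to compare the two outputs side by side.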
>
>
> Thank you,
> Vasilis
Cheers,
Arnau
>
>
> P.S. The previous jobs were run manually