Hi Juan,
have you noticed this:
> I can see:
> # checking job 423
>
> State: Idle
> Creds: user:dteam27 group:dteam account:/C=ES/O=DATAGRID-ES/O=UAM/CN=Juan Jose Pardo Navarro DTEAM/CN=p class:dteam qos:DEFAULT
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For comparison, for a CMS job I have:
State: Running
Creds: user:cms001 group:cms account:all class:cms qos:DEFAULT
WallTime: 1:04:08:38 of 20:20:00:00
SubmitTime: Mon Nov 28 10:38:53
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
Note the account field: for my CMS job it maps to 'all', while your dteam job carries the full user DN as account, which suggests the account settings from maui.cfg are not being applied. Can you check your maui.cfg again, and restart maui and pbs_server...
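For the restart, assuming the standard init scripts are installed on your CE
(adjust to however your services are actually started), something like this
should do:

  service pbs_server restart
  service maui restart

After that, submit a fresh dteam job and run checkjob on it again to see
whether the account field changes.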
Regards, Antun
> WallTime: 00:00:00 of 4:03:00:00
> SubmitTime: Mon Nov 28 17:19:23
> (Time Queued Total: 21:11:06 Eligible: 00:00:00)
>
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Holds: Batch
> Messages: exceeds available partition procs
> PE: 1.00 StartPriority: 1
> cannot select job 423 for partition DEFAULT (job hold active)
>
> # showq
>
> 423 dteam27 BatchHold 1 4:03:00:00 Mon Nov 28 17:19:23
>
> On Tue, 2005-11-29 at 13:53 +0100, Antun Balaz wrote:
> > Hi Juan,
> > it seems that maui is not able to detect your nodes... What is the output
> > of 'pbsnodes -a'? Is the torque server on the same machine as the CE? Any
> > other ideas?
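> > (If they are on different machines, it may also be worth looking at the
> > RM section of your maui.cfg, e.g. something along the lines of
> >
> >   RMCFG[base] TYPE=PBS HOST=<your torque server>
> >
> > so that maui queries the right pbs_server; the exact syntax depends on
> > your maui version, so treat this only as a hint.)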
> >
> > Regards, Antun
> >
> > -----
> > E-mail: [log in to unmask]
> > Web: http://scl.phy.bg.ac.yu/
> >
> > Phone: +381 11 3160260, Ext. 152
> > Fax: +381 11 3162190
> >
> > Scientific Computing Laboratory
> > Institute of Physics, Belgrade
> > Serbia and Montenegro
> > -----
> >
> > ---------- Original Message -----------
> > From: Juan Jose Pardo Navarro <[log in to unmask]>
> > To: [log in to unmask]
> > Sent: Mon, 28 Nov 2005 15:01:38 +0100
> > Subject: Re: [LCG-ROLLOUT] torque + maui
> >
> > > Hi Antun,
> > >
> > > thanks for your answer. It is a good idea.
> > >
> > > I see (maui.log) that maui can see the free machines but :
> > >
> > > 11/26 12:30:42 INFO: 0 PBS resources detected on RM base
> > > 11/26 12:30:42 WARNING: no resources detected
> > > 11/26 12:30:42 MPBSWorkloadQuery(base,JCount,SC)
> > > 11/26 12:30:42 INFO: 28 PBS jobs detected on RM base
> > > 11/26 12:30:42 INFO: jobs detected: 28
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > >
> > > more details:
> > >
> > > .................................................
> > > ...............................................
> > > 11/26 12:30:42 INFO: PBS node node1.ft.uam.es set to state Idle
> > > (free)
> > > 11/26 12:30:42 MPBSLoadQueueInfo(base,node2.ft.uam.es,SC)
> > > 11/26 12:30:42 __MPBSGetNodeState(Name,State,PNode)
> > > 11/26 12:30:42 INFO: PBS node node3.ft.uam.es set to state Idle
> > > (free)
> > > 11/26 12:30:42 MPBSLoadQueueInfo(base,node3.ft.uam.es,SC)
> > > 11/26 12:30:42 __MPBSGetNodeState(Name,State,PNode)
> > > ...............................................
> > > .................................................
> > >
> > > 11/26 12:30:42 INFO: 0 PBS resources detected on RM base
> > > 11/26 12:30:42 WARNING: no resources detected
> > > 11/26 12:30:42 MPBSWorkloadQuery(base,JCount,SC)
> > > 11/26 12:30:42 INFO: 28 PBS jobs detected on RM base
> > > 11/26 12:30:42 INFO: jobs detected: 28
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > >
> > > thanks for all
> > >
> > > On Sun, 2005-11-27 at 19:14 +0100, Antun Balaz wrote:
> > > > Hi Juan,
> > > > according to your maui.cfg, dteam should get a 1/130 share of your
> > > > resources. If you have 20 nodes, do the math...
> > > >
> > > > I am not 100% sure that this is what causes the problem, but if it is,
> > > > try to put all queues except dteam in one group, say 'all', and the
> > > > dteam queue in a separate group, say 'reserved', adding ADEF=reserved
> > > > and ADEF=all at the end of the appropriate lines in maui.cfg:
> > > >
> > > > GROUPCFG[dteam] FSTARGET=100 MAXPROC=10,20 ADEF=reserved
> > > > GROUPCFG[atlas] FSTARGET=30 MAXPROC=10,20 ADEF=all
> > > > GROUPCFG[gvmuam] FSTARGET=30 MAXPROC=10 MAXJOB=1 ADEF=all
> > > > GROUPCFG[zeus] FSTARGET=30 MAXPROC=10 MAXJOB=1 ADEF=all
> > > > GROUPCFG[short] FSTARGET=30 MAXPROC=10 MAXJOB=1 ADEF=all
> > > >
> > > > (I would also suggest removing the GROUPCFG[DEFAULT] line if you do
> > > > not use it).
> > > >
> > > > After this, just add
> > > >
> > > > ACCOUNTCFG[reserved] FSTARGET=5 MAXPROC=19
> > > > ACCOUNTCFG[all] FSTARGET=95 MAXPROC=20
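> > > >
> > > > (Once maui is restarted with this, 'diagnose -g' and 'diagnose -f'
> > > > should show whether the new groups/accounts and fairshare targets were
> > > > picked up; the exact output depends on your maui version, so this is
> > > > just a quick sanity check.)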
> > > >
> > > > Now, the FSTARGETs will apply only within the groups, i.e. dteam will
> > > > have 100% within the group 'reserved', which will get 5% of the overall
> > > > resources (i.e. 1 node); the real atlas FSTARGET will be 25% of 95%,
> > > > etc. You can also adjust the FSTARGETs of the rest of the queues to 25
> > > > (so that they sum up to 100), although this is not obligatory...
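> > > >
> > > > (Rough numbers, assuming your 20 single-CPU nodes: 'reserved' would
> > > > target 5% of 20, i.e. about 1 node for dteam, while atlas would target
> > > > 30/120 = 25% of the remaining 95%, roughly 4-5 nodes.)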
> > > >
> > > > Hope this helps,
> > > > Antun
> > > >
> > > > -----
> > > > E-mail: [log in to unmask]
> > > > Web: http://scl.phy.bg.ac.yu/
> > > >
> > > > Phone: +381 11 3160260, Ext. 152
> > > > Fax: +381 11 3162190
> > > >
> > > > Scientific Computing Laboratory
> > > > Institute of Physics, Belgrade
> > > > Serbia and Montenegro
> > > > -----
> > > >
> > > > ---------- Original Message -----------
> > > > From: Juan Jose Pardo Navarro <[log in to unmask]>
> > > > To: [log in to unmask]
> > > > Sent: Sun, 27 Nov 2005 12:20:13 +0100
> > > > Subject: [LCG-ROLLOUT] torque + maui
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have torque and maui, and when I submit jobs they always stay in
> > > > > the queued state:
> > > > >
> > > > > Some details:
> > > > >
> > > > > A)
> > > > >
> > > > > # qstat -q
> > > > >
> > > > > Queue            Memory CPU Time Walltime Node  Run Que Lm  State
> > > > > ---------------- ------ -------- -------- ----  --- --- --  -----
> > > > > dteam              --   02:00:00 99:00:00  --     0  28 40  E R
> > > > >                                                  --- ---
> > > > >                                                    0  28
> > > > >
> > > > > B)
> > > > >
> > > > > the configuration of server pbs and dteam is:
> > > > >
> > > > > create queue dteam
> > > > > set queue dteam queue_type = Execution
> > > > > set queue dteam max_running = 40
> > > > > set queue dteam resources_max.cput = 02:00:00
> > > > > set queue dteam resources_max.walltime = 99:00:00
> > > > > set queue dteam enabled = True
> > > > > set queue dteam started = True
> > > > > # Set server attributes.
> > > > > #
> > > > > set server scheduling = True
> > > > > set server acl_host_enable = False
> > > > > set server managers = root@serverpbs
> > > > > set server operators = root@serverpbs
> > > > > set server default_queue = dteam
> > > > > set server log_events = 511
> > > > > set server mail_from = adm
> > > > > set server query_other_jobs = True
> > > > > set server scheduler_iteration = 600
> > > > > set server node_ping_rate = 300
> > > > > set server node_check_rate = 600
> > > > > set server tcp_timeout = 6
> > > > > set server default_node = lcgpro
> > > > > set server node_pack = False
> > > > > set server job_stat_rate = 30
> > > > >
> > > > > C)
> > > > >
> > > > > I have 20 nodes:
> > > > >
> > > > > *************************************
> > > > >
> > > > > node1
> > > > > state = free
> > > > > np = 1
> > > > > properties = lcgpro
> > > > > ntype = cluster
> > > > > status = arch=linux,uname=Linux node0 2.4.21-32.0.1.EL #1 Wed May 25 16:02:04 CDT 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=26881,totmem=993140kb,availmem=875488kb,physmem=479068kb,ncpus=1,loadave=0.00,netload=120720384,state=free,rectime=1132633775
> > > > >
> > > > > node1
> > > > > state = free
> > > > > np = 1
> > > > > properties = lcgpro
> > > > > ntype = cluster
> > > > > status = arch=linux,uname=Linux node1 2.4.21-32.0.1.EL #1 Wed May 25 16:02:04 CDT 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=26881,totmem=479068kb,availmem=361576kb,physmem=479068kb,ncpus=1,loadave=0.00,netload=120698957,state=free,rectime=1132633785
> > > > > .....................
> > > > >
> > > > > D) I attach the /var/spool/maui/maui.cfg file
> > > > >
> > > > > E)
> > > > >
> > > > > #qstat -a
> > > > >
> > > > > [root@gridce01 root]# qstat -a
> > > > >
> > > > > gridce01.ft.uam.es:
> > > > >                                                            Req'd  Req'd   Elap
> > > > > Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
> > > > > --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
> > > > > 283.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 284.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 285.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 286.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 287.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > .............................................................
> > > > >
> > > > > any idea ?
> > > > >
> > > > > --
> > > > >
> > > > > ========================================================================
> > > > > Juan Jose Pardo Navarro e-mail: [log in to unmask]
> > > > > Dpto Fisica Teorica. C-XI.
> > > > > Laboratorio de Altas Energias
> > > > > Universidad Autonoma de Madrid. Phone: 34 91 497 3976
> > > > > Cantoblanco, 28049 Madrid, Spain. Fax: 34 91 497 3936
> > > > >
> > > > > ========================================================================
> > > > ------- End of Original Message -------
> > > >
> > > >
> > ------- End of Original Message -------
> >
> >
------- End of Original Message -------