Hi Juan,
have you noticed this:
> I can see:
> # checking job 423
>
> State: Idle
> Creds: user:dteam27 group:dteam account:/C=ES/O=DATAGRID-ES/O=UAM/CN=Juan Jose Pardo Navarro DTEAM/CN=p class:dteam qos:DEFAULT
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For comparison, for a CMS job I have:
State: Running
Creds: user:cms001 group:cms account:all class:cms qos:DEFAULT
WallTime: 1:04:08:38 of 20:20:00:00
SubmitTime: Mon Nov 28 10:38:53
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
Note the account field: for my CMS job it maps to 'all', while your dteam job carries the full user DN as account, which suggests the account settings from maui.cfg are not being applied. Can you check your maui.cfg again, and restart maui and pbs_server...
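For the restart, assuming the standard init scripts are installed on your CE
(adjust to however your services are actually started), something like this
should do:

  service pbs_server restart
  service maui restart

After that, submit a fresh dteam job and run checkjob on it again to see
whether the account field changes.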
Regards, Antun
> WallTime: 00:00:00 of 4:03:00:00
> SubmitTime: Mon Nov 28 17:19:23
> (Time Queued Total: 21:11:06 Eligible: 00:00:00)
>
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Holds: Batch
> Messages: exceeds available partition procs
> PE: 1.00 StartPriority: 1
> cannot select job 423 for partition DEFAULT (job hold active)
>
> # showq
>
> 423 dteam27 BatchHold 1 4:03:00:00 Mon Nov 28 17:19:23
>
> On Tue, 2005-11-29 at 13:53 +0100, Antun Balaz wrote:
> > Hi Juan,
> > it seems that maui is not able to detect your nodes... What is the output
> > of 'pbsnodes -a'? Is the torque server on the same machine as the CE? Any
> > other ideas?
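> > (If they are on different machines, it may also be worth looking at the
> > RM section of your maui.cfg, e.g. something along the lines of
> >
> >   RMCFG[base] TYPE=PBS HOST=<your torque server>
> >
> > so that maui queries the right pbs_server; the exact syntax depends on
> > your maui version, so treat this only as a hint.)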
> >
> > Regards, Antun
> >
> > -----
> > E-mail: [log in to unmask]
> > Web: http://scl.phy.bg.ac.yu/
> >
> > Phone: +381 11 3160260, Ext. 152
> > Fax: +381 11 3162190
> >
> > Scientific Computing Laboratory
> > Institute of Physics, Belgrade
> > Serbia and Montenegro
> > -----
> >
> > ---------- Original Message -----------
> > From: Juan Jose Pardo Navarro <[log in to unmask]>
> > To: [log in to unmask]
> > Sent: Mon, 28 Nov 2005 15:01:38 +0100
> > Subject: Re: [LCG-ROLLOUT] torque + maui
> >
> > > Hi Antun,
> > >
> > > thanks for your answer. It is a good idea.
> > >
> > > I see (maui.log) that maui can see the free machines but :
> > >
> > > 11/26 12:30:42 INFO: 0 PBS resources detected on RM base
> > > 11/26 12:30:42 WARNING: no resources detected
> > > 11/26 12:30:42 MPBSWorkloadQuery(base,JCount,SC)
> > > 11/26 12:30:42 INFO: 28 PBS jobs detected on RM base
> > > 11/26 12:30:42 INFO: jobs detected: 28
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > >
> > > more details:
> > >
> > > .................................................
> > > ...............................................
> > > 11/26 12:30:42 INFO: PBS node node1.ft.uam.es set to state Idle
> > > (free)
> > > 11/26 12:30:42 MPBSLoadQueueInfo(base,node2.ft.uam.es,SC)
> > > 11/26 12:30:42 __MPBSGetNodeState(Name,State,PNode)
> > > 11/26 12:30:42 INFO: PBS node node3.ft.uam.es set to state Idle
> > > (free)
> > > 11/26 12:30:42 MPBSLoadQueueInfo(base,node3.ft.uam.es,SC)
> > > 11/26 12:30:42 __MPBSGetNodeState(Name,State,PNode)
> > > ...............................................
> > > .................................................
> > >
> > > 11/26 12:30:42 INFO: 0 PBS resources detected on RM base
> > > 11/26 12:30:42 WARNING: no resources detected
> > > 11/26 12:30:42 MPBSWorkloadQuery(base,JCount,SC)
> > > 11/26 12:30:42 INFO: 28 PBS jobs detected on RM base
> > > 11/26 12:30:42 INFO: jobs detected: 28
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > > 11/26 12:30:42 INFO: total jobs selected (ALL): 0/28 [Hold: 28]
> > >
> > > thanks for all
> > >
> > > On Sun, 2005-11-27 at 19:14 +0100, Antun Balaz wrote:
> > > > Hi Juan,
> > > > according to your maui.cfg, dteam should get a 1/130 share of your
> > > > resources. If you have 20 nodes, do the math...
> > > >
> > > > I am not 100% sure that this is what causes the problem, but if it is,
> > > > try to put all queues except dteam in one group, say 'all', and the
> > > > dteam queue in a separate group, say 'reserved', adding ADEF=reserved
> > > > and ADEF=all at the end of the appropriate lines in maui.cfg:
> > > >
> > > > GROUPCFG[dteam] FSTARGET=100 MAXPROC=10,20 ADEF=reserved
> > > > GROUPCFG[atlas] FSTARGET=30 MAXPROC=10,20 ADEF=all
> > > > GROUPCFG[gvmuam] FSTARGET=30 MAXPROC=10 MAXJOB=1 ADEF=all
> > > > GROUPCFG[zeus] FSTARGET=30 MAXPROC=10 MAXJOB=1 ADEF=all
> > > > GROUPCFG[short] FSTARGET=30 MAXPROC=10 MAXJOB=1 ADEF=all
> > > >
> > > > (I would also suggest removing the GROUPCFG[DEFAULT] line if you do
> > > > not use it).
> > > >
> > > > After this, just add
> > > >
> > > > ACCOUNTCFG[reserved] FSTARGET=5 MAXPROC=19
> > > > ACCOUNTCFG[all] FSTARGET=95 MAXPROC=20
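> > > >
> > > > (Once maui is restarted with this, 'diagnose -g' and 'diagnose -f'
> > > > should show whether the new groups/accounts and fairshare targets were
> > > > picked up; the exact output depends on your maui version, so this is
> > > > just a quick sanity check.)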
> > > >
> > > > Now, the FSTARGETs will apply only within the groups, i.e. dteam will
> > > > have 100% within the group 'reserved', which will get 5% of the overall
> > > > resources (i.e. 1 node); the real atlas FSTARGET will be 25% of 95%,
> > > > etc. You can also adjust the FSTARGETs of the rest of the queues to 25
> > > > (so that they sum up to 100), although this is not obligatory...
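> > > >
> > > > (Rough numbers, assuming your 20 single-CPU nodes: 'reserved' would
> > > > target 5% of 20, i.e. about 1 node for dteam, while atlas would target
> > > > 30/120 = 25% of the remaining 95%, roughly 4-5 nodes.)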
> > > >
> > > > Hope this helps,
> > > > Antun
> > > >
> > > > -----
> > > > E-mail: [log in to unmask]
> > > > Web: http://scl.phy.bg.ac.yu/
> > > >
> > > > Phone: +381 11 3160260, Ext. 152
> > > > Fax: +381 11 3162190
> > > >
> > > > Scientific Computing Laboratory
> > > > Institute of Physics, Belgrade
> > > > Serbia and Montenegro
> > > > -----
> > > >
> > > > ---------- Original Message -----------
> > > > From: Juan Jose Pardo Navarro <[log in to unmask]>
> > > > To: [log in to unmask]
> > > > Sent: Sun, 27 Nov 2005 12:20:13 +0100
> > > > Subject: [LCG-ROLLOUT] torque + maui
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have torque and maui, and when I submit jobs they always stay in
> > > > > the queued state:
> > > > >
> > > > > Some details:
> > > > >
> > > > > A)
> > > > >
> > > > > # qstat -q
> > > > >
> > > > > Queue            Memory CPU Time Walltime Node  Run Que Lm  State
> > > > > ---------------- ------ -------- -------- ----  --- --- --  -----
> > > > > dteam              --   02:00:00 99:00:00  --     0  28 40  E R
> > > > >                                                  --- ---
> > > > >                                                    0  28
> > > > >
> > > > > B)
> > > > >
> > > > > the configuration of server pbs and dteam is:
> > > > >
> > > > > create queue dteam
> > > > > set queue dteam queue_type = Execution
> > > > > set queue dteam max_running = 40
> > > > > set queue dteam resources_max.cput = 02:00:00
> > > > > set queue dteam resources_max.walltime = 99:00:00
> > > > > set queue dteam enabled = True
> > > > > set queue dteam started = True
> > > > > # Set server attributes.
> > > > > #
> > > > > set server scheduling = True
> > > > > set server acl_host_enable = False
> > > > > set server managers = root@serverpbs
> > > > > set server operators = root@serverpbs
> > > > > set server default_queue = dteam
> > > > > set server log_events = 511
> > > > > set server mail_from = adm
> > > > > set server query_other_jobs = True
> > > > > set server scheduler_iteration = 600
> > > > > set server node_ping_rate = 300
> > > > > set server node_check_rate = 600
> > > > > set server tcp_timeout = 6
> > > > > set server default_node = lcgpro
> > > > > set server node_pack = False
> > > > > set server job_stat_rate = 30
> > > > >
> > > > > C)
> > > > >
> > > > > I have 20 nodes:
> > > > >
> > > > > *************************************
> > > > >
> > > > > node1
> > > > > state = free
> > > > > np = 1
> > > > > properties = lcgpro
> > > > > ntype = cluster
> > > > > status = arch=linux,uname=Linux node0 2.4.21-32.0.1.EL #1 Wed May 25 16:02:04 CDT 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=26881,totmem=993140kb,availmem=875488kb,physmem=479068kb,ncpus=1,loadave=0.00,netload=120720384,state=free,rectime=1132633775
> > > > >
> > > > > node1
> > > > > state = free
> > > > > np = 1
> > > > > properties = lcgpro
> > > > > ntype = cluster
> > > > > status = arch=linux,uname=Linux node1 2.4.21-32.0.1.EL #1 Wed May 25 16:02:04 CDT 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=26881,totmem=479068kb,availmem=361576kb,physmem=479068kb,ncpus=1,loadave=0.00,netload=120698957,state=free,rectime=1132633785
> > > > > .....................
> > > > >
> > > > > D) I attach the /var/spool/maui/maui.cfg file
> > > > >
> > > > > E)
> > > > >
> > > > > #qstat -a
> > > > >
> > > > > [root@gridce01 root]# qstat -a
> > > > >
> > > > > gridce01.ft.uam.es:
> > > > >                                                            Req'd  Req'd   Elap
> > > > > Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
> > > > > --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
> > > > > 283.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 284.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 285.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 286.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > 287.serverpbs   dteam27  dteam    STDIN          --   1  --     -- 02:00 Q    --
> > > > > .............................................................
> > > > >
> > > > > any idea ?
> > > > >
> > > > > --
> > > > >
> > > > > ========================================================================
> > > > > Juan Jose Pardo Navarro e-mail: [log in to unmask]
> > > > > Dpto Fisica Teorica. C-XI.
> > > > > Laboratorio de Altas Energias
> > > > > Universidad Autonoma de Madrid. Phone: 34 91 497 3976
> > > > > Cantoblanco, 28049 Madrid, Spain. Fax: 34 91 497 3936
> > > > >
> > > > > ========================================================================
> > > > ------- End of Original Message -------
> > > >
> > > >
> > ------- End of Original Message -------
> >
> >
------- End of Original Message -------