Hi Raul,

yes, it uses munge, with the same key and the right permissions on both the CE and the WNs.

cheers
alessandra

On 09/10/2012 20:34, Raul H C Lopes wrote:
Hi Alessandra,

I suppose it uses munge.

  - is the munged service running? what did YAIM say about it?

  - do the CE and the WNs share the same munge key, with adequate permissions? (a quick check is sketched below)
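
A quick way to verify both, on the CE and on each WN (a sketch; the key path assumes the stock munge packaging, adjust to your layout):

  service munge status          # is the daemon up?
  ls -l /etc/munge/munge.key    # should be 0400 and owned by munge
  md5sum /etc/munge/munge.key   # checksum must match on CE and WNs
  munge -n | unmunge            # local round-trip sanity check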

thanks, raul 
On 09/10/12 19:17, Alessandra Forti wrote:
Has anybody seen this problem by any chance?


-------- Original Message --------
Subject: Maui problem
Date: Tue, 9 Oct 2012 18:04:20 +0100
From: Alessandra Forti <[log in to unmask]>
To: <[log in to unmask]>


Hi,

I have installed a mini test cluster with torque and maui. We have used maui/torque for years on our grid cluster and now we are upgrading to torque 2.5.7 and maui 3.3-4. Unfortunately with this new combination maui doesn't seem to work correctly: when I submit jobs, it behaves as if there weren't any free resources. Even when I installed only torque and maui with a bare-minimum configuration I got the same behaviour, i.e.:

1) When I submit, the jobs just remain queued:

[root@<server> maui]# qstat -an1

<server>:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
10.<server>          aforti   long     pbs-vm3.sh           --    --  --     --    -- Q    --
11.<server>          aforti   long     pbs-vm3.sh           --    --  --     --    -- Q    --


2) If I run qrun <jobid> the job runs, so I assume the problem is not between the torque server and the torque mom.
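
(For reference, the forced run with the job ids from the qstat output above; qrun bypasses the scheduler entirely, so this only exercises torque:

  qrun 10.<server>    # job starts immediately
  qstat -an1          # its state goes from Q to R
)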
3) When I use showq: the old versions displayed the WCLimit of the default queue; now it displays 0 at first and then changes it by itself to 100 days:

[root@<server> maui]# showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME


     0 Active Jobs       0 of   16 Processors Active (0.00%)
                         0 of    1 Nodes Active      (0.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

10                   aforti       Idle     1 99:23:59:59  Tue Oct  9 15:32:13
11                   aforti       Idle     1 99:23:59:59  Tue Oct  9 16:39:09

2 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 2   Active Jobs: 0   Idle Jobs: 2   Blocked Jobs: 0
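
(Presumably the 99:23:59:59 WCLIMIT is maui's stand-in for an unset walltime. If that turns out to be the issue, a default can be forced on the queue, e.g.:

  qmgr -c 'set queue long resources_default.walltime = 72:00:00'
)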

4) checkjob <jobid> just tells me the job cannot be run in the default partition, without giving any particular reason:

[.....]
PE:  1.00  StartPriority:  120

cannot select job 10 for partition DEFAULT (Class)
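
(The "(Class)" tag suggests maui does not believe the node offers the job's class/queue. One way to cross-check maui's view against torque's:

  pbsnodes -a    # properties and state torque reports for the node
  diagnose -c    # maui's view of the classes
  diagnose -n    # maui's view of the nodes and the classes they offer
)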


5) checknode shows the node as free, in case it wasn't clear from the other commands:

[root@<server> maui]# checknode <node>

checking node <node>

State:      Idle  (in current state for 00:55:10)
Configured Resources: PROCS: 16  MEM: 23G  SWAP: 31G  DISK: 1M
Utilized   Resources: SWAP: 202M
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [lcgpro]
Attributes: [Batch]
Classes:    [DEFAULT 1:1]

Total Time: 3:06:35  Up: 3:06:24 (99.90%)  Active: 00:00:10 (0.09%)

Reservations:
NOTE:  no reservations on node


6) When I use showbf -v, though, it says my nodes are blocked by reservations, despite checknode clearly telling me there are no reservations on that node. In our local maui.cfg there is a reservation for 1 proc (sketched below); I'm not sure why it blocks the whole node:

[root@<server2> server_logs]# showbf -v
backfill window (user: 'root' group: 'root' partition: ALL) Tue Oct  9 17:08:59

  3 procs available with no timelimit

node <node2> is blocked by reservation sft.0.0 in   INFINITY
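
(The local reservation is a standing reservation along these lines; a sketch of the SRCFG syntax from memory, not the exact file:

  SRCFG[sft]  TASKCOUNT=1
  SRCFG[sft]  HOSTLIST=<node2>
  SRCFG[sft]  PERIOD=INFINITY
)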

To be sure, I removed the reservation and reduced the maui.cfg to the default version without anything in it, but it still tells me the node is blocked by "reservation NONE in INFINITY":

[root@<server> maui]# showbf -v
backfill window (user: 'root' group: 'root' partition: ALL) Tue Oct  9 17:37:58

 16 procs available with no timelimit

node <node> is blocked by reservation NONE in   INFINITY

I'm not sure how to proceed, because the log files don't tell me anything and all the references I have found to a similar problem have remained unanswered.
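
(If anyone wants more detail, the log verbosity can be raised; a sketch, using the LOGFILE path from the maui.cfg below and assuming the rpm's init script:

  # in maui.cfg (LOGLEVEL goes up to 9)
  LOGLEVEL 7
  # then
  service maui restart
  tail -f /var/log/maui.log
)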

Thanks for any help. Here are the rpms I used:

maui-3.3-4.el5
maui-client-3.3-4.el5
maui-server-3.3-4.el5
torque-2.5.7-7.el5
torque-client-2.5.7-7.el5
torque-server-2.5.7-7.el5

libtorque-2.5.7-7.el5


The maui.cfg:

#
# MAUI configuration example
# @(#)maui.cfg David Groep 20031015.1
# for MAUI version 3.2.5
#
SERVERHOST              <server>

ADMIN1                  root
ADMINHOST               <server>

RMTYPE[0]           PBS
RMHOST[0]           <server>

RMSERVER[0]         <server>

SERVERPORT            40559
SERVERMODE            NORMAL

# Set PBS server polling interval. Since we have many short jobs
# and want fast turn-around, set this to 10 seconds (default: 2 minutes)
RMPOLLINTERVAL        00:00:10

# a max. 10 MByte log file in a logical location
LOGFILE               /var/log/maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3


and the Torque config:

create queue long
set queue long queue_type = Execution
set queue long acl_hosts = localhost
set queue long acl_hosts += <server>
set queue long resources_max.cput = 48:00:00
set queue long resources_max.walltime = 72:00:00
set queue long acl_group_enable = True
set queue long acl_groups = aforti
set queue long enabled = True
set queue long started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server acl_hosts = <server>
set server acl_hosts += localhost
set server default_queue = long
set server log_events = 511
set server mail_from = adm
set server next_job_number = 12
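
(The live settings can be re-checked on the server at any time with qmgr, e.g.:

  qmgr -c 'print server'
  qmgr -c 'print queue long'
)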

-- 
Facts aren't facts if they come from the wrong people. (Paul Krugman)