Hi Steve,

Unfortunately the jobs are never queued: the jobmanager fails when it tries to pass them to torque/maui (I guess maui reports that no resources are available). If I submit a job and specify a queue, it runs normally:

test 1 >> globus-job-run pcgrid01.esrin.esa.int:/jobmanager-lcgpbs /bin/hostname            FAILED
test 2 >> globus-job-run pcgrid01.esrin.esa.int:/jobmanager-lcgpbs -q short /bin/hostname   SUCCESS
test 3 >> globus-job-run pcgrid01.esrin.esa.int:/jobmanager-lcgpbs -q esr /bin/hostname     SUCCESS
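
The three submissions above can be wrapped in a small script so they are easy to repeat after each configuration change. This is only a sketch: the function names `submit` and `run_queue_tests` and the `SUBMIT_CMD` and `CONTACT` variables are invented for illustration, while the contact string, the `-q` option and the queue names `short` and `esr` are the ones used in the tests above.

```shell
#!/bin/sh
# Sketch: repeat the three test submissions and print SUCCESS/FAILED per case.
# SUBMIT_CMD is a hypothetical override so the script can be dry-run with a stub.
SUBMIT_CMD="${SUBMIT_CMD:-globus-job-run}"
CONTACT="${CONTACT:-pcgrid01.esrin.esa.int:/jobmanager-lcgpbs}"

# submit [-q QUEUE]: run /bin/hostname through the job manager, discard output.
submit() {
    "$SUBMIT_CMD" "$CONTACT" "$@" /bin/hostname >/dev/null 2>&1
}

# run_queue_tests: print one line per case, "<label> <result>".
run_queue_tests() {
    if submit; then echo "default SUCCESS"; else echo "default FAILED"; fi
    for q in short esr; do
        if submit -q "$q"; then echo "$q SUCCESS"; else echo "$q FAILED"; fi
    done
}
```

With the behaviour reported above, calling `run_queue_tests` would be expected to print `default FAILED`, then `short SUCCESS` and `esr SUCCESS`.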

Raising the maui log level shows some apparent errors, but they seem to be normal (according to the mauiusers and rollout lists):

04/20 12:23:18 MSUAcceptClient(6,ClientSD,HostName,TCP)
04/20 12:23:18 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
04/20 12:23:18 INFO:     all clients connected.  servicing requests
(repeated several times)
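
On Linux, errno 11 is EAGAIN, which a non-blocking accept() returns when no client is waiting to connect, so these lines are consistent with an idle polling loop rather than a fault. To stop them from burying more interesting messages, the log can be filtered. The following is a minimal sketch: `maui_log_summary` is a made-up helper name, and it only looks for the `WARNING:` prefix that appears in the log excerpts in this thread.

```shell
#!/bin/sh
# Sketch: summarise a Maui log read from stdin.
# Counts the "accept call failed, errno: 11" lines and prints WARNING lines
# so they are not buried under the repeated accept messages.
maui_log_summary() {
    awk '
        /accept call failed, errno: 11/ { eagain++; next }
        /WARNING:/                      { print }
        END { printf "accept EAGAIN lines: %d\n", eagain }
    '
}
```

It would be used as `maui_log_summary < maui.log`, where the log location depends on the installation.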

I've also reconfigured the CE (BTW, I use the yaim tool), but nothing changed...

Any clue?
Vega



Steve Traylen <[log in to unmask]>
Sent by: LHC Computer Grid - Rollout <[log in to unmask]>

20/04/2005 10:22
Please respond to LHC Computer Grid - Rollout

       
        To:        [log in to unmask]
        cc:        
        Subject:        Re: [LCG-ROLLOUT] LCG2_4_0 problem with torque



On Tue, Apr 19, 2005 at 06:42:17PM +0200 or thereabouts, Vega Forneris wrote:
> Hi *
>
> I'm upgrading the ESA-ESRIN node to LCG2_4_0, but now we have a problem with
> torque-maui: our CE simply doesn't pass any info to the batch system...

Hi Vega.

Do you mean new jobs are queued and never run, or do they run but are not
reported?

If the jobs are just queued then try a

checkjob <jobid>

to see why they are not running.
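
If several jobs are stuck, the same check can be looped over their ids. This is only a sketch: `check_jobs` and the `CHECKJOB_CMD` override are hypothetical names, and the job ids have to be supplied by hand.

```shell
#!/bin/sh
# Sketch: run checkjob for each job id given as an argument.
# CHECKJOB_CMD is a hypothetical override so the loop can be dry-run with a stub.
CHECKJOB_CMD="${CHECKJOB_CMD:-checkjob}"

check_jobs() {
    for id in "$@"; do
        echo "=== $id ==="
        # Report a failing checkjob call but carry on with the next id.
        "$CHECKJOB_CMD" "$id" || echo "checkjob failed for $id"
    done
}
```

Usage would be `check_jobs <jobid> [<jobid> ...]`.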

Steve
>
> If I check the maui log:
>
> 04/19 18:39:55 INFO:     starting Maui version 3.2.6p11 ##################
> 04/19 18:39:55 MAMSetDefaults()
> 04/19 18:39:55 ServerProcessArgs(1,ArgV,0)
> 04/19 18:39:55 starting 3.2.6p11 version Maui (PID: 7346) on Tue Apr 19 18:39:55
> 04/19 18:39:55 MPBSInitialize(base,SC)
> 04/19 18:39:56 __MPBSSystemQuery(base,RCount,SC)
> 04/19 18:39:56 INFO:     connected to PBS server :0 on sd 1
> 04/19 18:39:56 MPBSClusterQuery(base,RCount,SC)
> 04/19 18:39:56 __MPBSGetNodeState(Name,State,PNode)
> 04/19 18:39:56 INFO:     PBS node wnode0101.esrin.esa.int set to state Idle (free)
> 04/19 18:39:56 INFO:     node slot not set on node 'wnode0101.esrin.esa.int'
> .............
> 04/19 18:39:56 __MPBSGetNodeState(Name,State,PNode)
> 04/19 18:39:56 INFO:     PBS node wnode0106.esrin.esa.int set to state Idle (free)
> 04/19 18:39:56 INFO:     node slot not set on node 'wnode0106.esrin.esa.int'
> 04/19 18:39:56 MPBSLoadQueueInfo(base,NULL,SC)
> 04/19 18:39:56 INFO:     0 PBS resources detected on RM base
> 04/19 18:39:56 WARNING:  no resources detected
> 04/19 18:39:56 MPBSWorkloadQuery(base,JCount,SC)
> 04/19 18:39:56 INFO:     0 PBS jobs detected on RM base
> 04/19 18:39:56 WARNING:  no workload detected
> 04/19 18:39:56 INFO:     current util[0]:  0/6 (0.00%)  PH: 0.35%  active jobs: 0 of 0 (completed: 601)
> 04/19 18:39:56 INFO:     scheduling complete.  sleeping 10 seconds
>
>
>
> > pbsnodes -a
> wnode0101.esrin.esa.int
>      state = free
>      np = 1
>      properties = lcgpro
>      ntype = cluster
>      status = arch=linux,uname=Linux wnode0101.esrin.esa.int 2.4.21-27.0.2.EL.XFS #1 Thu Jan 20 00:27:04 CET 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=1703,totmem=1304260kb,availmem=1063348kb,physmem=252012kb,ncpus=1,loadave=0.00,rectime=1113928212
>
> wnode0102.esrin.esa.int
>      state = free
>         ....
>
> ...any clue?
>
> thanks
>
> Vega Forneris
>
> +-----------------------------------------------+
> ESA-ESRIN
> Unix Systems Administrator
> Via Galileo Galilei
> 00044 Frascati (Rm) - Italy
> Phone +39 06 94180581
> Mailto: [log in to unmask]
> +-----------------------------------------------+
> Vitrociset S.p.A.
> Unix System Administrator
> Via Tiburtina 1020
> 00100 Roma - Italy
> Phone +39 06 8820 4297
> Mailto: [log in to unmask]
> +-----------------------------------------------+
--
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/