Hi *,

I found the problem: the "config_torque_server" function sets the default 
queue to dteam, and since I'm not a dteam user, jobs submitted without an 
explicit queue failed. The default queue is now set to "short" and it works.
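
For anyone hitting the same symptom, the default queue can be checked with qmgr on the Torque server. The snippet below is only a sketch: it assumes the usual "set server default_queue = <name>" line in the output of qmgr -c 'print server', and the helper name default_queue is made up for this example.

```shell
# Hypothetical helper: read `qmgr -c 'print server'` output on stdin
# and print the name of the server's default queue.
default_queue() {
    awk '$1 == "set" && $2 == "server" && $3 == "default_queue" { print $5 }'
}

# On the CE, with Torque manager rights:
#   qmgr -c 'print server' | default_queue        # show the current default
#   qmgr -c 'set server default_queue = short'    # route unqualified jobs to "short"
```

Re-running the yaim configuration may set the value back, since config_torque_server is what chose dteam in the first place.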

Cheers
Vega





Vega Forneris
20/04/2005 12:30


        To:     LHC Computer Grid - Rollout <[log in to unmask]>
        cc: 
        Subject:        Re: [LCG-ROLLOUT] LCG2_4_0 problem with torque

Hi Steve,

Unfortunately the jobs are never queued at all: the jobmanager fails when 
trying to pass them to torque/maui (I guess maui reports that no resource is 
available). If I submit jobs with an explicit queue, they run normally: 

test1 >> globus-job-run pcgrid01.esrin.esa.int:/jobmanager-lcgpbs /bin/hostname            FAILED
test2 >> globus-job-run pcgrid01.esrin.esa.int:/jobmanager-lcgpbs -q short /bin/hostname   SUCCESS
test3 >> globus-job-run pcgrid01.esrin.esa.int:/jobmanager-lcgpbs -q esr /bin/hostname     SUCCESS
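
The three tests above can be repeated in a small loop. This is a rough sketch, not the way the tests were originally run: the names CONTACT, RUNNER and probe_queue are invented here, and it assumes a valid grid proxy is already in place.

```shell
# Gatekeeper contact string taken from the tests above.
CONTACT=pcgrid01.esrin.esa.int:/jobmanager-lcgpbs
# RUNNER is a hypothetical override hook; by default the real client is used.
RUNNER=${RUNNER:-globus-job-run}

# Run /bin/hostname through the jobmanager.
# $1 is a queue name; an empty value means "use the default queue".
probe_queue() {
    if [ -n "$1" ]; then
        "$RUNNER" "$CONTACT" -q "$1" /bin/hostname
    else
        "$RUNNER" "$CONTACT" /bin/hostname
    fi
}

# Probe the default queue first, then the two explicit queues.
for q in "" short esr; do
    if probe_queue "$q" >/dev/null 2>&1; then
        echo "queue '${q:-<default>}': SUCCESS"
    else
        echo "queue '${q:-<default>}': FAILED"
    fi
done
```

If only the first probe fails, the default queue is the likely culprit rather than the batch system as a whole.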

Enabling a higher log level for maui shows some errors, but they seem to be 
normal (judging by mauiusers and rollout):

04/20 12:23:18 MSUAcceptClient(6,ClientSD,HostName,TCP)
04/20 12:23:18 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
04/20 12:23:18 INFO:     all clients connected.  servicing requests
(repeated several times)

I've also reconfigured the CE (BTW, I use the yaim tool), but nothing changed...

Any clue?
Vega





Steve Traylen <[log in to unmask]>
Sent by: LHC Computer Grid - Rollout <[log in to unmask]>
20/04/2005 10:22
Please respond to LHC Computer Grid - Rollout

 
        To:     [log in to unmask]
        cc: 
        Subject:        Re: [LCG-ROLLOUT] LCG2_4_0 problem with torque


On Tue, Apr 19, 2005 at 06:42:17PM +0200 or thereabouts, Vega Forneris wrote:
> Hi *
> 
> I'm upgrading the ESA-ESRIN node to LCG2_4_0, but now we have a problem with 
> torque-maui: our CE simply doesn't pass any info to the batch system...

Hi Vega.

Do you mean new jobs are queued and never run, or do they run but are not
reported?

If the jobs are just queued then try a 

checkjob <jobid>

to see why they are not running.

 Steve
> 
> If I check the maui log:
> 
> 04/19 18:39:55 INFO:     starting Maui version 3.2.6p11 ##################
> 04/19 18:39:55 MAMSetDefaults()
> 04/19 18:39:55 ServerProcessArgs(1,ArgV,0)
> 04/19 18:39:55 starting 3.2.6p11 version Maui (PID: 7346) on Tue Apr 19 18:39:55
> 04/19 18:39:55 MPBSInitialize(base,SC)
> 04/19 18:39:56 __MPBSSystemQuery(base,RCount,SC)
> 04/19 18:39:56 INFO:     connected to PBS server :0 on sd 1
> 04/19 18:39:56 MPBSClusterQuery(base,RCount,SC)
> 04/19 18:39:56 __MPBSGetNodeState(Name,State,PNode)
> 04/19 18:39:56 INFO:     PBS node wnode0101.esrin.esa.int set to state Idle (free)
> 04/19 18:39:56 INFO:     node slot not set on node 'wnode0101.esrin.esa.int'
> .............
> 04/19 18:39:56 __MPBSGetNodeState(Name,State,PNode)
> 04/19 18:39:56 INFO:     PBS node wnode0106.esrin.esa.int set to state Idle (free)
> 04/19 18:39:56 INFO:     node slot not set on node 'wnode0106.esrin.esa.int'
> 04/19 18:39:56 MPBSLoadQueueInfo(base,NULL,SC)
> 04/19 18:39:56 INFO:     0 PBS resources detected on RM base
> 04/19 18:39:56 WARNING:  no resources detected
> 04/19 18:39:56 MPBSWorkloadQuery(base,JCount,SC)
> 04/19 18:39:56 INFO:     0 PBS jobs detected on RM base
> 04/19 18:39:56 WARNING:  no workload detected
> 04/19 18:39:56 INFO:     current util[0]:  0/6 (0.00%)  PH: 0.35% active jobs: 0 of 0 (completed: 601)
> 04/19 18:39:56 INFO:     scheduling complete.  sleeping 10 seconds
> 
> 
> 
> > pbsnodes -a
> wnode0101.esrin.esa.int
>      state = free
>      np = 1
>      properties = lcgpro
>      ntype = cluster
>      status = arch=linux,uname=Linux wnode0101.esrin.esa.int 2.4.21-27.0.2.EL.XFS #1 Thu Jan 20 00:27:04 CET 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=1703,totmem=1304260kb,availmem=1063348kb,physmem=252012kb,ncpus=1,loadave=0.00,rectime=1113928212
> 
> wnode0102.esrin.esa.int
>      state = free
>         ....
> 
> ...any clue?
> 
> thanks
> 
> Vega Forneris 
> 
> +-----------------------------------------------+
> ESA-ESRIN 
> Unix Systems Administrator
> Via Galileo Galilei
> 00044 Frascati (Rm) - Italy
> Phone +39 06 94180581
> Mailto: [log in to unmask]
> +-----------------------------------------------+
> Vitrociset S.p.A.
> Unix System Administrator
> Via Tiburtina 1020 
> 00100 Roma - Italy 
> Phone +39 06 8820 4297 
> Mailto: [log in to unmask]
> +-----------------------------------------------+
-- 
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/