On Thu, Jul 9, 2009 at 4:43 PM, Dmitry Ozerov wrote:
> Hi,
>
> I'm out of ideas and need help on this topic:
>
> I have a problem with the PPS-glite-WN glite 3.2 installation.
> The batch server and the lcg-CE are installed on two different machines.
> The problem has been traced to the communication between the batch server
> and the node. As the grid user on the server, I run qsub -I -q
> <queue> and see the job in the queued state in qstat; then, after a few
> minutes, the node to which the job was sent becomes "down", and on the node:
> /etc/init.d/pbs_mom status
> pbs_mom dead but subsys locked
> (without jobs, pbs_mom is reported as "running").
Hi,
It seems that torque 2.3.0 (on the SL5 WN) and torque 2.3.6 (on the
server) are incompatible as they stand in the PPS:
https://savannah.cern.ch/bugs/?51261
Presumably this will be resolved before release :-)
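
A quick way to confirm a mismatch like this is to compare the torque
versions in the package lists from both hosts. The following is only a
minimal sketch in Python, assuming the lists are pasted in as plain RPM
names. The function name torque_versions and the regular expression are
illustrative assumptions, not part of torque or gLite.

```python
import re

# Assumption: torque RPM names look like "torque[-subpackage...]-<version>-<release>",
# e.g. "torque-mom-2.3.0-snap.200801151629.2cri.sl5".
TORQUE_RPM = re.compile(r"^torque(?:-[a-z]+)*-(\d+(?:\.\d+)+)-")


def torque_versions(packages):
    """Return the set of torque versions found in a list of RPM names."""
    versions = set()
    for name in packages:
        match = TORQUE_RPM.match(name.strip())
        if match:
            versions.add(match.group(1))
    return versions


# Package names copied from the lists quoted in this thread.
server = [
    "PPS-glite-TORQUE_server-3.1.9-0",
    "torque-client-2.3.6-1cri.slc4",
    "torque-server-2.3.6-1cri.slc4",
    "glite-yaim-torque-server-4.0.3-2",
    "torque-2.3.6-1cri.slc4",
]
wn_sl5 = [
    "PPS-glite-TORQUE_client-3.2.1-0",
    "torque-mom-2.3.0-snap.200801151629.2cri.sl5",
    "torque-client-2.3.0-snap.200801151629.2cri.sl5",
    "torque-2.3.0-snap.200801151629.2cri.sl5",
    "glite-yaim-torque-client-4.0.1-1",
]

server_versions = torque_versions(server)
wn_versions = torque_versions(wn_sl5)
if server_versions != wn_versions:
    print("torque version mismatch:", sorted(server_versions), "vs", sorted(wn_versions))
```

The glite-yaim-* and PPS-glite-* metapackages are skipped on purpose, since
their version numbers are unrelated to the torque release itself.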
Steve
>
> The messages in /var/spool/pbs/server_logs on the server side are:
> 07/09/2009 16:28:27;0008;PBS_Server;Job;658.tb021.desy.de;send of job to tb019.desy.de failed error = 15031
> 07/09/2009 16:28:27;0001;PBS_Server;Svr;PBS_Server;Batch protocol error (15031) in send_job, child failed in previous commit request for job 658.tb021.desy.de
> 07/09/2009 16:28:27;0008;PBS_Server;Job;658.tb021.desy.de;unable to run job, MOM rejected/rc=1
> 07/09/2009 16:28:27;0080;PBS_Server;Req;req_reject;Reject reply code=15041(Execution server rejected request MSG=cannot send job to mom, state=PRERUN), aux=0, type=RunJob, from [log in to unmask]
>
> When I connect a PPS WN from the glite 3.1 release to this server,
> everything works fine.
>
> Details:
> server:
> Scientific Linux SL release 4.7 (Beryllium)
> PPS-glite-TORQUE_server-3.1.9-0
> PPS-glite-TORQUE_utils-3.1.12-0
> torque-devel-2.3.6-1cri.slc4
> torque-drmaa-2.3.6-1cri.slc4
> glite-yaim-torque-utils-4.0.3-1
> torque-client-2.3.6-1cri.slc4
> torque-server-2.3.6-1cri.slc4
> torque-drmaa-docs-2.3.6-1cri.slc4
> glite-yaim-torque-server-4.0.3-2
> torque-2.3.6-1cri.slc4
> torque-docs-2.3.6-1cri.slc4
> Linux tb021 2.6.18-128.1.6.el5xen #1 SMP Wed Apr 1 07:21:08 EDT 2009
> x86_64 x86_64 x86_64 GNU/Linux
>
> 5.3 client:
> Scientific Linux SL release 5.3 (Boron)
> PPS-glite-WN-version-3.2.3-0
> PPS-glite-TORQUE_client-3.2.1-0
> torque-mom-2.3.0-snap.200801151629.2cri.sl5
> torque-client-2.3.0-snap.200801151629.2cri.sl5
> torque-2.3.0-snap.200801151629.2cri.sl5
> glite-yaim-torque-client-4.0.1-1
> Linux tb019.desy.de 2.6.18-128.1.14.el5 #1 SMP Tue Jun 16 18:47:37 EDT
> 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> 4.7 client:
> Scientific Linux SL release 4.7 (Beryllium)
> PPS-glite-TORQUE_client-3.1.8-0.i386
> PPS-glite-WN-3.1.35-0.i386
> torque-docs-2.3.6-1cri.slc4.i386
> torque-client-2.3.6-1cri.slc4.i386
> glite-yaim-torque-client-4.0.2-1.noarch
> torque-mom-2.3.6-1cri.slc4.i386
> torque-pam-2.3.6-1cri.slc4.i386
> torque-devel-2.3.6-1cri.slc4.i386
> torque-2.3.6-1cri.slc4.i386
> Linux tb018.desy.de 2.6.9-78.0.22.ELsmp #1 SMP Thu Apr 30 23:30:54 CDT
> 2009 i686 i686 i386 GNU/Linux
>
> Thanks for any help,
> Dima.
>
> P.S. I can ssh from the client to the server without a password.
>
--
Steve Traylen