OK, I corrected the Globus port range environment variable, and now it works: the job is submitted correctly and is running.
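
The thread does not say which variable was wrong or what its value was. As an illustration only, assume the variable was the Globus Toolkit's GLOBUS_TCP_PORT_RANGE, which Globus reads as two comma-separated port numbers ("min,max"). Under that assumption, a quick Python check of the value's format might look like the following. The helper name parse_port_range is hypothetical. Passing this check does not prove that the firewall actually opens the range.

```python
import os


def parse_port_range(value):
    """Parse a 'min,max' port range string and return (min, max) as ints.

    Assumption: the variable in question is GLOBUS_TCP_PORT_RANGE and
    Globus expects two comma-separated integers. Raises ValueError for
    anything else (e.g. a space-separated pair) or for ports outside
    1..65535 or with min > max.
    """
    parts = value.strip().split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'min,max', got {value!r}")
    low, high = (int(p.strip()) for p in parts)
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"invalid port range {low}-{high}")
    return low, high


if __name__ == "__main__":
    raw = os.environ.get("GLOBUS_TCP_PORT_RANGE")
    if raw is None:
        print("GLOBUS_TCP_PORT_RANGE is not set")
    else:
        try:
            print("port range OK:", parse_port_range(raw))
        except ValueError as err:
            print("port range malformed:", err)
```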

Thanks, all, for your help :)

By the way, Christine, I installed gLite 3 on the CE (lcg-CE).

Cheers,
Jean

On 22 Nov 07, at 15:36, LEROY Christine wrote:

Are you installing gLite 3.1?

If so, can you check that you have these files:

/opt/globus/libexec/globus-script-initializer
/opt/globus/libexec/globus-sh-tools-vars.sh
/opt/globus/lib/perl/Globus/Core/Paths.pm
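
The file check suggested above can be scripted. This is a minimal Python sketch that assumes it runs on the CE itself. The helper name missing_files is illustrative, and the list only repeats the three paths from the message.

```python
from pathlib import Path

# The three Globus helper files asked about in this thread.
REQUIRED_FILES = [
    "/opt/globus/libexec/globus-script-initializer",
    "/opt/globus/libexec/globus-sh-tools-vars.sh",
    "/opt/globus/lib/perl/Globus/Core/Paths.pm",
]


def missing_files(paths):
    """Return the paths that do not exist as regular files, in input order."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    missing = missing_files(REQUIRED_FILES)
    if missing:
        for path in missing:
            print("MISSING:", path)
    else:
        print("all required Globus files are present")
```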

From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On Behalf Of Jean Salzemann
Sent: Thursday, 22 November 2007 15:25
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Job failing on CE with unspecified gridmanager error

Hi Christine,

We have the correct jobmanager types in /opt/globus/etc/grid-services. We installed the CE with yaim using the lcg-CE_torque package (lcgpbs jobmanager with the Torque batch system) on sl309. SSH works properly between the WNs and the CE: I can submit a qsub job from the CE and get the output back (scp works without a password prompt). From what I can see, the jobs never enter the PBS queue. They seem to be stuck between the gatekeeper and PBS, so to speak, so it does not look like a stage-in/stage-out problem.

On 22 Nov 07, at 14:55, LEROY Christine wrote:

Hello Jean,

What do you have in this directory:

/opt/globus/etc/grid-services/

(For example, we have:

# ls /opt/globus/etc/grid-services/

jobmanager  jobmanager-fork  jobmanager-lcgpbs )
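
You can compare the directory against that example listing with a short script. This is a Python sketch that treats the three entries above as the expected set. That is an assumption: which jobmanager entries a site actually needs depends on its batch setup, and lcgpbs is assumed here. The helper name missing_services is illustrative.

```python
from pathlib import Path

GRID_SERVICES = Path("/opt/globus/etc/grid-services")

# Entries from the example listing above (assumes an lcgpbs site).
EXPECTED = {"jobmanager", "jobmanager-fork", "jobmanager-lcgpbs"}


def missing_services(directory, expected):
    """Return the expected entry names absent from directory, sorted.

    If the directory does not exist at all, every expected name is missing.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return sorted(expected)
    present = {entry.name for entry in directory.iterdir()}
    return sorted(expected - present)


if __name__ == "__main__":
    missing = missing_services(GRID_SERVICES, EXPECTED)
    print("missing:", missing if missing else "none")
```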

Do you use pbs or lcgpbs?

Did you use yaim? If so, you may need to be careful with the variables.

If you use lcgpbs, is SSH working properly between the WNs and the CE?

From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On Behalf Of Jean Salzemann
Sent: Thursday, 22 November 2007 14:35
To: [log in to unmask]
Subject: [LCG-ROLLOUT] Job failing on CE with unspecified gridmanager error

Dear all,

We've set up a site in Vietnam, and I've run into behaviour I have never seen before when submitting jobs. The jobs fail on the CE with the dreaded "Got a job held event, reason: Unspecified gridmanager error", but I can't figure out why.

qsub submissions work. globus-job-run (/bin/hostname) seems to work with fork. I'm not sure about lcgpbs, because the call returns to the prompt without any output. The PBS ACLs seem correct. However, in /var/log/messages I get the following line whenever the user is mapped to a local account and the job is supposed to be sent to PBS:

Nov 22 19:43:03 ce gridinfo: [10770-10924] Job 1195735288:lcgpbs:internal_2961450261:10714.1195735287 FAILED during submission to batch system lcgpbs
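
A short Python filter can pull such failures out of a busy syslog. In this sketch, the regular expression is inferred from the single gridinfo line quoted above, so other gridinfo messages may be laid out differently. The thread does not explain what the colon-separated parts of the job ID mean, so the code only splits them without interpreting them. The names parse_gridinfo and failed_jobs are illustrative.

```python
import re
import sys

# Pattern inferred from one quoted gridinfo line (assumption: other
# gridinfo messages share this "Job <id> <STATUS> <detail>" layout).
GRIDINFO_JOB = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+gridinfo:\s+"
    r"\[(?P<pids>[\d-]+)\]\s+Job\s+(?P<job_id>\S+)\s+(?P<status>[A-Z]+)\s+"
    r"(?P<detail>.*)$"
)


def parse_gridinfo(line):
    """Return a dict of fields for a gridinfo job line, or None if no match.

    The job ID is also split on ':' into job_id_parts, uninterpreted.
    """
    m = GRIDINFO_JOB.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["job_id_parts"] = fields["job_id"].split(":")
    return fields


def failed_jobs(lines):
    """Yield parsed entries whose status is FAILED."""
    for line in lines:
        entry = parse_gridinfo(line)
        if entry is not None and entry["status"] == "FAILED":
            yield entry


if __name__ == "__main__":
    # Default log location from the thread; pass another path as argv[1].
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(log_path, errors="replace") as log:
            for entry in failed_jobs(log):
                print(entry["timestamp"], entry["job_id"], entry["detail"])
    except OSError as err:
        print(f"cannot read {log_path}: {err}")
```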

But I have absolutely no idea what could cause this. Any ideas? :)

Thanks in advance,

Jean