Hi Anja,

If I understand correctly, you have set up the CE as a 'gateway' to an
existing PBS cluster. If so, the job may be failing at the
authentication stage.

Try using globus-job-run instead of edg-job-submit (you get more
information about the failure that way) and see if it's an
authentication problem. I suspect that your job is dying because the CA
certificate bundles are not available on your worker nodes, so when the
nodes try to retrieve the input sandbox (from the RB) using gsiftp, they
can't authenticate and the job terminates.
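
A rough sketch of how one might check this, assuming the worker nodes
look for CA certificates in the usual Globus location
(/etc/grid-security/certificates, or whatever X509_CERT_DIR points to).
The host name ce.example.org and the helper names ca_dir and
count_ca_certs are made up for illustration; substitute your own CE
contact string.

```shell
#!/bin/sh
# Sketch only. From a UI with a valid proxy, something like the following
# runs a command through the CE's job manager (ce.example.org is a
# placeholder host, not a real one):
#
#   globus-job-run ce.example.org/jobmanager-lcgpbs /bin/hostname
#   globus-job-run ce.example.org/jobmanager-lcgpbs /bin/ls /etc/grid-security/certificates

# Print the CA directory: X509_CERT_DIR if set, else the usual default.
ca_dir() {
    echo "${X509_CERT_DIR:-/etc/grid-security/certificates}"
}

# Count hashed CA certificate files (<hash>.0) in the given directory.
count_ca_certs() {
    n=0
    for f in "$1"/*.0; do
        # The glob stays literal when nothing matches, so test existence.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

Running count_ca_certs "$(ca_dir)" directly on a worker node and getting
0 would be consistent with the missing-CA explanation, though it does
not prove it.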

I remember seeing a page about this somewhere on the GOC wiki (I think
in the troubleshooting section):
http://goc.grid.sinica.edu.tw/gocwiki/

Cheers, Marco.
 

On Wed, 2005-05-04 at 16:22 +0100, Anja Vest wrote:
> Hello,
> 
> we have a problem with job submission to our LCG site when we use our
> cluster control machine as PBS server.
> We are using LCG 2.4.0 with JOB_MANAGER=lcgpbs
> The jobs come in but fail immediately after they started. (I can see them
> for a few seconds with qstat.)
> Also the ~/.lcgjm/globus-cache-export.xxxx directory disappears directly.
> The ssh connection from the WN's to the CE is fine.
> Upon the advice of our susy-admin, we tried to configure
> /opt/globus/setup/globus/lcgpbs.in such that the line
> #    $pbs_job_script->print("#PBS -W stagein=".$gpg_file."@".$my_hostname.":".$cache_export_dir."/".$gpg_file."\n");
> was commented out and I added
> $pbs_job_script->print("cp ".$cache_export_dir."/".$gpg_file." ~/\n");
> This didn't help.
> 
> Trying to submit the same job with the CE as PBS server and a test node
> connected works fine.
> 
> Can anybody help us?
> 
> best regards,
> Anja