On Tue, 6 Sep 2005, Johan Gunnarsson wrote:
> Maarten Litmaath wrote:
> > Johan Gunnarsson wrote:
> >
> >> I am having problems getting LCG to submit jobs to pbs. The CE and
> >> pbs-server are running on different hosts (WN:s are running the TAR_WN
> >> installation).
> >>
> >> When submitting an lcg-job I get the following in the pbs-server log:
> >>
> >> 09/06/2005 14:22:03;0008;PBS_Server;Job;111464.smokescreen;Job Queued at
> >> request of dteam002@n100, owner = dteam002@n100, job name = STDIN, queue
> >> = gridjobs
> >> 09/06/2005 14:22:05;0008;PBS_Server;Job;111464.smokescreen;MOM rejected
> >> modify request, error: 15001
> >> 09/06/2005 14:22:05;0080;PBS_Server;Req;req_reject;Reject reply
> >> code=15001, aux=0, type=11, from root@smokescreen
> >>
> >> In the mom-logs on the WN I get:
> >>
> >> pbs_mom;Req;del_files;cannot stat globus-cache-export.sR2688.gpg
> >> pbs_mom;Req;;Type deletejob request received from
> >> PBS_Server@smokescreen, sock=11
> >> pbs_mom;Req;;Type deletefiles request received from
> >> PBS_Server@smokescreen, sock=10
> >> pbs_mom;Req;;Type modifyjob request received from
> >> PBS_Server@smokescreen, sock=11
> >> pbs_mom;Req;req_reject;Reject reply code=15001, aux=0, type=11, from
> >> PBS_Server@smokescreen
> >>
> >> Error 15001 is 'Unknown Job Identifier'.
> >>
> >> What might be wrong here?
> >
> >
> > Check the first "$clienthost" line in /var/spool/pbs/mom_priv/config on
> > the WN:
> > does it name your (fully-qualified) PBS server host?
>
> It names the internal name of the PBS server host (smokescreen). However
> I'm able to do qsub from the CE (and nordugrid is able to submit jobs to
> pbs), so I'm not sure that's the problem.
Those PBS messages confuse the matter. Please send us the error that your
grid job got instead. For example:
-----------------------------------------------------------------------------
$ globus-job-run n100.bluesmoke.nsc.liu.se:2119/jobmanager-lcgpbs \
-q gridjobs /bin/hostname
submit-helper script running on host n98 gave error: cache_export_dir
(/home/dteamsgm/.lcgjm/globus-cache-export.R18844) on gatekeeper did not
contain a cache_export_dir.tar archive
-----------------------------------------------------------------------------
That error has its own Wiki entry:
http://goc.grid.sinica.edu.tw/gocwiki/submit-helper_script_%2e%2e%2e_gave_error%3a_cache_export_dir_%2e%2e%2e
Does that explain the problem?
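
If you also want to rule out the "$clienthost" question raised earlier in
the thread, a minimal sketch like the following could be run on the WN. The
function name first_clienthost is hypothetical (it is not part of PBS or
LCG); the default path is the one quoted above and may differ on your
installation.

```shell
# Hypothetical helper: print the host named by the first "$clienthost"
# line of a pbs_mom config file. Not part of PBS or LCG.
first_clienthost() {
    # Default path taken from the thread; pass another path as $1 if needed.
    config="${1:-/var/spool/pbs/mom_priv/config}"
    # Compare field 1 with the literal string "$clienthost",
    # print the host in field 2, and stop at the first match.
    awk '$1 == "$clienthost" { print $2; exit }' "$config"
}
```

Calling first_clienthost with no argument on the WN prints the host from the
first "$clienthost" line, which can then be compared with the fully-qualified
name of the PBS server host.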