Maarten Litmaath, CERN wrote:
> On Tue, 6 Sep 2005, Johan Gunnarsson wrote:
>
>
>>Maarten Litmaath wrote:
>>
>>>Johan Gunnarsson wrote:
>>>
>>>
>>>>I am having problems getting LCG to submit jobs to PBS. The CE and
>>>>pbs-server are running on different hosts (the WNs are running the TAR_WN
>>>>installation).
>>>>
>>>>When submitting an LCG job I get the following in the pbs-server log:
>>>>
>>>>09/06/2005 14:22:03;0008;PBS_Server;Job;111464.smokescreen;Job Queued at
>>>>request of dteam002@n100, owner = dteam002@n100, job name = STDIN, queue
>>>>= gridjobs
>>>>09/06/2005 14:22:05;0008;PBS_Server;Job;111464.smokescreen;MOM rejected
>>>>modify request, error: 15001
>>>>09/06/2005 14:22:05;0080;PBS_Server;Req;req_reject;Reject reply
>>>>code=15001, aux=0, type=11, from root@smokescreen
>>>>
>>>>In the mom-logs on the WN I get:
>>>>
>>>>pbs_mom;Req;del_files;cannot stat globus-cache-export.sR2688.gpg
>>>>pbs_mom;Req;;Type deletejob request received from
>>>>PBS_Server@smokescreen, sock=11
>>>>pbs_mom;Req;;Type deletefiles request received from
>>>>PBS_Server@smokescreen, sock=10
>>>>pbs_mom;Req;;Type modifyjob request received from
>>>>PBS_Server@smokescreen, sock=11
>>>>pbs_mom;Req;req_reject;Reject reply code=15001, aux=0, type=11, from
>>>>PBS_Server@smokescreen
>>>>
>>>>Error 15001 is 'Unknown Job Identifier'.
>>>>
>>>>What might be wrong here?
>>>
>>>
>>>Check the first "$clienthost" line in /var/spool/pbs/mom_priv/config on
>>>the WN:
>>>does it name your (fully-qualified) PBS server host?
>>
>>It names the internal name of the PBS server host (smokescreen). However
>>I'm able to do qsub from the CE (and nordugrid is able to submit jobs to
>>PBS), so I'm not sure that's the problem.
>
>
> Those PBS messages confuse the matter. Rather send us the error that your
> grid job got. For example:
>
> -----------------------------------------------------------------------------
> $ globus-job-run n100.bluesmoke.nsc.liu.se:2119/jobmanager-lcgpbs \
> -q gridjobs /bin/hostname
> submit-helper script running on host n98 gave error: cache_export_dir
> (/home/dteamsgm/.lcgjm/globus-cache-export.R18844) on gatekeeper did not
> contain a cache_export_dir.tar archive
> -----------------------------------------------------------------------------
>
> That error has its own Wiki entry:
>
> http://goc.grid.sinica.edu.tw/gocwiki/submit-helper_script_%2e%2e%2e_gave_error%3a_cache_export_dir_%2e%2e%2e
>
> Does that explain the problem?
Yes. The problem was that the globus binaries in the TAR distribution
need glibc 2.3, but the WNs are running Red Hat 7.3 (which means
glibc 2.2). I should have figured this out at the beginning.
Are there any plans for a TAR distribution that works on Red Hat 7.3?
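
For anyone hitting the same error, here is a minimal sketch of a check
that can be run on a WN to see whether its glibc is new enough. It
assumes Python is available on the node; the 2.3 minimum is simply what
the TAR globus binaries needed in this case, and the function names are
only illustrative, not part of any LCG tool.

```python
import platform
import re


def parse_glibc_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.2.5' or '2.3'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % (text,))
    return int(match.group(1)), int(match.group(2))


def glibc_is_sufficient(version_text, minimum=(2, 3)):
    """Return True if the reported glibc version is at least `minimum`.

    Versions are compared as integer tuples, so 2.17 correctly counts
    as newer than 2.3 (a plain string comparison would get this wrong).
    """
    return parse_glibc_version(version_text) >= minimum


if __name__ == "__main__":
    # platform.libc_ver() inspects the running Python executable and
    # returns a (library, version) pair; both may be empty strings.
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("could not determine the glibc version on this host")
    elif glibc_is_sufficient(version):
        print("glibc %s is new enough for the TAR globus binaries" % version)
    else:
        print("glibc %s is too old; glibc 2.3 or later is needed" % version)
```

The same comparison can be applied to a version string obtained some
other way on the node, by passing that string to glibc_is_sufficient().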
--
--------------------------------------------------------
Johan Gunnarsson Systems expert
National Supercomputer Centre Linköping university
http://www.nsc.liu.se
--------------------------------------------------------