On Tue, 27 Sep 2005, Piotr Siwczak wrote:
> Hi,
>
> Recently my site has been experiencing a strange error.

What did you change just before it stopped working?

> Grid jobs are not processed by torque, which rejects to queue them with
> the following error:
>
> req_reject;Reject reply code=15036(Job exceeds queue resource limits),
> aux=0, type=QueueJob, from [log in to unmask]
>
> I've already reinstalled all the LCG rpms and totally regenerated the
> torque config. I also removed the /opt/globus and /var/spool/pbs dirs
> before reinstalling. None of these actions helped.
>
> The strange thing is that I can successfully submit jobs directly from
> dteam001 account (and other pool accounts as well). The jobmanager fork
> also works well. For me this seems like a jobmanager's issue, I don't know
> how to tackle it though.

The job managers are Perl scripts that can be edited to get debug info.
In particular, in /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm
before this line:

    chomp($batch_id = `$qsub < $pbs_job_script_name $errfile`);

insert something like this:

    system("cp $pbs_job_script_name /tmp/my_job_script.$$");

Then look into such a script to see what extra requirements are specified
that would cause the job to fail immediately.
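
Once a copy of the job script has landed in /tmp, something like the
following could pull out the resource requests it contains. This is only
a sketch: it assumes the job manager writes its requirements as standard
"#PBS -l" directive lines, and the function name pbs_resource_requests is
made up for illustration.

```shell
# Print the resource request lines (#PBS -l ...) found in a captured job script.
# Assumption: requirements appear as standard "#PBS -l" directives.
pbs_resource_requests() {
    grep -E '^#PBS[[:space:]]+-l' "$1"
}

# Example use on the copies saved by the debugging line:
# for f in /tmp/my_job_script.*; do pbs_resource_requests "$f"; done
```

The values printed can then be compared with the limits set on the target
queue, which qmgr -c "list queue <queue name>" should show on the torque
server.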