Hi,
I recall it started around the time I configured NIS on my cluster. The
NIS server runs on the computing element.
--
Piotr Siwczak <[log in to unmask]>
System Administrator
Poznan Supercomputing and Networking Center
Supercomputing Department
(www.eu-egee.org <[log in to unmask]>)
--
On Tue, 27 Sep 2005 [log in to unmask] wrote:
> On Tue, 27 Sep 2005, Piotr Siwczak wrote:
>
>> Hi,
>>
>> Recently my site has been experiencing a strange error. Grid jobs are not
>
> What did you change just before it stopped working?
>
>> processed by torque, which refuses to queue them with the following error:
>>
>> req_reject;Reject reply code=15036(Job exceeds queue resource limits),
>> aux=0, type=QueueJob, from [log in to unmask]
>>
>> I've already reinstalled all the LCG rpms and totally regenerated the
>> torque config. I also removed the /opt/globus and /var/spool/pbs dirs
>> before reinstalling. None of these actions helped.
>>
>> The strange thing is that I can successfully submit jobs directly from
>> the dteam001 account (and other pool accounts as well). The jobmanager fork
>> also works well. To me this looks like a jobmanager issue, but I don't
>> know how to tackle it.
>
> The job managers are perl scripts that can be edited to get debug info.
> In particular, in /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm
> before this line:
>
> chomp($batch_id = `$qsub < $pbs_job_script_name $errfile`);
>
> insert something like this:
>
> system("cp $pbs_job_script_name /tmp/my_job_script.$$");
>
> Then look at the saved script to see what extra requirements it specifies
> that would cause the job to fail immediately.
>
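Once a copy of the job script has been saved as suggested above, its resource requests can be listed for comparison with the queue limits. The following is only a minimal sketch in Python. It assumes the generated script carries its requirements as standard "#PBS -l name=value" directive lines, which the thread does not confirm; the function names pbs_directives and resource_requests are hypothetical helpers, not part of torque or the jobmanager.

```python
def pbs_directives(script_text):
    """Return the arguments of every '#PBS' directive line, in file order."""
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#PBS"):
            # Keep only what follows the '#PBS' marker, e.g. '-l walltime=1:00:00'
            directives.append(stripped[len("#PBS"):].strip())
    return directives


def resource_requests(script_text):
    """Collect 'name=value' pairs from '#PBS -l' directives into a dict.

    Assumption: several resources may be comma-separated on one '-l' line.
    The value is everything after the first '=' (so 'nodes=1:ppn=2' keeps
    '1:ppn=2' as the value of 'nodes').
    """
    resources = {}
    for directive in pbs_directives(script_text):
        parts = directive.split(None, 1)
        if len(parts) == 2 and parts[0] == "-l":
            for item in parts[1].split(","):
                name, sep, value = item.partition("=")
                if sep:
                    resources[name.strip()] = value.strip()
    return resources
```

To use it, read the saved copy (a file named /tmp/my_job_script.<pid>, where <pid> is whatever process id the jobmanager had) and pass its text to resource_requests(). Any value that exceeds what the queue allows, for example as shown by qmgr -c "print server", would be a candidate cause of the "Job exceeds queue resource limits" rejection.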