JISCMail - LCG-ROLLOUT Archives

Hi,

I recall it started around the time I configured NIS on my cluster. The
NIS server runs on the computing element.


  --
  Piotr Siwczak <[log in to unmask]>
  System Administrator

  Poznan Supercomputing and Networking Center
  Supercomputing Department

  (www.eu-egee.org <[log in to unmask]>)
  --

On Tue, 27 Sep 2005 [log in to unmask] wrote:

> On Tue, 27 Sep 2005, Piotr Siwczak wrote:
>
>> Hi,
>>
>> Recently my site has been experiencing a strange error. Grid jobs are not
>
> What did you change just before it stopped working?
>
>> processed by Torque, which refuses to queue them with the following error:
>>
>> req_reject;Reject reply code=15036(Job exceeds queue resource limits),
>> aux=0, type=QueueJob, from [log in to unmask]
>>
>> I've already reinstalled all the LCG RPMs and completely regenerated the
>> Torque configuration. I also removed the /opt/globus and /var/spool/pbs
>> directories before reinstalling. None of this helped.
>>
>> The strange thing is that I can successfully submit jobs directly from
>> the dteam001 account (and from other pool accounts as well). The fork
>> jobmanager also works fine. To me this looks like a jobmanager issue,
>> though I don't know how to tackle it.
>
> The job managers are Perl scripts that can be edited to print debug info.
> In particular, in /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm
> before this line:
>
>        chomp($batch_id = `$qsub < $pbs_job_script_name $errfile`);
>
> insert something like this:
>
> 	system("cp $pbs_job_script_name /tmp/my_job_script.$$");
>
> Then inspect the saved script to see which extra requirements are specified
> that would cause the job to be rejected immediately.
>
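As a sketch of that inspection step, the Perl below pulls the "#PBS -l" requests out of a saved job script and compares the time-valued ones (walltime, cput, pcput) against the queue's resources_max settings as printed by qmgr -c "list queue <name>". The helper names, the queue name dteam and the /tmp file name in the usage comment are hypothetical. It assumes the limits are written as [[HH:]MM:]SS and ignores memory and node limits, which may equally be behind code 15036.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "#PBS -l name=value[,name=value...]" requests
# from the lines of a job script into a hash, e.g. ( walltime => '48:00:00' ).
sub pbs_resource_requests {
    my (@lines) = @_;
    my %req;
    for my $line (@lines) {
        next unless $line =~ /^#PBS\s+-l\s+(\S+)/;
        for my $pair (split /,/, $1) {
            my ($name, $value) = split /=/, $pair, 2;
            $req{$name} = $value if defined $value;
        }
    }
    return %req;
}

# Hypothetical helper: collect "resources_max.NAME = VALUE" settings from
# qmgr output (works for both "list queue" and "print queue" formats).
sub queue_max_limits {
    my (@lines) = @_;
    my %max;
    for my $line (@lines) {
        $max{$1} = $2 if $line =~ /resources_max\.(\w+)\s*=\s*(\S+)/;
    }
    return %max;
}

# Convert an [[HH:]MM:]SS time string to seconds (assumed format).
sub to_seconds {
    my ($t) = @_;
    my $secs = 0;
    $secs = $secs * 60 + $_ for split /:/, $t;
    return $secs;
}

# Return the names of time-valued requests that exceed the queue maximum.
sub exceeded_time_limits {
    my ($req, $max) = @_;
    my @over;
    for my $name (sort keys %$req) {
        next unless $name =~ /^(?:walltime|cput|pcput)$/ && exists $max->{$name};
        push @over, $name
            if to_seconds($req->{$name}) > to_seconds($max->{$name});
    }
    return @over;
}

# Example usage (file name and queue name are illustrative only):
#   open my $fh, '<', '/tmp/my_job_script.12345' or die $!;
#   my %req = pbs_resource_requests(<$fh>);
#   my %max = queue_max_limits(`qmgr -c "list queue dteam"`);
#   print "exceeds queue limit: $_\n" for exceeded_time_limits(\%req, \%max);
```

Comparing in seconds rather than as strings matters because requests such as "2880:00" (minutes:seconds) and limits such as "48:00:00" are not directly comparable as text.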