Hi,

I recall it started around the time I configured NIS on my cluster. The NIS
server runs on the computing element.

--
Piotr Siwczak <[log in to unmask]>
System Administrator
Poznan Supercomputing and Networking Center
Supercomputing Department
(www.eu-egee.org <[log in to unmask]>)
--

On Tue, 27 Sep 2005 [log in to unmask] wrote:

> On Tue, 27 Sep 2005, Piotr Siwczak wrote:
>
>> Hi,
>>
>> Recently my site has been experiencing a strange error. Grid jobs are not
>
> What did you change just before it stopped working?
>
>> processed by torque, which refuses to queue them with the following error:
>>
>> req_reject;Reject reply code=15036(Job exceeds queue resource limits),
>> aux=0, type=QueueJob, from [log in to unmask]
>>
>> I've already reinstalled all the LCG rpms and totally regenerated the
>> torque config. I also removed the /opt/globus and /var/spool/pbs dirs
>> before reinstalling. None of these actions helped.
>>
>> The strange thing is that I can successfully submit jobs directly from
>> the dteam001 account (and other pool accounts as well). The fork
>> jobmanager also works well. To me this looks like a jobmanager issue,
>> but I don't know how to tackle it.
>
> The job managers are perl scripts that can be edited to get debug info.
> In particular, in /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm
> before this line:
>
>     chomp($batch_id = `$qsub < $pbs_job_script_name $errfile`);
>
> insert something like this:
>
>     system("cp $pbs_job_script_name /tmp/my_job_script.$$");
>
> Then look into such a script to see what extra requirements are specified
> that would cause the job to fail immediately.
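
Once a copy of the generated job script has been saved as suggested above, the
resource requests it carries can be pulled out and compared with the queue
limits. The following is only a minimal sketch: the function name
list_pbs_resources is hypothetical (not part of LCG or Torque), and it assumes
the job manager writes its requests as "#PBS -l ..." directive lines in the
script.

```shell
#!/bin/sh
# Hypothetical helper: print the resource requests found in a saved copy
# of the generated job script, one per line, without the "#PBS -l" prefix.
# Assumes the requests appear as "#PBS -l <resource>=<value>" directives.
list_pbs_resources() {
    # $1: path to the saved job script, e.g. a /tmp/my_job_script.* copy
    grep -E '^#PBS[[:space:]]+-l[[:space:]]' "$1" |
        sed -E 's/^#PBS[[:space:]]+-l[[:space:]]+//'
}
```

For example, running list_pbs_resources on the saved copy and then checking
the output against the limits shown by qmgr -c "list queue <queue>" (with the
name of the queue the job was routed to) may show which request triggers the
"Job exceeds queue resource limits" rejection.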