
On 2 May 2012, at 17:12, emyr.james wrote:

> Hi, 
> 
> I modified /usr/bin/sge_submit.sh to make it dump the run scripts it generates into /tmp. 

Was just drafting something to suggest exactly that!
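
For what it's worth, once the run scripts are dumped somewhere readable, a few lines of Python would pull out the queue each one asks for. This is only a rough sketch: the function name sge_queues is made up for illustration, and it assumes the queue is requested with an embedded '#$ -q <name>' directive rather than on the qsub command line.

```python
import re

def sge_queues(script_text):
    """Return the queue names requested by '#$ -q' directives in a job script.

    Hypothetical helper for illustration; it only looks at embedded
    directives, not at options passed to qsub on the command line.
    """
    queues = []
    for line in script_text.splitlines():
        match = re.match(r"\s*#\$\s+-q\s+(\S+)", line)
        if match:
            queues.append(match.group(1))
    return queues

# Example with a made-up two-line script body
example = "#!/bin/sh\n#$ -q dteam \n/bin/sleep 1\n"
print(sge_queues(example))
```

Comparing that list against the QUEUES value in site-info.def should show straight away whether a job is asking for a queue that SGE doesn't have.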

> 
> The problem is that the run scripts contain this line... 
> 
> #$ -q dteam 
> 
> I.e. jobs are going to the dteam queue which doesn't exist. I've created a queue called grid.q in SGE for all the grid jobs and have the following in my site-info.def... 
> 
> QUEUES="grid.q"
> GRID_Q_GROUP_ENABLE="ops dteam atlas snoplus" 
> 
> I have no idea where it's getting dteam from.... 

> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam test.jdl 

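The 'cream-sge-dteam' part of the CE ID you submitted to breaks down as:
cream: the CE type (CREAM)
sge: use the SGE engine type
dteam: send the job to the 'dteam' queue

So the dteam in the run script comes from your submit command, not from site-info.def.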

Try 

glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-grid.q test.jdl 

instead
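
If it helps to see the naming convention spelled out, here is a small Python sketch that splits a CE ID of the form <host>:<port>/cream-<lrms>-<queue> into its parts. The function name parse_ce_id is hypothetical and the format is assumed from the two IDs in this thread, so treat it as an illustration rather than anything CREAM itself provides.

```python
def parse_ce_id(ce_id):
    """Split '<host>:<port>/cream-<lrms>-<queue>' into its parts.

    Hypothetical helper; the layout is assumed from the CE IDs quoted
    in this thread.
    """
    endpoint, _, service = ce_id.partition("/")
    host, _, port = endpoint.partition(":")
    # maxsplit=2 keeps any hyphens inside the queue name intact
    parts = service.split("-", 2)
    if len(parts) != 3 or parts[0] != "cream":
        raise ValueError("unexpected CE ID: %r" % ce_id)
    _, lrms, queue = parts
    return {"host": host, "port": int(port), "lrms": lrms, "queue": queue}

for ce in ("grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam",
           "grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-grid.q"):
    info = parse_ce_id(ce)
    print(info["lrms"], info["queue"])
```

The last field is what ends up in the '#$ -q' line of the generated run script, which is why the first ID gives dteam and the second should give grid.q.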


> the run script *should* have this line in it... 
> 
> #$ -q grid.q 
> 
> anyone able to shed light on this ? 
> 
> On 02/05/12 16:30, emyr.james wrote:
>> Hi Daniela, 
>> I switched to using this jdl... 
>> 
>> executable="/bin/sleep"; 
>> arguments="1"; 
>> 
>> ...and I also cleaned up /etc/passwd and /etc/group (all the grid related stuff was in there twice). 
>> 
>> I'm now getting this in the log... 
>> 
>> 02 May 2012 16:12:15,851 INFO  org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:2411) - (Worker Thread 1) JOB CREAM173523236 STATUS CHANGED: PENDING => ABORTED [failureReason=BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:) N/A (jobId = CREAM173523236)] [localUser=dteam154] [delegationId=65c7415e9d0b602cafef21342fb2bb404eafd9a9] 
>> 02 May 2012 16:13:56,840 INFO  org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager (JobSubmissionManager.java:131) - (TIMER) AcceptNewJobs by script = true 
>> 
>> I followed the link you sent.. 
>> 
>> 1.1 and 1.2 are fine 
>> 
>> I'm not sure how to get a valid proxy in /tmp/user.proxy so I can't do this step. 
>> 
>> 1.4 and 1.5 seem fine. 
>> 
>> For 1.6, it works but I see the above log. When I get the status I see this... 
>> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam test.jdl 
>> https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582 
>> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-status https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582 
>> 
>> ******  JobID=[https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582] 
>>     Status        = [ABORTED] 
>>     ExitCode      = [] 
>>     FailureReason = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:) N/A (jobId = CREAM246546582)] 
>> 
>> 
>> ej59@feynman:~/svn_work/grid/tests$ 
>> 
>> So it's failing on job submission. Presumably it's having issues getting the SGE qsub command to work. I've managed to submit jobs to SGE from the box manually by su'ing to a grid user account and running qsub and that worked fine. 
>> 
>> Are there any logs or extra debugging I can enable to get more info on why it's not submitting ? 
>> 
>> Emyr 
>> 
>> On 02/05/12 15:26, Daniela Bauer wrote: 
>>> Hi Emyr, 
>>> 
>>> Usually yaim makes the necessary updates to your /etc/sudoers file 
>>> (there should also be an /etc/sudoers.forcream which is included in 
>>> the standard one), but maybe something went wrong and/or you updated 
>>> your sudoers file some other way and so removed the changes yaim made? 
>>> 
>>> I recommend this page: 
>>> https://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide 
>>> 
>>> Cheers, 
>>> Daniela 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>