On 2 May 2012, at 17:12, emyr.james wrote:
> Hi,
>
> I modified /usr/bin/sge_submit.sh to make it dump the run scripts it generates into /tmp.
Was just drafting something to suggest exactly that!
>
> The problem is that the run scripts contain this line...
>
> #$ -q dteam
>
> I.e. jobs are going to the dteam queue which doesn't exist. I've created a queue called grid.q in SGE for all the grid jobs and have the following in my site-info.def...
>
> QUEUES="grid.q"
> GRID_Q_GROUP_ENABLE="ops dteam atlas snoplus"
>
> I have no idea where it's getting dteam from....
> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam test.jdl
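The 'cream-sge-dteam' part of the resource string means:
  cream - it's a CREAM CE
  sge   - use the SGE engine type
  dteam - send it to the 'dteam' queue
So the queue name comes from your submit command, not from site-info.def. Try
glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-grid.q test.jdl
instead.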
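
To spell out how that resource string breaks down, here is a rough Python sketch. The function name parse_ce_id is made up for illustration and is not part of any gLite tool; it just assumes the <host>:<port>/cream-<lrms>-<queue> layout described above.

```python
def parse_ce_id(ce_id):
    """Split '<host>:<port>/cream-<lrms>-<queue>' into its parts.

    Hypothetical helper for illustration only.
    """
    # Separate the endpoint (host:port) from the service part.
    endpoint, sep, service = ce_id.partition("/")
    if not sep:
        raise ValueError("expected '<host>:<port>/cream-<lrms>-<queue>'")
    host, _, port = endpoint.partition(":")
    # Split on the first two hyphens only, so queue names that
    # themselves contain '-' or '.' (e.g. 'grid.q') stay intact.
    parts = service.split("-", 2)
    if len(parts) != 3 or parts[0] != "cream":
        raise ValueError("service part should look like 'cream-<lrms>-<queue>'")
    return {"host": host, "port": int(port), "lrms": parts[1], "queue": parts[2]}


if __name__ == "__main__":
    print(parse_ce_id("grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam"))
    print(parse_ce_id("grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-grid.q"))
```

The last field is what ends up on the "#$ -q" line of the run script, which is why the first form asks SGE for a 'dteam' queue.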
> the run script *should* have this line in it...
>
> #$ -q grid.q
>
> anyone able to shed light on this ?
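
If you want to check the dumped run scripts quickly, something along these lines would pull out the queue each one asks for. This is only a sketch: sge_queues is a made-up helper name, and it assumes the dumps are plain text containing the usual "#$ -q <queue>" directive lines like the one you quoted.

```python
import re

# Matches SGE directive lines of the form '#$ -q <queue>' at the start of a line.
QUEUE_DIRECTIVE = re.compile(r"^#\$\s+-q\s+(\S+)", re.MULTILINE)


def sge_queues(script_text):
    """Return the queue names requested via '#$ -q' lines in a run script."""
    return QUEUE_DIRECTIVE.findall(script_text)


if __name__ == "__main__":
    sample = "#!/bin/bash\n#$ -q dteam\n/bin/sleep 1\n"
    print(sge_queues(sample))
```

Reading each dumped file and passing its contents to sge_queues should show straight away whether the scripts request 'dteam' or 'grid.q'.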
>
> On 02/05/12 16:30, emyr.james wrote:
>> Hi Daniela,
>> I switched to using this jdl...
>>
>> executable="/bin/sleep";
>> arguments="1";
>>
>> ...and I also cleaned up /etc/passwd and /etc/group (all the grid related stuff was in there twice).
>>
>> I'm now getting this in the log...
>>
>> 02 May 2012 16:12:15,851 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:2411) - (Worker Thread 1) JOB CREAM173523236 STATUS CHANGED: PENDING => ABORTED [failureReason=BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:) N/A (jobId = CREAM173523236)] [localUser=dteam154] [delegationId=65c7415e9d0b602cafef21342fb2bb404eafd9a9]
>> 02 May 2012 16:13:56,840 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager (JobSubmissionManager.java:131) - (TIMER) AcceptNewJobs by script = true
>>
>> I followed the link you sent..
>>
>> 1.1 and 1.2 are fine
>>
>> I'm not sure how to get a valid proxy in /tmp/user.proxy so I can't do this step.
>>
>> 1.4 and 1.5 seem fine.
>>
>> For 1.6, it works but I see the above log. When I get the status I see this...
>> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam test.jdl
>> https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582
>> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-status https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582
>>
>> ****** JobID=[https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582]
>> Status = [ABORTED]
>> ExitCode = []
>> FailureReason = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:) N/A (jobId = CREAM246546582)]
>>
>>
>> ej59@feynman:~/svn_work/grid/tests$
>>
>> So it's failing on job submission. Presumably it's having issues getting the SGE qsub command to work. I've managed to submit jobs to SGE from the box manually by su'ing to a grid user account and running qsub and that worked fine.
>>
>> Are there any logs or extra debugging I can enable to get more info on why it's not submitting ?
>>
>> Emyr
>>
>> On 02/05/12 15:26, Daniela Bauer wrote:
>>> Hi Emyr,
>>>
>>> Usually yaim makes the necessary updates to your /etc/sudoers file
>>> (there should also be an /etc/sudoers.forcream which is included in
>>> the standard one), but maybe something went wrong and/or you updated
>>> your sudoers file some other way and so removed the changes yaim made?
>>>
>>> I recommend this page:
>>> https://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide
>>>
>>> Cheers,
>>> Daniela
>>>
>>>
>>>
>>>
>>>
>>
>