On 2 May 2012, at 17:12, emyr.james wrote:

> Hi,
>
> I modified /usr/bin/sge_submit.sh to make it dump the run scripts it generates into /tmp.

Was just drafting something to suggest exactly that!

>
> The problem is that the run scripts contain this line...
>
> #$ -q dteam
>
> I.e. jobs are going to the dteam queue which doesn't exist. I've created a queue called grid.q in SGE for all the grid jobs and have the following in my site-info.def...
>
> QUEUES="grid.q"
> GRID_Q_GROUP_ENABLE="ops dteam atlas snoplus"
>
> I have no idea where it's getting dteam from....

> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam test.jdl

The 'cream-sge-dteam' part means:

  CREAM
  use the SGE engine type
  send it to the 'dteam' queue

Try

  glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-grid.q test.jdl

instead

> the run script *should* have this line in it...
>
> #$ -q grid.q
>
> anyone able to shed light on this ?
>
> On 02/05/12 16:30, emyr.james wrote:
>> Hi Daniela,
>> I switched to using this jdl...
>>
>> executable="/bin/sleep";
>> arguments="1";
>>
>> ...and I also cleaned up /etc/passwd and /etc/group (all the grid related stuff was in there twice).
>>
>> I'm now getting this in the log...
>>
>> 02 May 2012 16:12:15,851 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:2411) - (Worker Thread 1) JOB CREAM173523236 STATUS CHANGED: PENDING => ABORTED [failureReason=BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:) N/A (jobId = CREAM173523236)] [localUser=dteam154] [delegationId=65c7415e9d0b602cafef21342fb2bb404eafd9a9]
>> 02 May 2012 16:13:56,840 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager (JobSubmissionManager.java:131) - (TIMER) AcceptNewJobs by script = true
>>
>> I followed the link you sent..
>>
>> 1.1 and 1.2 are fine
>>
>> I'm not sure how to get a valid proxy in /tmp/user.proxy so I can't do this step.
>>
>> 1.4 and 1.5 seem fine.
>>
>> For 1.6, it works but I see the above log. When I get the status I see this...
>> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-submit -a -r grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam test.jdl
>> https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582
>> ej59@feynman:~/svn_work/grid/tests$ glite-ce-job-status https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582
>>
>> ****** JobID=[https://grid-cream-01.hpc.susx.ac.uk:8443/CREAM246546582]
>> Status = [ABORTED]
>> ExitCode = []
>> FailureReason = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:) N/A (jobId = CREAM246546582)]
>>
>>
>> ej59@feynman:~/svn_work/grid/tests$
>>
>> So it's failing on job submission. Presumably it's having issues getting the SGE qsub command to work. I've managed to submit jobs to SGE from the box manually by su'ing to a grid user account and running qsub and that worked fine.
>>
>> Are there any logs or extra debugging I can enable to get more info on why it's not submitting ?
>>
>> Emyr
>>
>> On 02/05/12 15:26, Daniela Bauer wrote:
>>> Hi Emyr,
>>>
>>> Usually yaim makes the necessary updates to your /etc/sudoers file
>>> (there should also be an /etc/sudoers.forcream which is included in
>>> the standard one), but maybe something went wrong and/or you update
>>> your sudoers file otherwise and so removed the changes yaim made ?
>>>
>>> I recommend this page:
>>> https://wiki.italiangrid.it/twiki/bin/view/CREAM/TroubleshootingGuide
>>>
>>> Cheers,
>>> Daniela
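
P.S. To make the breakdown of the CE ID in the reply at the top of this thread concrete, here is a minimal Python sketch that splits an ID of the form host:port/cream-<engine>-<queue> into its parts. The function name parse_ce_id and the returned dictionary keys are made up for illustration; the only thing taken from the thread is the reading that the last path component is "cream", then the engine type, then the queue name. It is not part of the CREAM client tools.

```python
def parse_ce_id(ce_id):
    """Split '<host>:<port>/cream-<engine>-<queue>' into its parts.

    Hypothetical helper for illustration only; the layout is the one
    described in this thread, not taken from any CREAM API.
    """
    endpoint, sep, service = ce_id.partition("/")
    if not sep:
        raise ValueError("no '/' in CE ID: %r" % ce_id)
    host, _, port = endpoint.partition(":")
    # Split on the first two hyphens only, so queue names that contain
    # further hyphens or dots (e.g. 'grid.q') stay intact.
    parts = service.split("-", 2)
    if len(parts) != 3 or parts[0] != "cream":
        raise ValueError("expected 'cream-<engine>-<queue>', got %r" % service)
    _, lrms, queue = parts
    return {"host": host, "port": port, "lrms": lrms, "queue": queue}


if __name__ == "__main__":
    for ce in (
        "grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-dteam",
        "grid-cream-01.hpc.susx.ac.uk:8443/cream-sge-grid.q",
    ):
        info = parse_ce_id(ce)
        print(ce, "-> engine:", info["lrms"], "queue:", info["queue"])
```

Read this way, the queue name in the generated "#$ -q" line follows whatever is given after the second hyphen on the command line, which would explain why the first submission asked for "dteam" rather than "grid.q".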