Dear All...

I was about to submit a patch for the CREAMCE / SGE integration when I 
hit the following issue...

---*---

1./ I can submit a job directly with glite-ce-job-submit without any
problem. See the following example:

-bash-3.00$ glite-ce-job-submit -a -r ce03.lip.pt:8443/cream-sge-dteamgrid sleep.jdl
2009-10-13 16:17:31,114 WARN - No configuration file suitable for loading. Using built-in configuration
https://ce03.lip.pt:8443/CREAM469661735

-bash-3.00$ glite-ce-job-status https://ce03.lip.pt:8443/CREAM469661735
2009-10-13 16:18:42,920 WARN - No configuration file suitable for loading. Using built-in configuration
******  JobID=[https://ce03.lip.pt:8443/CREAM469661735]
     Status        = [DONE-OK]
     ExitCode      = [0]

---*---

2./ However, when I submit via the WMS, using

     glite-wms-job-submit -a -r ce03.lip.pt:8443/cream-sge-dteamgrid sleep.jdl

the job never gets past the READY state.

---*---

3./ Watching the messages, glite-ce-cream and glite-ce-monitor logs, I
conclude that the job never reaches my SGE CREAM CE.

---*---

4./ I checked that the WMS workload manager registered the job,

     13 Oct, 16:30:17 -I: [Info] operator()(dispatcher_utils.cpp:218): new jobsubmit for https://wms01.lip.pt:9000/ctA8jPXMMyayYdOUMs84cA
     13 Oct, 16:30:17 -I: [Info] operator()(submit_request.cpp:478): https://wms01.lip.pt:9000/ctA8jPXMMyayYdOUMs84cA delivered

but I cannot obtain its Condor ID.

     [root@wms01 CondorG.log]# grep https://wms01.lip.pt:9000/_yUTkiOYPrcAJbRpjOGOdw /var/glite/logmonitor/CondorG.log/*
     [root@wms01 CondorG.log]#
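
A minimal sketch of the same check as a reusable shell function, in case
it helps others reproduce it. The name find_job_in_logs is hypothetical
(it is not part of gLite); it only wraps grep. It uses a fixed-string
match so that characters such as "." in the job ID are taken literally,
and it prints a message when nothing is found instead of returning
silently.

```shell
# Hypothetical helper, not part of gLite: search log files for one job ID.
# Usage: find_job_in_logs JOBID FILE...
find_job_in_logs() {
    jobid=$1
    shift
    # -F: treat the job ID as a fixed string, not a regular expression.
    # --: stop option parsing, in case an ID starts with a dash.
    if grep -F -- "$jobid" "$@"; then
        return 0
    fi
    echo "no entries found for $jobid" >&2
    return 1
}
```

It would be called with the job ID followed by the log files, for
example the ID and the /var/glite/logmonitor/CondorG.log/* path used
above.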

---*---

5./ Finally, when I cancel the job, I see the following in
jobcontroller_events.log:

13 Oct, 16:44:09 -I- ControllerLoop::run(): Got new remove request (JOB ID = https://wms01.lip.pt:9000/_yUTkiOYPrcAJbRpjOGOdw)...
13 Oct, 16:44:09 -I- JobControllerReal::cancel(...): Asked to remove job: https://wms01.lip.pt:9000/_yUTkiOYPrcAJbRpjOGOdw
13 Oct, 16:44:09 -M- JobControllerReal::readRepository(): Reading repository from LogMonitor file: /var/glite/logmonitor/internal/irepository.dat
13 Oct, 16:44:10 -*- JobControllerReal::cancel(...): I'm not able to retrieve the condor ID.

---*---

6./ I've tried to submit to other CREAM CEs, but I've noticed that a
list-match does not show me any CREAM CE at all:

     -bash-3.00$ glite-wms-job-list-match -a sleep.jdl  | grep -i cream
     -bash-3.00$
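
The same filter can be turned into a count, which makes an empty result
explicit rather than a blank prompt. This is only a sketch:
count_cream_lines is a hypothetical name, and it assumes nothing more
than that CREAM endpoints carry the string "cream" somewhere on the
line, as ce03.lip.pt:8443/cream-sge-dteamgrid does.

```shell
# Hypothetical helper: count lines on stdin that mention "cream"
# (case-insensitive), e.g. the output of a list-match.
count_cream_lines() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so "|| true" keeps the function's status clean.
    grep -i -c cream || true
}
```

Piping the list-match command above into count_cream_lines would then
print the number of matching lines, including 0 when there are none.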

My JDL is quite simple, so I do not think it is the source of the
problem:

-bash-3.00$ cat sleep.jdl
Executable      = "sleep.sh";
StdOutput       = "sleep.out";
StdError        = "sleep.err";
RetryCount      = 0;
InputSandbox    = {"sleep.sh"};
OutputSandbox   = {"sleep.out","sleep.err"};
#OutputSandboxBaseDestUri="gsiftp://ce02.lip.pt/tmp";

I've also tried to force other CREAM CEs with the "-r" option, but I got
the same result.

I conclude that something must be wrong in my WMS / ICE configuration...

---*---

7./ I'm using the following software:

[root@wms01 ~]# rpm -qa | grep glite-WMS

[root@ce03 ~]# rpm -qa | grep CREAM
glite-CREAM-3.1.20-0


Any help is welcome...
Thanks in advance
Cheers
Goncalo