The firewall is stopped on both the CREAM node and the batch server node.
Is there any logging whose verbosity could be increased?
It appears that CREAM does not even attempt to submit the job to the Torque server: no $bls_tmp_file is created and submitted in /usr/bin/pbs_submit.sh (line 190).
[root@lcg0683 ~]# rpm -qf /usr/bin/pbs_submit.sh
glite-ce-blahp-1.16.4-1.sl5
Catalin
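With the firewalls stopped, a quick TCP probe can still confirm that the relevant ports are actually reachable from each side (UI to CREAM, CREAM to the batch server). The following is a minimal sketch, not part of any CREAM or Torque tooling: the helper name `port_open` and the `--probe` flag are hypothetical, the hostnames and port 8443 come from this thread, and 15001 is assumed to be the usual Torque pbs_server default.

```python
import socket
import sys

# Endpoints taken from this thread. 15001 is assumed to be the default
# Torque pbs_server port; adjust it if the site uses a different one.
ENDPOINTS = [
    ("lcg0683.gridpp.rl.ac.uk", 8443),         # CREAM service
    ("lcgvm-batch01.gridpp.rl.ac.uk", 15001),  # Torque pbs_server (assumed port)
]


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution errors all land here.
        return False


if __name__ == "__main__":
    # Probe the endpoints only when explicitly asked to (hypothetical flag).
    if "--probe" in sys.argv[1:]:
        for host, port in ENDPOINTS:
            state = "open" if port_open(host, port) else "closed or unreachable"
            print(f"{host}:{port} -> {state}")
```

Running it from the UI exercises the CREAM endpoint; running it on the CREAM node exercises the path to pbs_server. Since a direct qsub from the CREAM node already works, a failure on the UI side would point at the 8443 path instead.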
> -----Original Message-----
> From: Massimo Sgaravatto [mailto:[log in to unmask]]
> Sent: 02 April 2012 19:59
> To: LHC Computer Grid - Rollout
> Cc: Condurache, Catalin (STFC,RAL,ESC)
> Subject: Re: [LCG-ROLLOUT] starting with EMI CREAM
>
> Are you sure that there aren't firewall problems?
>
> See:
> https://wiki.italiangrid.it/twiki/bin/view/CREAM/ServiceReferenceCard#Open_ports
>
>
> Cheers, Massimo
>
> On 04/02/2012 08:51 PM, Catalin Condurache wrote:
> > I can submit directly from the CREAM node to the batch server:
> >
> > [root@lcg0683 ~]# su - dteam003
> > -bash-3.2$ qsub -q gridS
> > hostname
> > 162076.lcgvm-batch01.gridpp.rl.ac.uk
> > -bash-3.2$ cat STDIN.o162076
> > *********************************************************************
> > *
> > * This is RAL's lcgvm-wn08.gridpp.rl.ac.uk running Linux 2.6.18-238.19.1.el5
> > * on an Intel(R) Xeon(R) CPU E5540 @ 2.53GHz processor
> > * running at a speed of 2469.833 MHz
> > *
> > * Job 162076.lcgvm-batch01.gridpp.rl.ac.uk for dteam003 started at Mon Apr 2 19:42:46 BST 2012
> > *
> > *********************************************************************
> > lcgvm-wn08.gridpp.rl.ac.uk
> > *********************************************************************
> > *
> > * Job 162076.lcgvm-batch01.gridpp.rl.ac.uk terminated at Mon Apr 2 19:42:46 BST 2012
> > *
> > * Job details:
> > * Userid: dteam003
> > * Groupid: dteam
> > * Jobname: STDIN
> > * Queue: gridS
> > * Session Id: 8605
> > *
> > * Resources Requested
> > *
> > * cput=01:00:00,neednodes=lcgvm-wn08.gridpp.rl.ac.uk,pcput=01:00:00,walltime=02:00:00
> > *
> > * Resources Used
> > *
> > * cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:00
> > *
> > *********************************************************************
> >
> >
> > However, glite-ce-job-submit from a UI times out every time.
> >
> > The only output in catalina.out is:
> >
> > Using CATALINA_BASE: /usr/share/tomcat5
> > Using CATALINA_HOME: /usr/share/tomcat5
> > Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
> > Using JRE_HOME:
> > log4j:WARN No appenders could be found for logger (org.apache.commons.digester.Digester.sax).
> > log4j:WARN Please initialize the log4j system properly.
> > Using CATALINA_BASE: /usr/share/tomcat5
> > Using CATALINA_HOME: /usr/share/tomcat5
> > Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
> > Using JRE_HOME:
> > log4j:WARN No appenders could be found for logger (org.apache.catalina.startup.Embedded).
> > log4j:WARN Please initialize the log4j system properly.
> > Trustmanager-tomcat v3.0.0-1-E starting.
> > Using trustmanager library v3.0.5-1-E.
> > - Initializing VOMS certificate store from directory: /etc/grid-security/vomsdir
> > - VOMS store initialized
> > AbandonedObjectPool is used (org.apache.commons.dbcp.AbandonedObjectPool@6f57b46f)
> > LogAbandoned: false
> > RemoveAbandoned: true
> > RemoveAbandonedTimeout: 30
> > AbandonedObjectPool is used (org.apache.commons.dbcp.AbandonedObjectPool@5dce1bea)
> > LogAbandoned: false
> > RemoveAbandoned: true
> > RemoveAbandonedTimeout: 30
> >
> > Note that the configuration uses the old BLAH blparser.
> >
> > Regards,
> > Catalin
> >
> >
> > ________________________________________
> > From: Massimo Sgaravatto [[log in to unmask]]
> > Sent: 02 April 2012 17:58
> > To: LHC Computer Grid - Rollout
> > Cc: Condurache, Catalin (STFC,RAL,ESC)
> > Subject: Re: [LCG-ROLLOUT] starting with EMI CREAM
> >
> > Do you always get a timeout when trying to submit?
> >
> > Is anything reported in catalina.out?
> >
> > Cheers, Massimo
> >
> >
> >
> >
> > On 04/02/2012 12:36 PM, Catalin Condurache wrote:
> >> Hi,
> >> I am looking into EMI CREAM (with Torque), following some docs available
> >> online. Unfortunately, I am failing one of the very first tests.
> >> *** On UI
> >> -bash-3.2$ cat ./test_emi_cream.jdl
> >> [
> >> executable="/bin/sleep";
> >> arguments="1";
> >> ]
> >> -bash-3.2$ glite-ce-job-submit -a -r lcg0683.gridpp.rl.ac.uk:8443/cream-pbs-grid500M ./test_emi_cream.jdl
> >> EOF detected during communication. Probably service closed connection or
> >> SOCKET TIMEOUT occurred.
> >> -bash-3.2$
> >> *** On EMI CREAM (lcg0683)
> >> 02 Apr 2012 11:13:48,464 INFO
> >> org.glite.ce.commonj.authz.AuthorizationHandler
> >> (AuthorizationHandler.java:247) - (TP-Processor22) request for
> >> operation={http://www.gridsite.org/namespaces/delegation-2}getProxyReq;
> >> REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=CN=catalin
> >> condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={
> >> /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL;
> >> }; AUTHORIZED!
> >> 02 Apr 2012 11:13:49,516 INFO org.glite.ce.commonj.authz.VomsServicePDP
> >> (VomsServicePDP.java:160) - (TP-Processor25) VOMS attribute authorized:
> >> /dteam/Role=NULL/Capability=NULL
> >> 02 Apr 2012 11:13:49,517 INFO
> >> org.glite.ce.commonj.authz.AuthorizationHandler
> >> (AuthorizationHandler.java:247) - (TP-Processor25) request for
> >> operation={http://www.gridsite.org/namespaces/delegation-2}putProxy;
> >> REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=CN=catalin
> >> condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={
> >> /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL;
> >> }; AUTHORIZED!
> >> 02 Apr 2012 11:13:49,704 INFO org.glite.ce.cream.ws.CREAM2Service
> >> (CREAM2Service.java:1615) - (Thread-6) New delegation proxy created
> >> [delegationId=a360114e2b97c3e4bb2e3f4bc0a2df01a9c872a6;
> >> userId=_C_UK_O_eScience_OU_CLRC_L_RAL_CN_catalin_condurache_dteam_Role_NULL_Capability_NULL]
> >> valid from 02/04/12 10:08 (GMT) to 02/04/12 20:47 (GMT)
> >> 02 Apr 2012 11:13:49,851 INFO org.glite.ce.commonj.authz.VomsServicePDP
> >> (VomsServicePDP.java:160) - (TP-Processor22) VOMS attribute authorized:
> >> /dteam/Role=NULL/Capability=NULL
> >> 02 Apr 2012 11:13:49,852 INFO
> >> org.glite.ce.commonj.authz.AuthorizationHandler
> >> (AuthorizationHandler.java:247) - (TP-Processor22) request for
> >> operation=JobRegister; REMOTE_REQUEST_ADDRESS=130.246.183.188;
> >> USER_DN=CN=catalin condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={
> >> /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL;
> >> }; AUTHORIZED!
> >> 02 Apr 2012 11:13:49,876 INFO
> >> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >> (AbstractJobExecutor.java:1940) - (TP-Processor22)
> >> REMOTE_REQUEST_ADDRESS=130.246.183.188;
> >> USER_DN=/C=UK/O=eScience/OU=CLRC/L=RAL/CN=catalin condurache;
> >> USER_FQAN={ /dteam/Role=NULL/Capability=NULL;
> >> /dteam/uki/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_REGISTER;
> >> CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING;
> >> commandName=JOB_REGISTER;
> >> userId=_C_UK_O_eScience_OU_CLRC_L_RAL_CN_catalin_condurache_dteam_Role_NULL_Capability_NULL;
> >> status=PROCESSING;
> >> 02 Apr 2012 11:13:49,910 INFO
> >> org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) -
> >> (TP-Processor22) Job inserted. JobId = CREAM011289382
> >> *** on UI
> >> -bash-3.2$ glite-ce-job-status --level 2 https://lcg0683.gridpp.rl.ac.uk:8443/CREAM526521528
> >> ****** JobID=[https://lcg0683.gridpp.rl.ac.uk:8443/CREAM526521528]
> >> For this job CREAM has returned a fault: MethodName=[jobInfo]
> >> Timestamp=[Mon 02 Apr 2012 11:11:32] ErrorCode=[4] Description=[job
> >> status mismatch] FaultCause=[N/A]
> >> I couldn't find any logs on the Torque server (or I didn't know where to
> >> look).
> >> I admit it might be something quite simple that I have overlooked. Any ideas?
> >> Thanks,
> >> Catalin
> >
> >
>
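The CREAM log excerpt in the original message shows JobRegister being authorized and the job being inserted into the database, with nothing logged afterwards. When scanning a longer log, a small filter can make that gap easy to spot. This is a hypothetical sketch: the `CMD_NAME=...;` and `Job inserted. JobId = ...` patterns are copied from the excerpt, but the name `JOB_START` for the step that follows registration is an assumption that has not been checked against CREAM's sources, and the function names are illustrative only.

```python
import re
from typing import Dict, List

# Patterns copied from the CREAM log lines quoted in this thread.
# \s+ tolerates entries that were wrapped across lines.
CMD_RE = re.compile(r"CMD_NAME=(\w+);")
INSERT_RE = re.compile(r"Job inserted\.\s+JobId\s*=\s*(\S+)")

# Assumption: the step after registration is logged as CMD_NAME=JOB_START.
START_CMD = "JOB_START"


def summarize(log_text: str) -> Dict[str, List[str]]:
    """Collect command names and inserted job ids, in log order."""
    return {
        "commands": CMD_RE.findall(log_text),
        "jobs": INSERT_RE.findall(log_text),
    }


def registered_but_not_started(log_text: str) -> bool:
    """True if the last JOB_REGISTER is not followed by a START_CMD entry."""
    cmds = summarize(log_text)["commands"]
    if "JOB_REGISTER" not in cmds:
        return False
    # Index of the most recent registration.
    last_reg = len(cmds) - 1 - cmds[::-1].index("JOB_REGISTER")
    return START_CMD not in cmds[last_reg + 1:]
```

Pass it the contents of the CREAM service log (its location depends on the installation). A `True` result would match the symptom described at the top of the thread: the job is registered, but BLAH's pbs_submit.sh is never reached.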