On 04/03/2012 04:40 PM, [log in to unmask] wrote:
> Firewall is stopped on both CREAM and batch server node.
I am not able to contact that host from my UI:
$ telnet lcg0683.gridpp.rl.ac.uk 8443
Trying 130.246.180.171...
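
The same reachability check can be scripted so that it returns after a fixed timeout instead of hanging the way telnet does. This is only a minimal sketch in Python: the function name can_connect and the 5-second default timeout are my own assumptions, and nothing here comes from the CREAM tooling.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    Hypothetical helper for illustration; it only tests that the port accepts
    a TCP connection, not that the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts, and DNS lookup failures.
        return False
```

Calling can_connect("lcg0683.gridpp.rl.ac.uk", 8443) from the UI should return False in the situation shown above, where the connection attempt never completes. A True result would only mean the port is reachable, so the SOCKET TIMEOUT would then have to be investigated on the service side.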
>
> Any logging to be increased in verbosity?
>
> It appears that CREAM doesn't even attempt to submit the job to the Torque server. No $bls_tmp_file is created and submitted in /usr/bin/pbs_submit.sh (line 190).
>
For the time being I would focus on the timeout received by the client
> [root@lcg0683 ~]# rpm -qf /usr/bin/pbs_submit.sh
> glite-ce-blahp-1.16.4-1.sl5
>
>
> Catalin
>
>
>> -----Original Message-----
>> From: Massimo Sgaravatto [mailto:[log in to unmask]]
>> Sent: 02 April 2012 19:59
>> To: LHC Computer Grid - Rollout
>> Cc: Condurache, Catalin (STFC,RAL,ESC)
>> Subject: Re: [LCG-ROLLOUT] starting with EMI CREAM
>>
>> Are you sure that there aren't firewall problems ?
>>
>> See:
>> https://wiki.italiangrid.it/twiki/bin/view/CREAM/ServiceReferenceCard#Open_ports
>>
>>
>> Cheers, Massimo
>>
>> On 04/02/2012 08:51 PM, Catalin Condurache wrote:
>>> I can 'direct submit' from CREAM node to the batch server
>>>
>>> [root@lcg0683 ~]# su - dteam003
>>> -bash-3.2$ qsub -q gridS
>>> hostname
>>> 162076.lcgvm-batch01.gridpp.rl.ac.uk
>>> -bash-3.2$ cat STDIN.o162076
>>> *********************************************************************
>>> *
>>> * This is RAL's lcgvm-wn08.gridpp.rl.ac.uk running Linux 2.6.18-238.19.1.el5
>>> * on an Intel(R) Xeon(R) CPU E5540 @ 2.53GHz processor
>>> * running at a speed of 2469.833 MHz
>>> *
>>> * Job 162076.lcgvm-batch01.gridpp.rl.ac.uk for dteam003 started at Mon Apr 2 19:42:46 BST 2012
>>> *
>>> *********************************************************************
>>> lcgvm-wn08.gridpp.rl.ac.uk
>>> *********************************************************************
>>> *
>>> * Job 162076.lcgvm-batch01.gridpp.rl.ac.uk terminated at Mon Apr 2 19:42:46 BST 2012
>>> *
>>> * Job details:
>>> * Userid: dteam003
>>> * Groupid: dteam
>>> * Jobname: STDIN
>>> * Queue: gridS
>>> * Session Id: 8605
>>> *
>>> * Resources Requested
>>> *
>>> * cput=01:00:00,neednodes=lcgvm-wn08.gridpp.rl.ac.uk,pcput=01:00:00,walltime=02:00:00
>>> *
>>> * Resources Used
>>> *
>>> * cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:00
>>> *
>>> *********************************************************************
>>>
>>>
>>> However, glite-ce-job-submit from a UI times out every time.
>>>
>>> The only thing in catalina.out is:
>>>
>>> Using CATALINA_BASE: /usr/share/tomcat5
>>> Using CATALINA_HOME: /usr/share/tomcat5
>>> Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
>>> Using JRE_HOME:
>>> log4j:WARN No appenders could be found for logger (org.apache.commons.digester.Digester.sax).
>>> log4j:WARN Please initialize the log4j system properly.
>>> Using CATALINA_BASE: /usr/share/tomcat5
>>> Using CATALINA_HOME: /usr/share/tomcat5
>>> Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
>>> Using JRE_HOME:
>>> log4j:WARN No appenders could be found for logger (org.apache.catalina.startup.Embedded).
>>> log4j:WARN Please initialize the log4j system properly.
>>> Trustmanager-tomcat v3.0.0-1-E starting.
>>> Using trustmanager library v3.0.5-1-E.
>>> - Initializing VOMS certificate store from directory: /etc/grid-security/vomsdir
>>> - VOMS store initialized
>>> AbandonedObjectPool is used (org.apache.commons.dbcp.AbandonedObjectPool@6f57b46f)
>>> LogAbandoned: false
>>> RemoveAbandoned: true
>>> RemoveAbandonedTimeout: 30
>>> AbandonedObjectPool is used (org.apache.commons.dbcp.AbandonedObjectPool@5dce1bea)
>>> LogAbandoned: false
>>> RemoveAbandoned: true
>>> RemoveAbandonedTimeout: 30
>>>
>>> Just to mention that the configuration is using the old BLAH blparser
>>>
>>> Regards,
>>> Catalin
>>>
>>>
>>> ________________________________________
>>> From: Massimo Sgaravatto [[log in to unmask]]
>>> Sent: 02 April 2012 17:58
>>> To: LHC Computer Grid - Rollout
>>> Cc: Condurache, Catalin (STFC,RAL,ESC)
>>> Subject: Re: [LCG-ROLLOUT] starting with EMI CREAM
>>>
>>> Do you always get a timeout when trying to submit ?
>>>
>>> Is something reported in catalina.out ?
>>>
>>> Cheers, Massimo
>>>
>>>
>>>
>>>
>>> On 04/02/2012 12:36 PM, Catalin Condurache wrote:
>>>> Hi,
>>>> I am looking into EMI CREAM (with Torque) and following some docs available
>>>> online. Unfortunately, I am failing one of the very first tests.
>>>> *** On UI
>>>> -bash-3.2$ cat ./test_emi_cream.jdl
>>>> [
>>>> executable="/bin/sleep";
>>>> arguments="1";
>>>> ]
>>>> -bash-3.2$ glite-ce-job-submit -a -r lcg0683.gridpp.rl.ac.uk:8443/cream-pbs-grid500M ./test_emi_cream.jdl
>>>> EOF detected during communication. Probably service closed connection or SOCKET TIMEOUT occurred.
>>>> -bash-3.2$
>>>> *** On EMI CREAM (lcg0683)
>>>> 02 Apr 2012 11:13:48,464 INFO org.glite.ce.commonj.authz.AuthorizationHandler (AuthorizationHandler.java:247) - (TP-Processor22) request for operation={http://www.gridsite.org/namespaces/delegation-2}getProxyReq; REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=CN=catalin condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={ /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL; }; AUTHORIZED!
>>>> 02 Apr 2012 11:13:49,516 INFO org.glite.ce.commonj.authz.VomsServicePDP (VomsServicePDP.java:160) - (TP-Processor25) VOMS attribute authorized: /dteam/Role=NULL/Capability=NULL
>>>> 02 Apr 2012 11:13:49,517 INFO org.glite.ce.commonj.authz.AuthorizationHandler (AuthorizationHandler.java:247) - (TP-Processor25) request for operation={http://www.gridsite.org/namespaces/delegation-2}putProxy; REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=CN=catalin condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={ /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL; }; AUTHORIZED!
>>>> 02 Apr 2012 11:13:49,704 INFO org.glite.ce.cream.ws.CREAM2Service (CREAM2Service.java:1615) - (Thread-6) New delegation proxy created [delegationId=a360114e2b97c3e4bb2e3f4bc0a2df01a9c872a6; userId=_C_UK_O_eScience_OU_CLRC_L_RAL_CN_catalin_condurache_dteam_Role_NULL_Capability_NULL] valid from 02/04/12 10:08 (GMT) to 02/04/12 20:47 (GMT)
>>>> 02 Apr 2012 11:13:49,851 INFO org.glite.ce.commonj.authz.VomsServicePDP (VomsServicePDP.java:160) - (TP-Processor22) VOMS attribute authorized: /dteam/Role=NULL/Capability=NULL
>>>> 02 Apr 2012 11:13:49,852 INFO org.glite.ce.commonj.authz.AuthorizationHandler (AuthorizationHandler.java:247) - (TP-Processor22) request for operation=JobRegister; REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=CN=catalin condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={ /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL; }; AUTHORIZED!
>>>> 02 Apr 2012 11:13:49,876 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:1940) - (TP-Processor22) REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=/C=UK/O=eScience/OU=CLRC/L=RAL/CN=catalin condurache; USER_FQAN={ /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_REGISTER; CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_REGISTER; userId=_C_UK_O_eScience_OU_CLRC_L_RAL_CN_catalin_condurache_dteam_Role_NULL_Capability_NULL; status=PROCESSING;
>>>> 02 Apr 2012 11:13:49,910 INFO org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) - (TP-Processor22) Job inserted. JobId = CREAM011289382
>>>> *** on UI
>>>> -bash-3.2$ glite-ce-job-status --level 2 https://lcg0683.gridpp.rl.ac.uk:8443/CREAM526521528
>>>> ****** JobID=[https://lcg0683.gridpp.rl.ac.uk:8443/CREAM526521528]
>>>> For this job CREAM has returned a fault: MethodName=[jobInfo] Timestamp=[Mon 02 Apr 2012 11:11:32] ErrorCode=[4] Description=[job status mismatch] FaultCause=[N/A]
>>>> I couldn't find any logs on the Torque server (or I didn't know where to look).
>>>> I admit it might be something quite simple that I have overlooked. Any ideas?
>>>> Thanks,
>>>> Catalin
>>>
>>>
>>
>