Please open a GGUS ticket, providing the glite-ce-cream.log and the
catalina.out log files.
Cheers, Massimo
On 04/03/2012 10:52 PM, Catalin Condurache wrote:
>
> The RAL Tier1 network is firewalled at RAL level, so a node cannot be reached from outside unless the proper inbound firewall rules are in place.
>
> But there is definitely no firewall between the UI, CREAM and Torque server within the RAL T1 network.
>
> Is this not enough?
>
> Catalin
>
>> -----Original Message-----
>> From: Massimo Sgaravatto [mailto:[log in to unmask]]
>> Sent: 03 April 2012 15:52
>> To: Condurache, Catalin (STFC,RAL,ESC)
>> Cc: [log in to unmask]
>> Subject: Re: [LCG-ROLLOUT] starting with EMI CREAM
>>
>> On 04/03/2012 04:40 PM, [log in to unmask] wrote:
>>> The firewall is stopped on both the CREAM and batch server nodes.
>>
>>
>> I am not able to contact that host from my UI
>> $ telnet lcg0683.gridpp.rl.ac.uk 8443
>> Trying 130.246.180.171...
>>
>>
>>
>>>
>>> Any logging to be increased in verbosity?
>>>
>>> It appears that CREAM doesn't even attempt to submit the job to the
>>> Torque server. No $bls_tmp_file is created and submitted in
>>> /usr/bin/pbs_submit.sh (line 190).
>>>
>>
>> For the time being I would focus on the timeout received by the client
>>
>>
>>> [root@lcg0683 ~]# rpm -qf /usr/bin/pbs_submit.sh
>>> glite-ce-blahp-1.16.4-1.sl5
>>>
>>>
>>> Catalin
>>>
>>>
>>>> -----Original Message-----
>>>> From: Massimo Sgaravatto [mailto:[log in to unmask]]
>>>> Sent: 02 April 2012 19:59
>>>> To: LHC Computer Grid - Rollout
>>>> Cc: Condurache, Catalin (STFC,RAL,ESC)
>>>> Subject: Re: [LCG-ROLLOUT] starting with EMI CREAM
>>>>
>>>> Are you sure that there aren't firewall problems ?
>>>>
>>>> See:
>>>>
>>>> https://wiki.italiangrid.it/twiki/bin/view/CREAM/ServiceReferenceCard#Open_ports
>>>>
>>>>
>>>> Cheers, Massimo
>>>>
>>>> On 04/02/2012 08:51 PM, Catalin Condurache wrote:
>>>>> I can 'direct submit' from CREAM node to the batch server
>>>>>
>>>>> [root@lcg0683 ~]# su - dteam003
>>>>> -bash-3.2$ qsub -q gridS
>>>>> hostname
>>>>> 162076.lcgvm-batch01.gridpp.rl.ac.uk
>>>>> -bash-3.2$ cat STDIN.o162076
>>>>> *********************************************************************
>>>>> *
>>>>> * This is RAL's lcgvm-wn08.gridpp.rl.ac.uk running Linux 2.6.18-238.19.1.el5
>>>>> * on an Intel(R) Xeon(R) CPU E5540 @ 2.53GHz processor
>>>>> * running at a speed of 2469.833 MHz
>>>>> *
>>>>> * Job 162076.lcgvm-batch01.gridpp.rl.ac.uk for dteam003 started at Mon Apr 2 19:42:46 BST 2012
>>>>> *
>>>>> *********************************************************************
>>>>> lcgvm-wn08.gridpp.rl.ac.uk
>>>>> *********************************************************************
>>>>> *
>>>>> * Job 162076.lcgvm-batch01.gridpp.rl.ac.uk terminated at Mon Apr 2 19:42:46 BST 2012
>>>>> *
>>>>> * Job details:
>>>>> * Userid: dteam003
>>>>> * Groupid: dteam
>>>>> * Jobname: STDIN
>>>>> * Queue: gridS
>>>>> * Session Id: 8605
>>>>> *
>>>>> * Resources Requested
>>>>> *
>>>>> * cput=01:00:00,neednodes=lcgvm-wn08.gridpp.rl.ac.uk,pcput=01:00:00,walltime=02:00:00
>>>>> *
>>>>> * Resources Used
>>>>> *
>>>>> * cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:00
>>>>> *
>>>>> *********************************************************************
>>>>>
>>>>>
>>>>> However, glite-ce-job-submit from a UI times out every time.
>>>>>
>>>>> The only thing in catalina.out is:
>>>>>
>>>>> Using CATALINA_BASE: /usr/share/tomcat5
>>>>> Using CATALINA_HOME: /usr/share/tomcat5
>>>>> Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
>>>>> Using JRE_HOME:
>>>>> log4j:WARN No appenders could be found for logger (org.apache.commons.digester.Digester.sax).
>>>>> log4j:WARN Please initialize the log4j system properly.
>>>>> Using CATALINA_BASE: /usr/share/tomcat5
>>>>> Using CATALINA_HOME: /usr/share/tomcat5
>>>>> Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
>>>>> Using JRE_HOME:
>>>>> log4j:WARN No appenders could be found for logger (org.apache.catalina.startup.Embedded).
>>>>> log4j:WARN Please initialize the log4j system properly.
>>>>> Trustmanager-tomcat v3.0.0-1-E starting.
>>>>> Using trustmanager library v3.0.5-1-E.
>>>>> - Initializing VOMS certificate store from directory: /etc/grid-security/vomsdir
>>>>> - VOMS store initialized
>>>>> AbandonedObjectPool is used (org.apache.commons.dbcp.AbandonedObjectPool@6f57b46f)
>>>>> LogAbandoned: false
>>>>> RemoveAbandoned: true
>>>>> RemoveAbandonedTimeout: 30
>>>>> AbandonedObjectPool is used (org.apache.commons.dbcp.AbandonedObjectPool@5dce1bea)
>>>>> LogAbandoned: false
>>>>> RemoveAbandoned: true
>>>>> RemoveAbandonedTimeout: 30
>>>>>
>>>>> Just to mention that the configuration is using the old BLAH blparser.
>>>>>
>>>>> Regards,
>>>>> Catalin
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Massimo Sgaravatto [[log in to unmask]]
>>>>> Sent: 02 April 2012 17:58
>>>>> To: LHC Computer Grid - Rollout
>>>>> Cc: Condurache, Catalin (STFC,RAL,ESC)
>>>>> Subject: Re: [LCG-ROLLOUT] starting with EMI CREAM
>>>>>
>>>>> Do you always get a timeout when trying to submit ?
>>>>>
>>>>> Is something reported in catalina.out ?
>>>>>
>>>>> Cheers, Massimo
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 04/02/2012 12:36 PM, Catalin Condurache wrote:
>>>>>> Hi,
>>>>>> I am looking into EMI CREAM (with Torque) and following some docs available
>>>>>> online. Unfortunately, I am failing one of the very first tests.
>>>>>> *** On UI
>>>>>> -bash-3.2$ cat ./test_emi_cream.jdl
>>>>>> [
>>>>>> executable="/bin/sleep";
>>>>>> arguments="1";
>>>>>> ]
>>>>>> -bash-3.2$ glite-ce-job-submit -a -r lcg0683.gridpp.rl.ac.uk:8443/cream-pbs-grid500M ./test_emi_cream.jdl
>>>>>> EOF detected during communication. Probably service closed connection or
>>>>>> SOCKET TIMEOUT occurred.
>>>>>> -bash-3.2$
>>>>>> *** On EMI CREAM (lcg0683)
>>>>>> 02 Apr 2012 11:13:48,464 INFO
>>>>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>>>>> (AuthorizationHandler.java:247) - (TP-Processor22) request for
>>>>>> operation={http://www.gridsite.org/namespaces/delegation-2}getProxyReq;
>>>>>> REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=CN=catalin
>>>>>> condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={
>>>>>> /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL;
>>>>>> }; AUTHORIZED!
>>>>>> 02 Apr 2012 11:13:49,516 INFO org.glite.ce.commonj.authz.VomsServicePDP
>>>>>> (VomsServicePDP.java:160) - (TP-Processor25) VOMS attribute authorized:
>>>>>> /dteam/Role=NULL/Capability=NULL
>>>>>> 02 Apr 2012 11:13:49,517 INFO
>>>>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>>>>> (AuthorizationHandler.java:247) - (TP-Processor25) request for
>>>>>> operation={http://www.gridsite.org/namespaces/delegation-2}putProxy;
>>>>>> REMOTE_REQUEST_ADDRESS=130.246.183.188; USER_DN=CN=catalin
>>>>>> condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={
>>>>>> /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL;
>>>>>> }; AUTHORIZED!
>>>>>> 02 Apr 2012 11:13:49,704 INFO org.glite.ce.cream.ws.CREAM2Service
>>>>>> (CREAM2Service.java:1615) - (Thread-6) New delegation proxy created
>>>>>> [delegationId=a360114e2b97c3e4bb2e3f4bc0a2df01a9c872a6;
>>>>>> userId=_C_UK_O_eScience_OU_CLRC_L_RAL_CN_catalin_condurache_dteam_Role_NULL_Capability_NULL]
>>>>>> valid from 02/04/12 10:08 (GMT) to 02/04/12 20:47 (GMT)
>>>>>> 02 Apr 2012 11:13:49,851 INFO org.glite.ce.commonj.authz.VomsServicePDP
>>>>>> (VomsServicePDP.java:160) - (TP-Processor22) VOMS attribute authorized:
>>>>>> /dteam/Role=NULL/Capability=NULL
>>>>>> 02 Apr 2012 11:13:49,852 INFO
>>>>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>>>>> (AuthorizationHandler.java:247) - (TP-Processor22) request for
>>>>>> operation=JobRegister; REMOTE_REQUEST_ADDRESS=130.246.183.188;
>>>>>> USER_DN=CN=catalin condurache,L=RAL,OU=CLRC,O=eScience,C=UK; USER_FQAN={
>>>>>> /dteam/Role=NULL/Capability=NULL; /dteam/uki/Role=NULL/Capability=NULL;
>>>>>> }; AUTHORIZED!
>>>>>> 02 Apr 2012 11:13:49,876 INFO
>>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
>>>>>> (AbstractJobExecutor.java:1940) - (TP-Processor22)
>>>>>> REMOTE_REQUEST_ADDRESS=130.246.183.188;
>>>>>> USER_DN=/C=UK/O=eScience/OU=CLRC/L=RAL/CN=catalin condurache;
>>>>>> USER_FQAN={ /dteam/Role=NULL/Capability=NULL;
>>>>>> /dteam/uki/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_REGISTER;
>>>>>> CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING;
>>>>>> commandName=JOB_REGISTER;
>>>>>> userId=_C_UK_O_eScience_OU_CLRC_L_RAL_CN_catalin_condurache_dteam_Role_NULL_Capability_NULL;
>>>>>> status=PROCESSING;
>>>>>> 02 Apr 2012 11:13:49,910 INFO
>>>>>> org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) -
>>>>>> (TP-Processor22) Job inserted. JobId = CREAM011289382
>>>>>> *** on UI
>>>>>> -bash-3.2$ glite-ce-job-status --level 2
>>>>>> https://lcg0683.gridpp.rl.ac.uk:8443/CREAM526521528
>>>>>> ****** JobID=[https://lcg0683.gridpp.rl.ac.uk:8443/CREAM526521528]
>>>>>> For this job CREAM has returned a fault: MethodName=[jobInfo]
>>>>>> Timestamp=[Mon 02 Apr 2012 11:11:32] ErrorCode=[4] Description=[job
>>>>>> status mismatch] FaultCause=[N/A]
>>>>>> I couldn't find any logs on the Torque server (or I didn't know where to
>>>>>> look).
>>>>>> I admit it might be something quite simple that I have overlooked. Any ideas?
>>>>>> Thanks,
>>>>>> Catalin
>>>>>
>>>>>
>>>>
>>>
>>
>
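
The connectivity test discussed in the thread (telnet from the UI to port 8443 on the CREAM host) can be scripted so that it fails quickly instead of hanging. The following is only a rough sketch, not something taken from the CREAM documentation: parse_endpoint and probe_port are hypothetical helper names, and probe_port assumes a bash with /dev/tcp support plus the coreutils timeout command, which older installations may lack.

```shell
#!/bin/bash
# Sketch only: split a CREAM endpoint of the form host:port/cream-<lrms>-<queue>
# and check whether the TCP port answers within a few seconds.
# parse_endpoint and probe_port are hypothetical helper names.

parse_endpoint() {
    # Prints "host port" for an endpoint such as
    # lcg0683.gridpp.rl.ac.uk:8443/cream-pbs-grid500M
    local hostport="${1%%/*}"    # drop everything from the first "/"
    echo "${hostport%%:*} ${hostport##*:}"
}

probe_port() {
    # Returns 0 if a TCP connection opens within $3 seconds (default 5).
    # Assumption: bash /dev/tcp redirection and coreutils timeout are available.
    local host="$1" port="$2" secs="${3:-5}"
    timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage (run from the UI):
# read -r host port < <(parse_endpoint "lcg0683.gridpp.rl.ac.uk:8443/cream-pbs-grid500M")
# probe_port "$host" "$port" && echo "port open" || echo "no answer (closed or filtered)"
```

A non-zero return from probe_port does not distinguish a refused connection from a filtered one; it only says that the client could not open the port in the given time, which is the symptom the submit command reports.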