Hi Greig,
I checked the logs for your connection and it's hitting an authorisation
problem under your lhcb user (I've probably not set it up right in our
dcache.kpwd file). Odd that it works for srm1, though.
Looks like it will probably work for everyone *but* you. Typical.
I'll recheck the dcache.kpwd file (we will be moving to something a bit
less archaic soon).
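For reference, the kind of kpwd entry I'll be checking looks roughly like
this (from memory; the uid, gid and pnfs paths below are placeholders, not
our actual values):

```
# map a certificate DN to a local username
mapping "/C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan" gcowan

# login <user> <access> <uid> <gid> <home> <root> <fsroot>, then the DN(s)
login gcowan read-write 10001 103 / /pnfs/ph.liv.ac.uk/data /pnfs/ph.liv.ac.uk/data
 /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan
```

If the mapping line or the DN under the login entry is wrong, srm
authorisation fails for just that user, which would match what we're seeing.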
Thanks,
John
Greig Alan Cowan wrote:
> Hi John,
>
> It's definitely not working for me (see below), although from your
> output it certainly looks like it's working at your end. As you say,
> all of the files look fine.
>
> I can ping the SRMv1 endpoint; it is only the v2.2 one that is complaining.
>
> Could someone else give this a go from outside Liverpool? You will need
> to use the latest dcache-srmclient rpm.
>
> Cheers,
> Greig
>
> $ opt/d-cache/srm/bin/srmping -2 -debug
> srm://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> WARNING: SRM_PATH is defined, which might cause a wrong version of srm
> client to be executed
> WARNING: SRM_PATH=/home/gcowan/opt/d-cache/srm
> Storage Resource Manager (SRM) CP Client version 2.0
> Tue Feb 19 11:30:58 GMT 2008: In SRMClient ExpectedName: host
> Tue Feb 19 11:30:58 GMT 2008: SRMClient(https,srm/managerv2,true)
> SRMClientV2 : user credentials are:
> /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan
> SRMClientV2 : WEBSERVICE_PATH srm/managerv2
> SRMClientV2 : connecting to srm at
> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> SRMClientV2 : srmPing , contacting service
> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> SRMClientV2 : srmPing: try # 0 failed with error
> AxisFault
> faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
> faultSubcode:
> faultString: java.rmi.RemoteException: SRMServerV2.srmPing() exception;
> nested exception is:
> java.lang.NoSuchMethodException:
> org.dcache.srm.v2_2.SrmPingResponse.setStatusCode(org.dcache.srm.v2_2.TStatusCode)
>
> faultActor:
> faultNode:
> faultDetail:
> {http://xml.apache.org/axis/}hostname:hepgrid5.ph.liv.ac.uk
>
> java.rmi.RemoteException: SRMServerV2.srmPing() exception; nested
> exception is:
> java.lang.NoSuchMethodException:
> org.dcache.srm.v2_2.SrmPingResponse.setStatusCode(org.dcache.srm.v2_2.TStatusCode)
>
> at
> org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
>
> On 19/02/08 11:29, John Bland wrote:
>> Hi,
>>
>>
>> First of all, when did you try to ping (dCache was still restarting
>> when I sent the last email)? Secondly, I can ping the srm2 and srm1
>> endpoints from a Liverpool machine:
>>
>> Tue Feb 19 11:25:13 GMT 2008: In SRMClient ExpectedName: host
>> Tue Feb 19 11:25:13 GMT 2008: SRMClient(https,srm/managerv2,true)
>> SRMClientV2 : user credentials are:
>> /C=UK/O=eScience/OU=Liverpool/L=CSD/CN=john bland
>> SRMClientV2 : WEBSERVICE_PATH srm/managerv2
>> SRMClientV2 : connecting to srm at
>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
>> SRMClientV2 : srmPing , contacting service
>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
>> Tue Feb 19 11:25:18 GMT 2008: received response
>> Tue Feb 19 11:25:18 GMT 2008: VersionInfo : v2.2
>> backend_type:dCache
>> backend_version:production-1-8-0-12p4
>>
>> Tue Feb 19 11:25:38 GMT 2008: In SRMClient ExpectedName: host
>> Tue Feb 19 11:25:38 GMT 2008: SRMClient(https,srm/managerv1,true)
>> SRMClientV1 : user credentials are:
>> /C=UK/O=eScience/OU=Liverpool/L=CSD/CN=john bland
>> SRMClientV1 : SRMClientV1 calling
>> org.globus.axis.util.Util.registerTransport()
>> SRMClientV1 : connecting to srm at
>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1
>> Tue Feb 19 11:25:40 GMT 2008: connected to server, obtaining proxy
>> Tue Feb 19 11:25:40 GMT 2008: got proxy of type class
>> org.dcache.srm.client.SRMClientV1
>> Tue Feb 19 11:25:40 GMT 2008: srm ping returned = true
>>
>> Looks like the two endpoints are available to Liverpool addresses. Could
>> you try again, please?
>>
>> For reference, I've diffed your files against our current setup; the
>> differences boil down to:
>>
>> srm_setup.env
>> =============
>>
>>> SRM_WEBAPP_DIR=${DCACHE_HOME}/libexec/apache-tomcat-5.5.20/webapps/srm
>> 16d18
>> < SRM_WEBAPP_DIR=${DCACHE_HOME}/srm-webapp
>>
>> dCacheSetup
>> ===========
>>
>> < #useGPlazmaAuthorizationModule=false
>> < useGPlazmaAuthorizationModule=true
>> < #useGPlazmaAuthorizationCell=true
>> < useGPlazmaAuthorizationCell=false
>> ---
>>> useGPlazmaAuthorizationModule=false
>>> useGPlazmaAuthorizationCell=true
>> 211c207
>> < # performanceMarkerPeriod=180
>> ---
>>> performanceMarkerPeriod=10
>> < # srmSpaceManagerEnabled=no
>> ---
>>> srmSpaceManagerEnabled=yes
>>
>> < # srmImplicitSpaceManagerEnabled=yes
>> ---
>>> srmImplicitSpaceManagerEnabled=yes
>>
>> < #parallelStreams=10
>> ---
>>> parallelStreams=1
>>
>> < srmCustomGetHostByAddr=true
>> ---
>>> # srmCustomGetHostByAddr=false
>>
>> < # SpaceManagerDefaultRetentionPolicy=CUSTODIAL
>> ---
>>> SpaceManagerDefaultRetentionPolicy=REPLICA
>> 667c658
>> < # SpaceManagerDefaultAccessLatency=NEARLINE
>> ---
>>> SpaceManagerDefaultAccessLatency=ONLINE
>> 672c663
>> < # SpaceManagerReserveSpaceForNonSRMTransfers=false
>> ---
>>> SpaceManagerReserveSpaceForNonSRMTransfers=true
>> < #billingToDb=no
>> ---
>>> billingToDb=yes
>>
>> srm.batch is identical.
>>
>> The only real difference I can see is that the spacemanager isn't
>> activated, but it wasn't activated originally either, when you could
>> still ping our srm2.2 endpoint.
>>
>> Regards,
>>
>> John
>>
>> Greig Alan Cowan wrote:
>>> Hi John,
>>>
>>> Still not fixed. It appears that dCache thinks the srm/managerv2
>>> endpoint can only speak SRMv1. Can you compare your files with these:
>>>
>>> http://www.ph.ed.ac.uk/~gcowan1/srm.batch
>>> http://www.ph.ed.ac.uk/~gcowan1/dCacheSetup
>>> http://www.ph.ed.ac.uk/~gcowan1/srm_setup.env
>>>
>>> Thanks,
>>> Greig
>>>
>>>
>>> On 19/02/08 10:01, John Bland wrote:
>>>> Greig Alan Cowan wrote:
>>>>> Hi John,
>>>>>
>>>>> Things seem to be going well with the SAM tests, but I don't seem
>>>>> to be able to srmPing hepgrid5 on the SRM2.2 endpoint. Any ideas?
>>>> dCacheSetup still had srmVersion=1. I've set this to the default
>>>> (i.e. commented it out) and restarted dCache. Hopefully that was the
>>>> problem and it won't break anything.
>>>>
>>>> John
>>>>
>>>>> Cheers,
>>>>> Greig
>>>>>
>>>>> On 18/02/08 15:45, John Bland wrote:
>>>>>> Hi,
>>>>>>
>>>>>> To follow myself up, we appear to have fixed the dual-homed problems
>>>>>> and can copy files in and out of the SE from internal and external
>>>>>> machines. Tests are starting to pass again (woohoo!).
>>>>>>
>>>>>> We're leaving it a while to see if any more problems were being
>>>>>> masked
>>>>>> by anything we've fixed. If we're clean we'll probably push ahead
>>>>>> with
>>>>>> migrating our pools and getting some space before attempting to break
>>>>>> it all again with the SRM2.2 spacemanager ;0).
>>>>>>
>>>>>> John
>>>>>>
>>>>>> John Bland wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are making progress, of sorts.
>>>>>>>
>>>>>>> We have fixed the GIIS problem (the static-file-Site.ldif hadn't
>>>>>>> been
>>>>>>> generated by yaim, usefully).
>>>>>>>
>>>>>>> I've also been naughty and set the srm1 endpoint as srm_v1 rather
>>>>>>> than SRM. I'm not sure which of the above fixed things, as they
>>>>>>> were changed at the same time, but external SAM tests for SRM to
>>>>>>> hepgrid5 then started passing.
>>>>>>>
>>>>>>> This didn't stop the CE-* ops tests and Steve Lloyd analysis tests
>>>>>>> from failing; they continued with the CGSI-gSOAP "can't connect"
>>>>>>> errors.
>>>>>>>
>>>>>>> We finally realised this morning that our new WNs were set to
>>>>>>> connect to the internal 192.168 interface on the SE, which had
>>>>>>> since been disabled due to conflicts between the eth0 and eth1
>>>>>>> addresses causing dCache to fail.
>>>>>>>
>>>>>>> Adding the 192.168 address back to the SE stops the gSOAP errors but
>>>>>>> we still haven't fixed the underlying problem with dcache on
>>>>>>> dual-homed servers.
>>>>>>>
>>>>>>> We are trying to fix that, as we don't want internal SE transfers
>>>>>>> battering our firewall/router all the time, but it is proving
>>>>>>> obstinate (par for the course, it seems). We've set
>>>>>>> srmCustomGetHostByAddr=true in dCacheSetup and followed the
>>>>>>> instructions at
>>>>>>> http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F
>>>>>>>
>>>>>>> but gridftp transfers just time out after opening the BINARY data
>>>>>>> connection (although e.g. edg-gridftp-ls lists as expected).
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> Greig Alan Cowan wrote:
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> Is everything OK with your dCache? I don't seem to be able to
>>>>>>>> srmPing it.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Greig
>>>>>>>>
>>>>>>>> On 15/02/08 13:54, John Bland wrote:
>>>>>>>>> Greig Alan Cowan wrote:
>>>>>>>>>> Hi John,
>>>>>>>>>>> Really? ... Ah, if you're looking at Steve Lloyd's SRM tests
>>>>>>>>>>> they're still failing for hepgrid5, but are passing for segrid1
>>>>>>>>>>> (which I fixed earlier today). Still see the gSOAP error for
>>>>>>>>>>> ops/Steve Lloyd analysis tests.
>>>>>>>>>> No, it's definitely working now:
>>>>>>>>>>
>>>>>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The SE tests have been passing since we came online and sorted out
>>>>>>>>> a dcache.kpwd file and permissions.
>>>>>>>>>
>>>>>>>>> What are failing are analysis jobs, such as
>>>>>>>>>
>>>>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html
>>>>>>>>>
>>>>>>>>> with the error
>>>>>>>>>
>>>>>>>>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP:
>>>>>>>>> Could
>>>>>>>>> not open connection !
>>>>>>>>> lcg_cp: Communication error on send
>>>>>>>>> Error in <TFile::TFile>: file aod.pool.root does not exist
>>>>>>>>> Could not open the file "aod.pool.root"
>>>>>>>>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is
>>>>>>>>> available
>>>>>>>>> WARNING: $POOL_CATALOG is not defined
>>>>>>>>> using default `xmlcatalog_file:PoolFileCatalog.xml'
>>>>>>>>>
>>>>>>>>> *** Break *** segmentation violation
>>>>>>>>>
>>>>>>>>> or ops SAM Replica Management tests, such as on
>>>>>>>>>
>>>>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk
>>>>>>>>>
>>>>>>>>> although I can't pick out any specific errors as the SAM site
>>>>>>>>> seems
>>>>>>>>> to be very stodgy today.
>>>>>>>>>
>>>>>>>>>>> I've done this, but while some of the changes have shown up in
>>>>>>>>>>> the BDII there still isn't an /srm/managerv2 entry. I've
>>>>>>>>>>> attached our static-file-SE.ldif file.
>>>>>>>>>> What about the dSE.ldif file? You need to make sure that it
>>>>>>>>>> contains something like:
>>>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>> I've updated that file as well and it's now showing managerv2 in
>>>>>>>>> our site BDII, so maybe that will help things.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>
>>
>>
--
Dr John Bland, Systems Administrator
Room 210, Oliver Lodge
Particle Physics Group, University of Liverpool
Mail: [log in to unmask]
Tel : 0151 794 3396