Hi,
First of all, when did you try to ping (dCache was still restarting when
I sent the last email)? Secondly, I can ping the srm2 and srm1 endpoints
from a Liverpool machine:
Tue Feb 19 11:25:13 GMT 2008: In SRMClient ExpectedName: host
Tue Feb 19 11:25:13 GMT 2008: SRMClient(https,srm/managerv2,true)
SRMClientV2 : user credentials are:
/C=UK/O=eScience/OU=Liverpool/L=CSD/CN=john bland
SRMClientV2 : WEBSERVICE_PATH srm/managerv2
SRMClientV2 : connecting to srm at
httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
SRMClientV2 : srmPing , contacting service
httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
Tue Feb 19 11:25:18 GMT 2008: received response
Tue Feb 19 11:25:18 GMT 2008: VersionInfo : v2.2
backend_type:dCache
backend_version:production-1-8-0-12p4
Tue Feb 19 11:25:38 GMT 2008: In SRMClient ExpectedName: host
Tue Feb 19 11:25:38 GMT 2008: SRMClient(https,srm/managerv1,true)
SRMClientV1 : user credentials are:
/C=UK/O=eScience/OU=Liverpool/L=CSD/CN=john bland
SRMClientV1 : SRMClientV1 calling
org.globus.axis.util.Util.registerTransport()
SRMClientV1 : connecting to srm at
httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1
Tue Feb 19 11:25:40 GMT 2008: connected to server, obtaining proxy
Tue Feb 19 11:25:40 GMT 2008: got proxy of type class
org.dcache.srm.client.SRMClientV1
Tue Feb 19 11:25:40 GMT 2008: srm ping returned = true
It looks like both endpoints are available from Liverpool addresses.
Could you try again, please?
For reference, I've diffed your files against our current setup; the
differences boil down to:
srm_setup.env
=============
16c18
< SRM_WEBAPP_DIR=${DCACHE_HOME}/srm-webapp
---
> SRM_WEBAPP_DIR=${DCACHE_HOME}/libexec/apache-tomcat-5.5.20/webapps/srm
dCacheSetup
===========
< #useGPlazmaAuthorizationModule=false
< useGPlazmaAuthorizationModule=true
< #useGPlazmaAuthorizationCell=true
< useGPlazmaAuthorizationCell=false
---
> useGPlazmaAuthorizationModule=false
> useGPlazmaAuthorizationCell=true
211c207
< # performanceMarkerPeriod=180
---
> performanceMarkerPeriod=10
< # srmSpaceManagerEnabled=no
---
> srmSpaceManagerEnabled=yes
< # srmImplicitSpaceManagerEnabled=yes
---
> srmImplicitSpaceManagerEnabled=yes
< #parallelStreams=10
---
> parallelStreams=1
< srmCustomGetHostByAddr=true
---
> # srmCustomGetHostByAddr=false
< # SpaceManagerDefaultRetentionPolicy=CUSTODIAL
---
> SpaceManagerDefaultRetentionPolicy=REPLICA
667c658
< # SpaceManagerDefaultAccessLatency=NEARLINE
---
> SpaceManagerDefaultAccessLatency=ONLINE
672c663
< # SpaceManagerReserveSpaceForNonSRMTransfers=false
---
> SpaceManagerReserveSpaceForNonSRMTransfers=true
< #billingToDb=no
---
> billingToDb=yes
srm.batch is identical.
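For completeness, the space-manager lines that differ collect into the
following on the side of the diff where the space manager is enabled
(just a sketch pulled from the hunks above; I haven't tried these values
on our setup):

```
# Space-manager settings from the enabled side of the diff above;
# untested on our end, everything else left at defaults
srmSpaceManagerEnabled=yes
srmImplicitSpaceManagerEnabled=yes
SpaceManagerDefaultRetentionPolicy=REPLICA
SpaceManagerDefaultAccessLatency=ONLINE
SpaceManagerReserveSpaceForNonSRMTransfers=true
```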
The only real difference I can see is that the space manager isn't
activated on our side, but it wasn't activated originally either, back
when you could still ping our srm2.2 endpoint.
Regards,
John
Greig Alan Cowan wrote:
> Hi John,
>
> Still not fixed. It appears that dCache thinks the srm/managerv2
> endpoint can only speak SRMv1. Can you compare your files with these:
>
> http://www.ph.ed.ac.uk/~gcowan1/srm.batch
> http://www.ph.ed.ac.uk/~gcowan1/dCacheSetup
> http://www.ph.ed.ac.uk/~gcowan1/srm_setup.env
>
> Thanks,
> Greig
>
>
> On 19/02/08 10:01, John Bland wrote:
>> Greig Alan Cowan wrote:
>>> Hi John,
>>>
>>> Things seem to be going well with the SAM tests, but I don't seem to be
>>> able to srmPing hepgrid5 on the SRM2.2 endpoint. Any ideas?
>>
>> dCacheSetup still had srmVersion=1. I've set it back to the default
>> (i.e. commented it out) and restarted dCache. Hopefully that was the
>> problem
>> and it won't break anything.
>>
>> John
>>
>>> Cheers,
>>> Greig
>>>
>>> On 18/02/08 15:45, John Bland wrote:
>>>> Hi,
>>>>
>>>> To follow myself up, we appear to have fixed the dual-homed problems
>>>> and can copy files in and out of the SE from internal and external
>>>> machines. Tests are starting to pass again (woohoo!).
>>>>
>>>> We're leaving it a while to see if any more problems were being masked
>>>> by anything we've fixed. If we're clean we'll probably push ahead with
>>>> migrating our pools and getting some space before attempting to break
>>>> it all again with the SRM2.2 spacemanager ;0).
>>>>
>>>> John
>>>>
>>>> John Bland wrote:
>>>>> Hi,
>>>>>
>>>>> We are making progress, of sorts.
>>>>>
>>>>> We have fixed the GIIS problem (the static-file-Site.ldif hadn't been
>>>>> generated by yaim, usefully).
>>>>>
>>>>> I've also been naughty and set the srm1 endpoint type as srm_v1
>>>>> rather than SRM. I'm not sure which of the above fixed things, as
>>>>> they were changed at the same time, but the external SAM tests for
>>>>> SRM on hepgrid5 then started passing.
>>>>>
>>>>> This didn't stop the CE-* ops tests and Steve Lloyd analysis tests
>>>>> from failing; they continued with the CGSI-gSOAP "could not open
>>>>> connection" errors.
>>>>>
>>>>> We finally realised this morning that our new WNs were set to
>>>>> connect to the internal 192.168 interface on the SE, which had
>>>>> since been disabled because conflicts between the eth0 and eth1
>>>>> addresses were causing dCache to fail.
>>>>>
>>>>> Adding the 192.168 address back to the SE stops the gSOAP errors but
>>>>> we still haven't fixed the underlying problem with dcache on
>>>>> dual-homed servers.
>>>>>
>>>>> We are trying to fix that, as we don't want internal SE transfers
>>>>> battering our firewall/router all the time, but it is proving
>>>>> obstinate (par for the course, it seems). We've set
>>>>> srmCustomGetHostByAddr=true in dCacheSetup and followed the
>>>>> instructions at
>>>>> http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F
>>>>>
>>>>> but gridftp transfers just time out after opening the BINARY data
>>>>> connection (though e.g. edg-gridftp-ls lists as expected).
>>>>>
>>>>> John
>>>>>
>>>>> Greig Alan Cowan wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> Is everything OK with your dCache? I don't seem to be able to
>>>>>> srmPing it.
>>>>>>
>>>>>> Thanks,
>>>>>> Greig
>>>>>>
>>>>>> On 15/02/08 13:54, John Bland wrote:
>>>>>>> Greig Alan Cowan wrote:
>>>>>>>> Hi John,
>>>>>>>>> Really? ... Ah, if you're looking at steve lloyd's srm tests
>>>>>>>>> they're still failing for hepgrid5, but are passing for segrid1
>>>>>>>>> (which I fixed earlier today). Still see the gSOAP error for
>>>>>>>>> ops/steve lloyd analysis tests.
>>>>>>>> No, it's definitely working now:
>>>>>>>>
>>>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html
>>>>>>>>
>>>>>>>
>>>>>>> The SE tests have been passing since we came online and sorted out
>>>>>>> a dcache.kpwd file and permissions.
>>>>>>>
>>>>>>> What are failing are analysis jobs, such as
>>>>>>>
>>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html
>>>>>>>
>>>>>>> with the error
>>>>>>>
>>>>>>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP: Could
>>>>>>> not open connection !
>>>>>>> lcg_cp: Communication error on send
>>>>>>> Error in <TFile::TFile>: file aod.pool.root does not exist
>>>>>>> Could not open the file "aod.pool.root"
>>>>>>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is
>>>>>>> available
>>>>>>> WARNING: $POOL_CATALOG is not defined
>>>>>>> using default `xmlcatalog_file:PoolFileCatalog.xml'
>>>>>>>
>>>>>>> *** Break *** segmentation violation
>>>>>>>
>>>>>>> or ops SAM Replica Management tests, such as on
>>>>>>>
>>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk
>>>>>>>
>>>>>>> although I can't pick out any specific errors as the SAM site seems
>>>>>>> to be very stodgy today.
>>>>>>>
>>>>>>>>> I've done this but while some of the changes have shown up in the
>>>>>>>>> bdii there still isn't an /srm/managerv2 entry. I've attached our
>>>>>>>>> static-file-SE.ldif file.
>>>>>>>> What about the dSE.ldif file? You need to make sure that it
>>>>>>>> contains something like:
>>>>>>> [snip]
>>>>>>>
>>>>>>> I've updated that file as well and it's showing up managerv2 in our
>>>>>>> site BDII now, maybe that might help things.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>
>>>>
>>
>>
--
Dr John Bland, Systems Administrator
Room 210, Oliver Lodge
Particle Physics Group, University of Liverpool
Mail: [log in to unmask]
Tel : 0151 794 3396