From Manchester:
....
SRMClientV2 : connecting to srm at
httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
SRMClientV2 : srmPing , contacting service
httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
Tue Feb 19 11:55:37 GMT 2008: received response
Tue Feb 19 11:55:37 GMT 2008: VersionInfo : v2.2
backend_type:dCache
backend_version:production-1-8-0-12p4
Sergey
On 19/02/2008, Greig Alan Cowan <[log in to unmask]> wrote:
> Hi John,
>
> It's definitely not working for me (see below). Certainly from your
> output it looks like it's working. As you say, all of the files look fine.
>
> I can ping the SRMv1 endpoint; it is only the v2.2 one that is complaining.
>
> Could someone else give this a go from outside Liverpool? You will need
> to use the latest dcache-srmclient rpm.
>
> Cheers,
> Greig
>
> $ opt/d-cache/srm/bin/srmping -2 -debug
> srm://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> WARNING: SRM_PATH is defined, which might cause a wrong version of srm
> client to be executed
> WARNING: SRM_PATH=/home/gcowan/opt/d-cache/srm
> Storage Resource Manager (SRM) CP Client version 2.0
> Tue Feb 19 11:30:58 GMT 2008: In SRMClient ExpectedName: host
> Tue Feb 19 11:30:58 GMT 2008: SRMClient(https,srm/managerv2,true)
> SRMClientV2 : user credentials are:
> /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan
> SRMClientV2 : WEBSERVICE_PATH srm/managerv2
> SRMClientV2 : connecting to srm at
> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> SRMClientV2 : srmPing , contacting service
> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> SRMClientV2 : srmPing: try # 0 failed with error
> AxisFault
> faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
> faultSubcode:
> faultString: java.rmi.RemoteException: SRMServerV2.srmPing()
> exception; nested exception is:
> java.lang.NoSuchMethodException:
> org.dcache.srm.v2_2.SrmPingResponse.setStatusCode(org.dcache.srm.v2_2.TStatusCode)
> faultActor:
> faultNode:
> faultDetail:
> {http://xml.apache.org/axis/}hostname:hepgrid5.ph.liv.ac.uk
>
> java.rmi.RemoteException: SRMServerV2.srmPing() exception; nested
> exception is:
> java.lang.NoSuchMethodException:
> org.dcache.srm.v2_2.SrmPingResponse.setStatusCode(org.dcache.srm.v2_2.TStatusCode)
> at
> org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
>
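The NoSuchMethodException above is the classic symptom of version skew between the SRM stubs in use and the jars actually on the classpath (the SRM_PATH warning earlier hints at a shadowed install). As a minimal sketch, assuming the two install locations suggested by the transcript (both are guesses), one could list the candidate srm jars and compare their dates:

```shell
# A NoSuchMethodException on a generated stub like SrmPingResponse usually
# means an old srm jar is shadowing a newer one. The directories below are
# guesses based on the SRM_PATH warning in the transcript; adjust them to
# the actual install before trusting the result.
for d in /opt/d-cache/classes /home/gcowan/opt/d-cache/srm/lib; do
    if [ -d "$d" ]; then
        ls -l "$d"/srm*.jar || true
    fi
done > /tmp/srmjars.txt 2>&1
echo "candidate jars written to /tmp/srmjars.txt"
```

If an older srm jar turns up under the SRM_PATH tree, unsetting SRM_PATH or removing the stale install would be the first thing to try before blaming the server.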
>
> On 19/02/08 11:29, John Bland wrote:
> > Hi,
> >
> > First of all, when did you try to ping? (dcache was still restarting when
> > I sent the last email.) Secondly, I can ping the srm2 and srm1 endpoints
> > from a Liverpool machine:
> >
> > Tue Feb 19 11:25:13 GMT 2008: In SRMClient ExpectedName: host
> > Tue Feb 19 11:25:13 GMT 2008: SRMClient(https,srm/managerv2,true)
> > SRMClientV2 : user credentials are:
> > /C=UK/O=eScience/OU=Liverpool/L=CSD/CN=john bland
> > SRMClientV2 : WEBSERVICE_PATH srm/managerv2
> > SRMClientV2 : connecting to srm at
> > httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> > SRMClientV2 : srmPing , contacting service
> > httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv2
> > Tue Feb 19 11:25:18 GMT 2008: received response
> > Tue Feb 19 11:25:18 GMT 2008: VersionInfo : v2.2
> > backend_type:dCache
> > backend_version:production-1-8-0-12p4
> >
> > Tue Feb 19 11:25:38 GMT 2008: In SRMClient ExpectedName: host
> > Tue Feb 19 11:25:38 GMT 2008: SRMClient(https,srm/managerv1,true)
> > SRMClientV1 : user credentials are:
> > /C=UK/O=eScience/OU=Liverpool/L=CSD/CN=john bland
> > SRMClientV1 : SRMClientV1 calling
> > org.globus.axis.util.Util.registerTransport()
> > SRMClientV1 : connecting to srm at
> > httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1
> > Tue Feb 19 11:25:40 GMT 2008: connected to server, obtaining proxy
> > Tue Feb 19 11:25:40 GMT 2008: got proxy of type class
> > org.dcache.srm.client.SRMClientV1
> > Tue Feb 19 11:25:40 GMT 2008: srm ping returned = true
> >
> > Looks like the two endpoints are available to Liverpool addresses. Could
> > you try again, please?
> >
> > For reference, I've diffed your files against our current setup; the
> > differences boil down to:
> >
> > srm_setup.env
> > =============
> >
> >> SRM_WEBAPP_DIR=${DCACHE_HOME}/libexec/apache-tomcat-5.5.20/webapps/srm
> > 16d18
> > < SRM_WEBAPP_DIR=${DCACHE_HOME}/srm-webapp
> >
> > dCacheSetup
> > ===========
> >
> > < #useGPlazmaAuthorizationModule=false
> > < useGPlazmaAuthorizationModule=true
> > < #useGPlazmaAuthorizationCell=true
> > < useGPlazmaAuthorizationCell=false
> > ---
> >> useGPlazmaAuthorizationModule=false
> >> useGPlazmaAuthorizationCell=true
> > 211c207
> > < # performanceMarkerPeriod=180
> > ---
> >> performanceMarkerPeriod=10
> > < # srmSpaceManagerEnabled=no
> > ---
> >> srmSpaceManagerEnabled=yes
> >
> > < # srmImplicitSpaceManagerEnabled=yes
> > ---
> >> srmImplicitSpaceManagerEnabled=yes
> >
> > < #parallelStreams=10
> > ---
> >> parallelStreams=1
> >
> > < srmCustomGetHostByAddr=true
> > ---
> >> # srmCustomGetHostByAddr=false
> >
> > < # SpaceManagerDefaultRetentionPolicy=CUSTODIAL
> > ---
> >> SpaceManagerDefaultRetentionPolicy=REPLICA
> > 667c658
> > < # SpaceManagerDefaultAccessLatency=NEARLINE
> > ---
> >> SpaceManagerDefaultAccessLatency=ONLINE
> > 672c663
> > < # SpaceManagerReserveSpaceForNonSRMTransfers=false
> > ---
> >> SpaceManagerReserveSpaceForNonSRMTransfers=true
> > < #billingToDb=no
> > ---
> >> billingToDb=yes
> >
> > srm.batch is identical.
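The manual comparison above can be reproduced mechanically by diffing only the active (uncommented) settings of the two dCacheSetup files. A sketch, with inline sample files standing in for the real site configs:

```shell
# Mechanical version of the comparison above: diff only the active
# (uncommented) settings of two dCacheSetup files. The sample files are
# created inline purely for illustration; real site paths will differ.
cat > /tmp/dCacheSetup.ours <<'EOF'
# srmSpaceManagerEnabled=no
parallelStreams=10
srmCustomGetHostByAddr=true
EOF
cat > /tmp/dCacheSetup.yours <<'EOF'
srmSpaceManagerEnabled=yes
parallelStreams=1
EOF
# Keep only lines that start with a setting name, then sort for a stable diff
grep -E '^[A-Za-z]' /tmp/dCacheSetup.ours  | sort > /tmp/ours.active
grep -E '^[A-Za-z]' /tmp/dCacheSetup.yours | sort > /tmp/yours.active
diff /tmp/ours.active /tmp/yours.active || true
```

This ignores commented defaults, which is what matters: a `# setting=x` line and no line at all are equivalent as far as dCache is concerned.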
> >
> > The only real difference I can see is that the spacemanager isn't
> > activated, but it wasn't activated originally either, back when you could
> > still ping our srm2.2 endpoint.
> >
> > Regards,
> >
> > John
> >
> > Greig Alan Cowan wrote:
> >> Hi John,
> >>
> >> Still not fixed. It appears that dCache thinks the srm/managerv2
> >> endpoint can only speak SRMv1. Can you compare your files with these:
> >>
> >> http://www.ph.ed.ac.uk/~gcowan1/srm.batch
> >> http://www.ph.ed.ac.uk/~gcowan1/dCacheSetup
> >> http://www.ph.ed.ac.uk/~gcowan1/srm_setup.env
> >>
> >> Thanks,
> >> Greig
> >>
> >>
> >> On 19/02/08 10:01, John Bland wrote:
> >>> Greig Alan Cowan wrote:
> >>>> Hi John,
> >>>>
> >>>> Things seem to be going well with the SAM tests, but I don't seem to be
> >>>> able to srmPing hepgrid5 on the SRM2.2 endpoint. Any ideas?
> >>> dCacheSetup still had srmVersion=1. I've set this to default (ie
> >>> commented it out) and restarted dcache. Hopefully that was the problem
> >>> and it won't break anything.
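For reference, the fix described here amounts to commenting out the forced version. A sketch against a stand-in file (on a real SE the file is dCacheSetup in the dCache config directory, and dcache needs a restart afterwards):

```shell
# Sketch of the fix described above: comment out a forced srmVersion line so
# dCache falls back to its default and serves both v1 and v2.2. A stand-in
# file is used here purely for illustration.
printf 'srmVersion=1\n' > /tmp/dCacheSetup.demo
sed -i 's/^srmVersion=/# srmVersion=/' /tmp/dCacheSetup.demo
cat /tmp/dCacheSetup.demo
```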
> >>>
> >>> John
> >>>
> >>>> Cheers,
> >>>> Greig
> >>>>
> >>>> On 18/02/08 15:45, John Bland wrote:
> >>>>> Hi,
> >>>>>
> >>>>> To follow myself up, we appear to have fixed the dual-homed problems
> >>>>> and can copy files in and out of the SE from internal and external
> >>>>> machines. Tests are starting to pass again (woohoo!).
> >>>>>
> >>>>> We're leaving it a while to see if any more problems were being masked
> >>>>> by anything we've fixed. If we're clean we'll probably push ahead with
> >>>>> migrating our pools and getting some space before attempting to break
> >>>>> it all again with the SRM2.2 spacemanager ;0).
> >>>>>
> >>>>> John
> >>>>>
> >>>>> John Bland wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> We are making progress, of sorts.
> >>>>>>
> >>>>>> We have fixed the GIIS problem (the static-file-Site.ldif hadn't been
> >>>>>> generated by yaim, usefully).
> >>>>>>
> >>>>>> I've also been naughty and set the srm1 endpoint type as srm_v1
> >>>>>> rather than SRM. I'm not sure which of the above fixed things, as
> >>>>>> they were changed at the same time, but external SAM tests for SRM to
> >>>>>> hepgrid5 then started passing.
> >>>>>>
> >>>>>> This didn't stop the CE-* ops tests and Steve Lloyd analysis tests
> >>>>>> from failing; they continued with the CGSI-gSOAP "could not open
> >>>>>> connection" errors.
> >>>>>>
> >>>>>> We finally realised this morning that our new WNs were set to
> >>>>>> connect to the internal 192.168 interface on the SE, which had since
> >>>>>> been disabled because conflicts between the eth0 and eth1 addresses
> >>>>>> were causing dcache to fail.
> >>>>>>
> >>>>>> Adding the 192.168 address back to the SE stops the gSOAP errors but
> >>>>>> we still haven't fixed the underlying problem with dcache on
> >>>>>> dual-homed servers.
> >>>>>>
> >>>>>> We are trying to fix that, as we'd rather not have internal SE
> >>>>>> transfers battering our firewall/router all the time, but it is
> >>>>>> proving obstinate (par for the course, it seems). We've set
> >>>>>> srmCustomGetHostByAddr=true in dCacheSetup and followed the
> >>>>>> instructions at
> >>>>>> http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F
> >>>>>>
> >>>>>> but gridftp transfers just time out after opening the BINARY data
> >>>>>> connection (though e.g. edg-gridftp-ls does list as expected).
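One thing worth ruling out on a dual-homed SE is name resolution handing external clients the private address. A sketch using an illustrative hosts file (the entries and addresses below are made up, not Liverpool's real ones):

```shell
# A dual-homed SE must not hand external clients its private 192.168 address.
# This scans a sample hosts file (the entries and addresses are illustrative,
# not Liverpool's real ones) for the public name mapped to a private address.
cat > /tmp/hosts.sample <<'EOF'
138.253.60.1   hepgrid5.ph.liv.ac.uk hepgrid5
192.168.0.5    hepgrid5-int
EOF
if grep -E '^192\.168\..*hepgrid5\.ph\.liv\.ac\.uk' /tmp/hosts.sample; then
    echo "public name maps to a private address - fix the hosts file"
else
    echo "public name does not map to a 192.168 address"
fi
```

The same check applies to whatever reverse lookup srmCustomGetHostByAddr triggers: if the data-channel address advertised to external clients is the 192.168 one, transfers will open the control channel fine and then hang exactly as described.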
> >>>>>>
> >>>>>> John
> >>>>>>
> >>>>>> Greig Alan Cowan wrote:
> >>>>>>> Hi John,
> >>>>>>>
> >>>>>>> Is everything OK with your dCache? I don't seem to be able to
> >>>>>>> srmPing it.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Greig
> >>>>>>>
> >>>>>>> On 15/02/08 13:54, John Bland wrote:
> >>>>>>>> Greig Alan Cowan wrote:
> >>>>>>>>> Hi John,
> >>>>>>>>>> Really? ... Ah, if you're looking at steve lloyd's srm tests
> >>>>>>>>>> they're still failing for hepgrid5, but are passing for segrid1
> >>>>>>>>>> (which I fixed earlier today). Still see the gSOAP error for
> >>>>>>>>>> ops/steve lloyd analysis tests.
> >>>>>>>>> No, it's definitely working now:
> >>>>>>>>>
> >>>>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> The SE tests have been passing since we came online and sorted out
> >>>>>>>> a dcache.kpwd file and permissions.
> >>>>>>>>
> >>>>>>>> What are failing are analysis jobs, such as
> >>>>>>>>
> >>>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html
> >>>>>>>>
> >>>>>>>> with the error
> >>>>>>>>
> >>>>>>>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP: Could
> >>>>>>>> not open connection !
> >>>>>>>> lcg_cp: Communication error on send
> >>>>>>>> Error in <TFile::TFile>: file aod.pool.root does not exist
> >>>>>>>> Could not open the file "aod.pool.root"
> >>>>>>>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is
> >>>>>>>> available
> >>>>>>>> WARNING: $POOL_CATALOG is not defined
> >>>>>>>> using default `xmlcatalog_file:PoolFileCatalog.xml'
> >>>>>>>>
> >>>>>>>> *** Break *** segmentation violation
> >>>>>>>>
> >>>>>>>> or ops SAM Replica Management tests, such as on
> >>>>>>>>
> >>>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk
> >>>>>>>>
> >>>>>>>> although I can't pick out any specific errors as the SAM site seems
> >>>>>>>> to be very stodgy today.
> >>>>>>>>
> >>>>>>>>>> I've done this but while some of the changes have shown up in the
> >>>>>>>>>> bdii there still isn't an /srm/managerv2 entry. I've attached our
> >>>>>>>>>> static-file-SE.ldif file.
> >>>>>>>>> What about the dSE.ldif file? You need to make sure that it
> >>>>>>>>> contains something like:
> >>>>>>>> [snip]
> >>>>>>>>
> >>>>>>>> I've updated that file as well, and managerv2 is now showing up in
> >>>>>>>> our site BDII; maybe that will help things.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> John
> >>>>>>>>
> >>>
> >
> >
>
--
Sergey Dolgobrodov
Department of Physics & Astronomy
University of Manchester
Manchester M13 9PL
Tel: +44 (0)161 6608472
Mobile: +44 (0)790 4587534
Skype: sergeygd