Greig Alan Cowan wrote:
> Hi John,
>
> Things seem to be going well with the SAM tests, but I don't seem to be
> able to srmPing hepgrid5 on the SRM2.2 endpoint. Any ideas?

dCacheSetup still had srmVersion=1. I've set this to default (ie commented
it out) and restarted dcache. Hopefully that was the problem and it won't
break anything.

John

> Cheers,
> Greig
>
> On 18/02/08 15:45, John Bland wrote:
>> Hi,
>>
>> To follow myself up, we appear to have fixed the dual-homed problems
>> and can copy files in and out of the SE from internal and external
>> machines. Tests are starting to pass again (woohoo!).
>>
>> We're leaving it a while to see if any more problems were being masked
>> by anything we've fixed. If we're clean we'll probably push ahead with
>> migrating our pools and getting some space before attempting to break
>> it all again with the SRM2.2 spacemanager ;0).
>>
>> John
>>
>> John Bland wrote:
>>> Hi,
>>>
>>> We are making progress, of sorts.
>>>
>>> We have fixed the GIIS problem (the static-file-Site.ldif hadn't been
>>> generated by yaim, usefully).
>>>
>>> I've also been naughty and set the srm1 endpoint as being srm_v1
>>> rather than SRM. Not sure which of the above fixed things as they
>>> were changed at the same time but then external SAM tests for SRM to
>>> hepgrid5 started passing.
>>>
>>> This didn't change the CE-* ops tests and Steve Lloyd analysis tests
>>> failing, which continued with the CGSI-gSOAP can't connect errors.
>>>
>>> We finally realised this morning that our new WN's were set to
>>> connect to the internal 192.168 interface on the SE, which had been
>>> disabled since then due to conflicts between the eth0 and eth1
>>> addresses causing dcache to fail.
>>>
>>> Adding the 192.168 address back to the SE stops the gSOAP errors but
>>> we still haven't fixed the underlying problem with dcache on
>>> dual-homed servers.
>>>
>>> We are trying to fix that as we don't want internal SE transfers
>>> battering our firewall/router all the time if possible but it is
>>> proving obstinate (par for the course it seems). We've set in
>>> dCacheSetup srmCustomGetHostByAddr=true and followed the instructions
>>> as on
>>> http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F
>>> but gridftp transfers just timeout after opening BINARY data
>>> connection (but eg edg-gridftp-ls does list as expected).
>>>
>>> John
>>>
>>> Greig Alan Cowan wrote:
>>>> Hi John,
>>>>
>>>> Is everything OK with your dCache? I don't seem to be able to
>>>> srmPing it.
>>>>
>>>> Thanks,
>>>> Greig
>>>>
>>>> On 15/02/08 13:54, John Bland wrote:
>>>>> Greig Alan Cowan wrote:
>>>>>> Hi John,
>>>>>>> Really? ... Ah, if you're looking at steve lloyd's srm tests
>>>>>>> they're still failing for hepgrid5, but are passing for segrid1
>>>>>>> (which I fixed earlier today). Still see the gSOAP error for
>>>>>>> ops/steve lloyd analysis tests.
>>>>>>
>>>>>> No, it's definitely working now:
>>>>>>
>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html
>>>>>
>>>>> The SE tests have been passing since we came online and sorted out
>>>>> a dcache.kpwd file and permissions.
>>>>>
>>>>> What are failing are analysis jobs, such as
>>>>>
>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html
>>>>>
>>>>> with the error
>>>>>
>>>>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP: Could
>>>>> not open connection !
>>>>> lcg_cp: Communication error on send
>>>>> Error in <TFile::TFile>: file aod.pool.root does not exist
>>>>> Could not open the file "aod.pool.root"
>>>>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is
>>>>> available
>>>>> WARNING: $POOL_CATALOG is not defined
>>>>> using default `xmlcatalog_file:PoolFileCatalog.xml'
>>>>>
>>>>> *** Break *** segmentation violation
>>>>>
>>>>> or ops SAM Replica Management tests, such as on
>>>>>
>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk
>>>>>
>>>>> although I can't pick out any specific errors as the SAM site seems
>>>>> to be very stodgy today.
>>>>>
>>>>>>> I've done this but while some of the changes have shown up in the
>>>>>>> bdii there still isn't an /srm/managerv2 entry. I've attached our
>>>>>>> static-file-SE.ldif file.
>>>>>>
>>>>>> What about the dSE.ldif file? You need to make sure that it
>>>>>> contains something like:
>>>>>
>>>>> [snip]
>>>>>
>>>>> I've updated that file as well and it's showing up managerv2 in our
>>>>> site BDII now, maybe that might help things.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>
>>

--
Dr John Bland, Systems Administrator
Room 210, Oliver Lodge
Particle Physics Group, University of Liverpool
Mail: [log in to unmask]
Tel : 0151 794 3396