Hi,
Following up on my earlier mail: we appear to have fixed the dual-homed problems and
can copy files in and out of the SE from internal and external machines.
Tests are starting to pass again (woohoo!).
We're leaving it a while to see if any more problems were being masked
by anything we've fixed. If we're clean we'll probably push ahead with
migrating our pools and getting some space before attempting to break it
all again with the SRM2.2 spacemanager ;0).
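
As a rough illustration only: the "Could not open connection" symptom can be narrowed down with a plain TCP check from a worker node against each SE address. The sketch below assumes Python 3 is available on the node; the names can_connect and check_endpoints are made up for this example. It only tests that the port accepts a TCP connection, not the GSI/gSOAP handshake or the gridftp data channel.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to whether a TCP connection could be opened."""
    return {(host, port): can_connect(host, port, timeout)
            for host, port in endpoints}


if __name__ == "__main__":
    import sys

    # Usage: python check_se.py host:port [host:port ...]
    pairs = []
    for arg in sys.argv[1:]:
        host, _, port = arg.rpartition(":")
        pairs.append((host, int(port)))
    for (host, port), ok in check_endpoints(pairs).items():
        print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Run from a WN, it could be given hepgrid5.ph.liv.ac.uk:8443 (the SRM endpoint in the error message) together with the SE's internal 192.168 address on the same port, to see which interface the node can actually reach.
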
John
John Bland wrote:
> Hi,
>
> We are making progress, of sorts.
>
> We have fixed the GIIS problem (the static-file-Site.ldif hadn't been
> generated by yaim, usefully).
>
> I've also been naughty and set the srm1 endpoint as being srm_v1 rather
> than SRM. I'm not sure which of the two changes fixed things, as they were
> made at the same time, but the external SAM tests for SRM to hepgrid5 then
> started passing.
>
> This didn't stop the CE-* ops tests and Steve Lloyd analysis tests from
> failing; they continued with the CGSI-gSOAP can't connect errors.
>
> We finally realised this morning that our new WNs were set to connect
> to the internal 192.168 interface on the SE, which had since been
> disabled because conflicts between the eth0 and eth1 addresses were
> causing dcache to fail.
>
> Adding the 192.168 address back to the SE stops the gSOAP errors but we
> still haven't fixed the underlying problem with dcache on dual-homed
> servers.
>
> We are trying to fix that as we don't want internal SE transfers
> battering our firewall/router all the time if possible but it is proving
> obstinate (par for the course it seems). In dCacheSetup we've set
> srmCustomGetHostByAddr=true and followed the instructions at
> http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F
> but gridftp transfers just time out after opening the BINARY data connection
> (although e.g. edg-gridftp-ls does list as expected).
>
> John
>
> Greig Alan Cowan wrote:
>> Hi John,
>>
>> Is everything OK with your dCache? I don't seem to be able to srmPing it.
>>
>> Thanks,
>> Greig
>>
>> On 15/02/08 13:54, John Bland wrote:
>>> Greig Alan Cowan wrote:
>>>> Hi John,
>>>>> Really? ... Ah, if you're looking at Steve Lloyd's srm tests
>>>>> they're still failing for hepgrid5, but are passing for segrid1
>>>>> (which I fixed earlier today). Still see the gSOAP error for
>>>>> ops/Steve Lloyd analysis tests.
>>>>
>>>> No, it's definitely working now:
>>>>
>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html
>>>
>>> The SE tests have been passing since we came online and sorted out a
>>> dcache.kpwd file and permissions.
>>>
>>> What are failing are analysis jobs, such as
>>>
>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html
>>>
>>>
>>> with the error
>>>
>>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP: Could
>>> not open connection !
>>> lcg_cp: Communication error on send
>>> Error in <TFile::TFile>: file aod.pool.root does not exist
>>> Could not open the file "aod.pool.root"
>>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is
>>> available
>>> WARNING: $POOL_CATALOG is not defined
>>> using default `xmlcatalog_file:PoolFileCatalog.xml'
>>>
>>> *** Break *** segmentation violation
>>>
>>> or ops SAM Replica Management tests, such as on
>>>
>>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk
>>>
>>>
>>> although I can't pick out any specific errors as the SAM site seems
>>> to be very stodgy today.
>>>
>>>>> I've done this but while some of the changes have shown up in the
>>>>> bdii there still isn't an /srm/managerv2 entry. I've attached our
>>>>> static-file-SE.ldif file.
>>>>
>>>> What about the dSE.ldif file? You need to make sure that it contains
>>>> something like:
>>>
>>> [snip]
>>>
>>> I've updated that file as well and managerv2 is now showing up in our
>>> site BDII, so that might help things.
>>>
>>> Thanks,
>>>
>>> John
>>>
>
>
--
Dr John Bland, Systems Administrator
Room 210, Oliver Lodge
Particle Physics Group, University of Liverpool
Tel : 0151 794 3396