Greig Alan Cowan wrote:
> Hi John,
> 
> Things seem to be going well with the SAM tests, but I don't seem to be
> able to srmPing hepgrid5 on the SRM2.2 endpoint. Any ideas?

dCacheSetup still had srmVersion=1. I've set this to the default (i.e.
commented it out) and restarted dCache. Hopefully that was the problem
and it won't break anything.

John

> Cheers,
> Greig
> 
> On 18/02/08 15:45, John Bland wrote:
>> Hi,
>>
>> To follow myself up, we appear to have fixed the dual-homed problems
>> and can copy files in and out of the SE from internal and external
>> machines. Tests are starting to pass again (woohoo!).
>>
>> We're leaving it a while to see if any more problems were being masked
>> by anything we've fixed. If we're clean we'll probably push ahead with
>> migrating our pools and getting some space before attempting to break
>> it all again with the SRM2.2 spacemanager ;0).
>>
>> John
>>
>> John Bland wrote:
>>> Hi,
>>>
>>> We are making progress, of sorts.
>>>
>>> We have fixed the GIIS problem (the static-file-Site.ldif hadn't been
>>> generated by yaim, usefully).
>>>
>>> I've also been naughty and set the srm1 endpoint as being srm_v1
>>> rather than SRM. Not sure which of the above fixed things, as they
>>> were changed at the same time, but the external SAM tests for SRM to
>>> hepgrid5 then started passing.
>>>
>>> This didn't change the CE-* ops tests and Steve Lloyd analysis tests
>>> failing, which continued with the CGSI-gSOAP can't connect errors.
>>>
>>> We finally realised this morning that our new WNs were set to
>>> connect to the internal 192.168 interface on the SE, which had been
>>> disabled since then due to conflicts between the eth0 and eth1
>>> addresses causing dcache to fail.
>>>
>>> Adding the 192.168 address back to the SE stops the gSOAP errors but
>>> we still haven't fixed the underlying problem with dcache on
>>> dual-homed servers.
>>>
>>> We are trying to fix that as we don't want internal SE transfers
>>> battering our firewall/router all the time if possible but it is
>>> proving obstinate (par for the course it seems). We've set in
>>> dCacheSetup srmCustomGetHostByAddr=true and followed the instructions
>>> as on
>>> http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F
>>> but gridftp transfers just time out after opening BINARY data
>>> connection (but e.g. edg-gridftp-ls does list as expected).
>>>
>>> John
>>>
>>> Greig Alan Cowan wrote:
>>>> Hi John,
>>>>
>>>> Is everything OK with your dCache? I don't seem to be able to
>>>> srmPing it.
>>>>
>>>> Thanks,
>>>> Greig
>>>>
>>>> On 15/02/08 13:54, John Bland wrote:
>>>>> Greig Alan Cowan wrote:
>>>>>> Hi John,
>>>>>>> Really? ... Ah, if you're looking at steve lloyd's srm tests
>>>>>>> they're still failing for hepgrid5, but are passing for segrid1
>>>>>>> (which I fixed earlier today). Still see the gSOAP error for
>>>>>>> ops/steve lloyd analysis tests.
>>>>>>
>>>>>> No, it's definitely working now:
>>>>>>
>>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html
>>>>>
>>>>> The SE tests have been passing since we came online and sorted out
>>>>> a dcache.kpwd file and permissions.
>>>>>
>>>>> What are failing are analysis jobs, such as
>>>>>
>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html
>>>>>
>>>>>
>>>>> with the error
>>>>>
>>>>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP: Could
>>>>> not open connection !
>>>>> lcg_cp: Communication error on send
>>>>> Error in <TFile::TFile>: file aod.pool.root does not exist
>>>>> Could not open the file "aod.pool.root"
>>>>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is
>>>>> available
>>>>> WARNING: $POOL_CATALOG is not defined
>>>>> using default `xmlcatalog_file:PoolFileCatalog.xml'
>>>>>
>>>>>  *** Break *** segmentation violation
>>>>>
>>>>> or ops SAM Replica Management tests, such as on
>>>>>
>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk
>>>>>
>>>>>
>>>>> although I can't pick out any specific errors as the SAM site seems
>>>>> to be very stodgy today.
>>>>>
>>>>>>> I've done this but while some of the changes have shown up in the
>>>>>>> bdii there still isn't an /srm/managerv2 entry. I've attached our
>>>>>>> static-file-SE.ldif file.
>>>>>>
>>>>>> What about the dSE.ldif file? You need to make sure that it
>>>>>> contains something like:
>>>>>
>>>>> [snip]
>>>>>
>>>>> I've updated that file as well and it's showing up managerv2 in our
>>>>> site BDII now, maybe that might help things.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>
>>>
>>
>>


-- 
Dr John Bland, Systems Administrator
Room 210, Oliver Lodge
Particle Physics Group, University of Liverpool
Tel : 0151 794 3396