Hi,

We are making progress, of sorts.

We have fixed the GIIS problem (the static-file-Site.ldif hadn't been 
generated by yaim, usefully).

I've also been naughty and set the srm1 endpoint as srm_v1 rather than 
SRM. I'm not sure which of the two fixed things, as they were changed at 
the same time, but external SAM tests for SRM to hepgrid5 then started 
passing.

This didn't fix the failing CE-* ops tests or Steve Lloyd's analysis 
tests, which continued to fail with the CGSI-gSOAP "could not open 
connection" errors.

We finally realised this morning that our new WNs were set to connect 
to the SE's internal 192.168 interface, which had since been disabled 
because conflicts between the eth0 and eth1 addresses were causing 
dCache to fail.
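A quick way to spot this kind of problem from a WN is a plain TCP connect 
to the SE's SRM port (8443, as in the httpg://hepgrid5.ph.liv.ac.uk:8443 
endpoint in the errors quoted further down). This is a minimal Python 
sketch, assuming Python 3 is available on the WN. It only tests network 
reachability, not the GSI handshake or dCache itself, and the function 
name can_connect is hypothetical:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, DNS failures
        # and timeouts (socket.timeout is a subclass of OSError)
        return False
```

For example, can_connect("hepgrid5.ph.liv.ac.uk", 8443) run on a WN, then 
repeated with the SE's internal 192.168 address substituted by hand, would 
show whether only one of the two interfaces is reachable.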

Adding the 192.168 address back to the SE stops the gSOAP errors, but we 
still haven't fixed the underlying problem with dCache on dual-homed 
servers.

We are trying to fix that, as we don't want internal SE transfers 
battering our firewall/router all the time if we can avoid it, but it is 
proving obstinate (par for the course, it seems). We've set 
srmCustomGetHostByAddr=true in dCacheSetup and followed the instructions at 
http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F 
but gridftp transfers just time out after opening the BINARY data 
connection (although e.g. edg-gridftp-ls does list as expected).

John

Greig Alan Cowan wrote:
> Hi John,
> 
> Is everything OK with your dCache? I don't seem to be able to srmPing it.
> 
> Thanks,
> Greig
> 
> On 15/02/08 13:54, John Bland wrote:
>> Greig Alan Cowan wrote:
>>> Hi John,
>>>> Really? ... Ah, if you're looking at Steve Lloyd's SRM tests, they're 
>>>> still failing for hepgrid5 but are passing for segrid1 (which I 
>>>> fixed earlier today). I still see the gSOAP error for the ops/Steve 
>>>> Lloyd analysis tests.
>>>
>>> No, it's definitely working now:
>>>
>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html 
>>
>> The SE tests have been passing since we came online and sorted out a 
>> dcache.kpwd file and permissions.
>>
>> What's failing is the analysis jobs, such as
>>
>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html 
>>
>> with the error
>>
>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP: Could 
>> not open connection !
>> lcg_cp: Communication error on send
>> Error in <TFile::TFile>: file aod.pool.root does not exist
>> Could not open the file "aod.pool.root"
>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is 
>> available
>> WARNING: $POOL_CATALOG is not defined
>> using default `xmlcatalog_file:PoolFileCatalog.xml'
>>
>>  *** Break *** segmentation violation
>>
>> or ops SAM Replica Management tests, such as on
>>
>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk 
>>
>> although I can't pick out any specific errors as the SAM site seems to 
>> be very stodgy today.
>>
>>>> I've done this but while some of the changes have shown up in the 
>>>> bdii there still isn't an /srm/managerv2 entry. I've attached our 
>>>> static-file-SE.ldif file.
>>>
>>> What about the dSE.ldif file? You need to make sure that it contains 
>>> something like:
>>
>> [snip]
>>
>> I've updated that file as well and managerv2 is now showing up in our 
>> site BDII, so maybe that will help.
>>
>> Thanks,
>>
>> John
>>


-- 
Dr John Bland, Systems Administrator
Room 210, Oliver Lodge
Particle Physics Group, University of Liverpool
Mail: [log in to unmask]
Tel : 0151 794 3396