Hi John,
Things seem to be going well with the SAM tests, but I don't seem to be
able to srmPing hepgrid5 on the SRM2.2 endpoint. Any ideas?
Cheers,
Greig
On 18/02/08 15:45, John Bland wrote:
> Hi,
>
> To follow myself up, we appear to have fixed the dual-homed problems and
> can copy files in and out of the SE from internal and external machines.
> Tests are starting to pass again (woohoo!).
>
> We're leaving it a while to see if any more problems were being masked
> by anything we've fixed. If we're clean we'll probably push ahead with
> migrating our pools and getting some space before attempting to break it
> all again with the SRM2.2 spacemanager ;0).
>
> John
>
> John Bland wrote:
>> Hi,
>>
>> We are making progress, of sorts.
>>
>> We have fixed the GIIS problem (the static-file-Site.ldif hadn't been
>> generated by yaim, usefully).
>>
>> I've also been naughty and set the srm1 endpoint as being srm_v1
>> rather than SRM. Not sure which of the above fixed things as they were
>> changed at the same time but then external SAM tests for SRM to
>> hepgrid5 started passing.
>>
>> This didn't change the CE-* ops tests and Steve Lloyd analysis tests
>> failing, which continued with the CGSI-gSOAP can't connect errors.
>>
>> We finally realised this morning that our new WNs were set to connect
>> to the internal 192.168 interface on the SE, which had been disabled
>> since then due to conflicts between the eth0 and eth1 addresses
>> causing dcache to fail.
>>
>> Adding the 192.168 address back to the SE stops the gSOAP errors but
>> we still haven't fixed the underlying problem with dcache on
>> dual-homed servers.
>>
>> We are trying to fix that as we don't want internal SE transfers
>> battering our firewall/router all the time if possible but it is
>> proving obstinate (par for the course it seems). We've set in
>> dCacheSetup srmCustomGetHostByAddr=true and followed the instructions
>> as on
>> http://www.gridpp.ac.uk/wiki/DCache_FAQ#How_do_use_dCache_with_dual_homed_machines.3F
>> but gridftp transfers just time out after opening BINARY data
>> connection (but e.g. edg-gridftp-ls does list as expected).
>>
>> John
>>
>> Greig Alan Cowan wrote:
>>> Hi John,
>>>
>>> Is everything OK with your dCache? I don't seem to be able to srmPing
>>> it.
>>>
>>> Thanks,
>>> Greig
>>>
>>> On 15/02/08 13:54, John Bland wrote:
>>>> Greig Alan Cowan wrote:
>>>>> Hi John,
>>>>>> Really? ... Ah, if you're looking at Steve Lloyd's SRM tests
>>>>>> they're still failing for hepgrid5, but are passing for segrid1
>>>>>> (which I fixed earlier today). Still see the gSOAP error for
>>>>>> ops/Steve Lloyd analysis tests.
>>>>>
>>>>> No, it's definitely working now:
>>>>>
>>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest/UKI-NORTHGRID-LIV-HEP_2.html
>>>>
>>>> The SE tests have been passing since we came online and sorted out a
>>>> dcache.kpwd file and permissions.
>>>>
>>>> What are failing are analysis jobs, such as
>>>>
>>>> http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest/UKI-NORTHGRID-LIV-HEP_MyAnalPackage_6.html
>>>>
>>>>
>>>> with the error
>>>>
>>>> httpg://hepgrid5.ph.liv.ac.uk:8443/srm/managerv1: CGSI-gSOAP: Could
>>>> not open connection !
>>>> lcg_cp: Communication error on send
>>>> Error in <TFile::TFile>: file aod.pool.root does not exist
>>>> Could not open the file "aod.pool.root"
>>>> Warning in <TClass::TClass>: no dictionary for class IProxyDict is
>>>> available
>>>> WARNING: $POOL_CATALOG is not defined
>>>> using default `xmlcatalog_file:PoolFileCatalog.xml'
>>>>
>>>> *** Break *** segmentation violation
>>>>
>>>> or ops SAM Replica Management tests, such as on
>>>>
>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=hepgrid2.ph.liv.ac.uk
>>>>
>>>>
>>>> although I can't pick out any specific errors as the SAM site seems
>>>> to be very stodgy today.
>>>>
>>>>>> I've done this but while some of the changes have shown up in the
>>>>>> bdii there still isn't an /srm/managerv2 entry. I've attached our
>>>>>> static-file-SE.ldif file.
>>>>>
>>>>> What about the dSE.ldif file? You need to make sure that it
>>>>> contains something like:
>>>>
>>>> [snip]
>>>>
>>>> I've updated that file as well and it's showing up managerv2 in our
>>>> site BDII now, maybe that might help things.
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>
>>
>
>