Hi John,
From France, we have also noticed that most sft-lcg-* failures appear
and then disappear without any action by the sites. This problem has
already been raised at various weekly operations meetings, and I think
Sven was the last one to complain about it, two weeks ago.
In my view, there are two problems to solve here:
- Make the reason for an SFT failure clearer and, in particular, point at
the real source of the failure rather than systematically implying that
the failure comes from the site. This would certainly require clearer
error handling in the middleware.
- Improve the robustness of the lcg-* commands... of course.
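To help separate information-system delays from real site problems, the
timing test that Stephen describes below could be scripted. The following
is only a rough sketch: the function name count_slow and its arguments
are my own illustration, not part of lcg-utils or the SFT framework, and
the 15-second limit in the usage comment is simply the lcg-utils query
timeout that Maarten mentions.

```shell
#!/bin/sh
# Sketch only: run a command repeatedly and count the runs that take
# longer than a given number of seconds. The name "count_slow" and its
# argument order are illustrative assumptions.
#
# Usage: count_slow RUNS LIMIT_SECONDS COMMAND [ARGS...]
# Example, using the query quoted in this thread:
#   count_slow 50 15 ldapsearch -x -h lcg-bdii.cern.ch -p 2170 \
#       -b o=grid '(&(GlueServiceType=*)(GlueServiceAccessControlRule=dteam))'
count_slow() {
    runs=$1
    limit=$2
    shift 2
    slow=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)        # whole seconds, so timing is coarse
        "$@" >/dev/null 2>&1     # output and exit status are ignored
        end=$(date +%s)
        if [ $((end - start)) -gt "$limit" ]; then
            slow=$((slow + 1))
        fi
        i=$((i + 1))
    done
    echo "$slow"                 # number of runs slower than the limit
}
```

A non-zero result from a site would suggest that lcg-* commands run there
could hit the query timeout even though the site itself is healthy. Note
that a query which fails quickly is not counted as slow, and that newer
OpenLDAP clients prefer -H ldap://host:port over -h and -p.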
Pierre
Gordon, JC (John) wrote:
>It's not just Irish sites. I see UK sites which fail one or two
>sft-lcg-rm tests and then pass again without any action by the site. Our
>diagnosis is that failures in querying the information service cause this.
>
>Does no-one else see this?
>
>John
>
>>-----Original Message-----
>>From: LHC Computer Grid - Rollout
>>[mailto:[log in to unmask]] On Behalf Of
>>Maarten Litmaath
>>Sent: 28 November 2005 13:18
>>To: [log in to unmask]
>>Subject: Re: [LCG-ROLLOUT] sites Failing SFT lcg-rm tests
>>
>>Stephen Childs wrote:
>>
>>>Maarten Litmaath wrote:
>>>
>>> > Might your sites be suffering from the 15s query timeout in lcg-utils?
>>> > How good is the connectivity to lcg-bdii.cern.ch?
>>>
>>>Could you give me a sample ldapsearch string that is representative of
>>>what the lcg-utils do to test this?
>>
>>ldapsearch -x -h lcg-bdii.cern.ch:2170 -b o=grid \
>> '(&(GlueServiceType=*)(GlueServiceAccessControlRule=dteam))'
>>
>>>I just ran the following command:
>>>
>>>ldapsearch -x -h lcg-bdii.cern.ch -p 2170 -b \
>>> 'mds-vo-name=giAITie,mds-vo-name=local,o=grid'
>>>
>>>50 times from one of our slower sites and it seems as if there are
>>>occasions when it takes >1 minute to get this information. (However a
>>>quick check at a couple of other sites didn't show such long times.)
>>>
>>>If the problem is at the CERN end, it might explain why the RM
>>>failures happen intermittently?
>>
>>I will have a look at lcg-bdii, but if the trouble is mostly
>>with the Irish sites, I suspect there is something clunky in
>>the old middleware or there is a connectivity problem.
--
______________________
Pierre GIRARD
Grid Computing Team Member
IN2P3/CNRS Computing Centre - Lyon (FRANCE)
http://cc.in2p3.fr
Tel. +33 4.78.93.08.80 | Fax. +33 4.72.69.41.70 | e-mail: [log in to unmask]