Steve Traylen wrote:
> On Mon, Apr 5, 2010 at 9:55 AM, Martin Bly <[log in to unmask]> wrote:
>> Tier1 Fabric and Tier1 Services are aware that the 5 top-level BDII servers at RAL are sporadically failing to respond with the required alacrity. The data shows that this is due in part to a local imbalance in requests to the servers but mostly the machines concerned are under-powered now that data taking has begun. Steps will be taken to remedy this issue during the week.
>>
> Just to check that you are aware of a possible cause for the imbalance
> in requests.
>
> https://twiki.cern.ch/twiki/bin/view/LinuxSupport/GlibcDnsLoadBalancing
>
Just for reference, in Debian the matter was raised with the Technical
Committee in 2007, and they decided as follows:
:>
:> http:[log in to unmask]
:>
:> Bug#438179: RFC3484 s6 rule 9 should not apply
:>
:> [snip]
:> The Technical Committee has decided as follows:
:>
:> 1. RFC3484 s6 rule 9 should not be applied to IPv4 addresses
:> by Debian systems, and we DO overrule the maintainer.
:> 2. RFC3484 s6 rule 9 should not be applied to IPv6 addresses
:> by Debian systems. We do NOT overrule the maintainer.
:> 3. We recommend to the IETF that RFC3484 s6 rule 9 should be
:> abolished for IPv4, and that it should be reconsidered for IPv6.
:>
:> The supermajority requirement for overruling the maintainer was met.
:>
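For anyone unfamiliar with rule 9: section 6 of RFC 3484 defines how a resolver orders the addresses returned for a name, and rule 9 prefers the destination that shares the longest bit prefix with the client's source address, while rule 10 otherwise leaves the DNS order unchanged. The Python sketch below is a simplified model, not glibc's getaddrinfo code: it assumes IPv4 only, assumes rules 1 to 8 all tie, and uses hypothetical addresses. It shows that however a round-robin DNS server rotates its answer, a given client always ends up on the same server.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first differing bit.
    return 32 - diff.bit_length()


def rule9_sort(source, candidates):
    """Order candidates by longest matching prefix with source (rule 9).

    Python's sort is stable, so candidates with equal prefix lengths keep
    their DNS order, which models rule 10 ("leave the order unchanged").
    Rules 1 to 8 are assumed to tie and are not modelled here.
    """
    return sorted(candidates, key=lambda dst: -common_prefix_len(source, dst))


def rotations(answer):
    """Every order a round-robin DNS server might hand out."""
    return [answer[i:] + answer[:i] for i in range(len(answer))]


# Hypothetical addresses, chosen only for illustration.
servers = ["10.1.0.11", "10.2.0.12", "10.3.0.13"]
client = "10.1.5.20"

for answer in rotations(servers):
    # The client's first choice is 10.1.0.11 for every rotation.
    print(answer, "->", rule9_sort(client, answer)[0])
```

In this model the client at 10.1.5.20 shares 21 leading bits with 10.1.0.11 but only 14 with the other two servers, so it always picks 10.1.0.11, whereas simply taking the first DNS answer would spread its requests across all three.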
> Working around that is operationally tricky. Putting the hosts in
> different subnets and/or sites is the best option, but hardly
> convenient. Some of the lcg-bdii.cern.ch hosts will be moving to
> different subnets shortly to try to help with this.
I don't know how much of a problem this is, and it sounds as if it isn't
the issue in the BDII case, but would it be worth considering a similar
patch for SL?
Chris
>
>> Martin.
>>
>> --
>> Martin Bly
>> RAL Tier1 Fabric Manager
>>
>>> -----Original Message-----
>>> From: Testbed Support for GridPP member institutes
>>> [mailto:TB-[log in to unmask]] On Behalf Of John Bland
>>> Sent: Monday, April 05, 2010 1:46 AM
>>> To: [log in to unmask]
>>> Subject: lcg-bdii.gridpp.ac.uk problems
>>>
>>> Hi,
>>>
>>> Liverpool have failed a number of SAM tests over the last day or so,
>>> which seem to have all resulted from missing entries in BDII or being
>>> unable to connect to lcg-bdii.gridpp.ac.uk.
>>>
>>> Other sites have had a number of failed tests over the same period.
>>> We've also seen a number of temporary blips in our SAM tests over the
>>> past few weeks, caused by missing entries for our SE in the BDII.
>>> We've been unsure whether this has anything to do with our gLite 3.2
>>> BDII or a flaky VM system.
>>>
>>> As some of the recent failures also involved missing entries for
>>> external systems, I'm wondering whether the GridPP BDII server is
>>> causing problems for other sites too, and whether the Tier2 sites that
>>> aren't seeing problems are using a different top-level BDII.
>>>
>>> Should we all be using the same BDII, or is there something to be
>>> gained from a *coordinated* distribution of external services to
>>> mitigate the single point of failure that a BDII can otherwise
>>> represent within the UK cloud?
>>>
>>> John
>>>
>>> --
>>> Dr John Bland, M.Phys.(Hons), Ph.D. (Liverpool)
>>> Email: [log in to unmask]
>>> Phone: 0151 256 7055, Mobile: 07794 935 213
>>> Web : http://www.third-bird.co.uk/photography/
>>> "Happy Happy Joy Joy Joy!" - Stimpy