Hi Steve,
Yes, we know about that issue - good to see that a proper explanation of it is available, though.
Our small imbalance is due to a misconfiguration from the days we had just a triplet of systems - an old address is still in use pointing at the original three systems. This will be changed soon.
Martin.
--
Martin Bly
RAL Tier1 Fabric Manager
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Steve Traylen
> Sent: Tuesday, April 06, 2010 1:06 PM
> To: [log in to unmask]
> Subject: Re: lcg-bdii.gridpp.ac.uk problems
>
> On Mon, Apr 5, 2010 at 9:55 AM, Martin Bly <[log in to unmask]> wrote:
> > Tier1 Fabric and Tier1 Services are aware that the 5 top-level BDII servers
> > at RAL are sporadically failing to respond with the required alacrity. The
> > data shows that this is due in part to a local imbalance in requests to the
> > servers but mostly the machines concerned are under-powered now that data
> > taking has begun. Steps will be taken to remedy this issue during the week.
> >
> Just to check you are aware of a possible cause for the imbalance
> in requests.
>
> https://twiki.cern.ch/twiki/bin/view/LinuxSupport/GlibcDnsLoadBalancing
>
> Working around that is operationally tricky. Putting the hosts in
> different subnets and/or sites is the best option but hardly convenient.
> Some of lcg-bdii.cern.ch will be moving to different subnets shortly
> to try and help with this.
>
> > Martin.
> >
> > --
> > Martin Bly
> > RAL Tier1 Fabric Manager
> >
> >> -----Original Message-----
> >> From: Testbed Support for GridPP member institutes [mailto:TB-
> >> [log in to unmask]] On Behalf Of John Bland
> >> Sent: Monday, April 05, 2010 1:46 AM
> >> To: [log in to unmask]
> >> Subject: lcg-bdii.gridpp.ac.uk problems
> >>
> >> Hi,
> >>
> >> Liverpool have failed a number of SAM tests over the last day or so,
> >> which seem to have all resulted from missing entries in BDII or being
> >> unable to connect to lcg-bdii.gridpp.ac.uk.
> >>
> >> Other sites have had a number of failed tests over the same period.
> >> We've also seen a number of temporary blips in our SAM tests over the
> >> past few weeks, resulting from a lack of entries for our SE in BDII.
> >> We've been unsure if this is anything to do with our glite 3.2 BDII or
> >> a flaky VM system.
> >>
> >> As some of the recent tests have included lack of entries for external
> >> systems as well I'm wondering if the GridPP BDII server is experiencing
> >> problems for other sites as well and if other Tier2 sites without
> >> problems are using a different top BDII.
> >>
> >> Should we all be using the same BDII or is there something to be gained
> >> from a *coordinated* distribution of external services to mitigate the
> >> otherwise single point of failure that BDIIs can represent within the
> >> UK cloud?
> >>
> >> John
> >>
> >> --
> >> Dr John Bland, M.Phys.(Hons), Ph.D. (Liverpool)
> >> Email: [log in to unmask]
> >> Phone: 0151 256 7055, Mobile: 07794 935 213
> >> Web : http://www.third-bird.co.uk/photography/
> >> "Happy Happy Joy Joy Joy!" - Stimpy
> >
>
>
>
> --
> Steve Traylen
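
The imbalance discussed in this thread can be illustrated with a minimal sketch. It assumes the cause is the address sorting that glibc's getaddrinfo() applies to DNS results (RFC 3484 rule 9, longest matching prefix), which is the behaviour the linked GlibcDnsLoadBalancing page is understood to describe. The function names and all addresses below are hypothetical, and this is a simplified model rather than the resolver's actual code; the real ordering involves further rules and depends on the glibc version and on /etc/gai.conf.

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def rule9_order(client: str, servers: list[str]) -> list[str]:
    """Order servers by longest prefix shared with the client (simplified model).

    sorted() is stable, so servers with equal prefix length keep the
    order in which the DNS answer listed them.
    """
    return sorted(servers,
                  key=lambda s: common_prefix_len(client, s),
                  reverse=True)


# Hypothetical client and round-robin pool (documentation address range).
client = "192.0.2.10"
pool = ["192.0.2.129", "192.0.2.20", "192.0.2.200"]

# Simulate DNS round-robin: each answer is a rotation of the pool.
for i in range(len(pool)):
    answer = pool[i:] + pool[:i]
    print(answer, "->", rule9_order(client, answer)[0])
```

In this model every rotation of the DNS answer ends with the same server first, because one address shares more leading bits with the client than the others do. A client that only tries the first address would therefore always hit the same host, whatever order the DNS server returns.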