Monitoring evidence suggests that the hiccups were not localised to the Tier1 internal network: the Cacti monitor shows the interruption periods but also shows continuous data through the interruptions. We will look into this more on Tuesday.
Martin.
--
Martin Bly
RAL Tier1 Fabric Manager
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of J Coles
> Sent: Monday, April 05, 2010 9:45 AM
> To: [log in to unmask]
> Subject: Re: lcg-bdii.gridpp.ac.uk problems
>
> All
>
> There were also some short (approx. 1 hr) network interruptions at
> RAL on Sunday, and these affected, among other things, the WMSes. If
> you receive user questions about job submission problems from Sunday
> morning or early evening, this may have been the cause. If you noticed
> other or ongoing impacts, please let me know.
>
> Jeremy
>
>
> On 5 Apr 2010, at 08:55, Martin Bly wrote:
>
> > Tier1 Fabric and Tier1 Services are aware that the 5 top-level BDII
> > servers at RAL are sporadically failing to respond with the required
> > alacrity. The data shows that this is due in part to a local
> > imbalance in requests to the servers, but mostly to the machines
> > concerned being under-powered now that data taking has begun. Steps
> > will be taken to remedy this issue during the week.
> >
> > Martin.
> >
> > --
> > Martin Bly
> > RAL Tier1 Fabric Manager
> >
> >> -----Original Message-----
> >> From: Testbed Support for GridPP member institutes [mailto:TB-
> >> [log in to unmask]] On Behalf Of John Bland
> >> Sent: Monday, April 05, 2010 1:46 AM
> >> To: [log in to unmask]
> >> Subject: lcg-bdii.gridpp.ac.uk problems
> >>
> >> Hi,
> >>
> >> Liverpool have failed a number of SAM tests over the last day or so,
> >> which seem to have all resulted from missing entries in BDII or being
> >> unable to connect to lcg-bdii.gridpp.ac.uk.
> >>
> >> Other sites have had a number of failed tests over the same period.
> >> We've also seen a number of temporary blips in our SAM tests over the
> >> past few weeks, resulting from a lack of entries for our SE in BDII.
> >> We've been unsure if this is anything to do with our glite 3.2 BDII
> >> or a flaky VM system.
> >>
> >> As some of the recent tests have included lack of entries for
> >> external systems as well I'm wondering if the GridPP BDII server is
> >> experiencing problems for other sites as well and if other Tier2
> >> sites without problems are using a different top BDII.
> >>
> >> Should we all be using the same BDII or is there something to be
> >> gained from a *coordinated* distribution of external services to
> >> mitigate the otherwise single point of failure that BDIIs can
> >> represent within the UK cloud?
> >>
> >> John
> >>
> >> --
> >> Dr John Bland, M.Phys.(Hons), Ph.D. (Liverpool)
> >> Email: [log in to unmask]
> >> Phone: 0151 256 7055, Mobile: 07794 935 213
> >> Web : http://www.third-bird.co.uk/photography/
> >> "Happy Happy Joy Joy Joy!" - Stimpy