On Mon, Apr 5, 2010 at 2:46 AM, John Bland <[log in to unmask]> wrote:
> Hi,
>
> Liverpool have failed a number of SAM tests over the last day or so,
> which seem to have all resulted from missing entries in BDII or being
> unable to connect to lcg-bdii.gridpp.ac.uk.
>
> Other sites have had a number of failed tests over the same period.
> We've also seen a number of temporary blips in our SAM tests over the
> past few weeks, resulting from a lack of entries for our SE in BDII.
> We've been unsure whether this is anything to do with our gLite 3.2 BDII
> or a flaky VM system.
>
> As some of the recent failures have also involved missing entries for
> external systems, I'm wondering whether the GridPP BDII server is causing
> problems for other sites too, and whether the Tier2 sites without
> problems are using a different top-level BDII.
>
> Should we all be using the same BDII or is there something to be gained
> from a *coordinated* distribution of external services to mitigate the
> otherwise single point of failure that BDIIs can represent within the UK
> cloud?
There are a couple of things to look at here.

You can use a DNS A record that resolves to IP addresses at multiple
institutions; some of the regions do this. The LDAP libraries will
fail over nicely if they get a TCP reject, e.g. because a site is
completely down.
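To illustrate the idea (this is not how the LDAP libraries implement it internally), here is a minimal Python sketch: resolve every address behind one alias, then try each in turn, skipping any that refuse the TCP connection or time out. The port 2170 (the usual top-level BDII port) and the function names are assumptions made for illustration; the `connect` parameter is a hypothetical hook so the failover logic can be exercised without a network.

```python
import socket
from typing import Callable, Sequence


def resolve_all(hostname: str, port: int) -> list[str]:
    """Return every distinct address the resolver gives for hostname, in order."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses: list[str] = []
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:  # de-duplicate, keep resolver order
            addresses.append(sockaddr[0])
    return addresses


def connect_first_reachable(
    addresses: Sequence[str],
    port: int,
    connect: Callable[[tuple[str, int]], object] = (
        lambda addr: socket.create_connection(addr, timeout=5)
    ),
):
    """Try each address in order; fall through on refusals, timeouts, etc."""
    errors = []
    for addr in addresses:
        try:
            return addr, connect((addr, port))
        except OSError as exc:  # ConnectionRefusedError and timeouts are OSErrors
            errors.append((addr, exc))
    raise ConnectionError(f"no address reachable: {errors}")
```

Usage would look like `connect_first_reachable(resolve_all("bdii.example.org", 2170), 2170)`, where `bdii.example.org` is a hypothetical alias with A records at several sites. A site that is completely down and rejects the connection costs almost nothing; a host that silently drops packets costs the full timeout, which is why a fast TCP reject matters.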

Sites can also set LCG_GFAL_INFOSYS to a comma-separated list of BDIIs:

  LCG_GFAL_INFOSYS=bdii1.example.org,bdii2.example2.org

though this only works in some cases. WNs and UIs in particular should
be fine, but the WMS cannot use it.
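As a rough sketch of what such a list expands to, the Python below splits a LCG_GFAL_INFOSYS value into an ordered list of (host, port) endpoints that a client could try in turn. The default port 2170 and the function name `parse_infosys` are assumptions for illustration; this does not reproduce how GFAL or any other client actually parses the variable.

```python
import os

DEFAULT_BDII_PORT = 2170  # assumption: usual top-level BDII LDAP port


def parse_infosys(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated LCG_GFAL_INFOSYS value into (host, port) pairs."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        host, sep, port = item.rpartition(":")
        if sep and port.isdigit():
            endpoints.append((host, int(port)))  # explicit host:port
        else:
            endpoints.append((item, DEFAULT_BDII_PORT))  # bare hostname
    return endpoints


if __name__ == "__main__":
    # Print the endpoints in the order a client would try them.
    for host, port in parse_infosys(os.environ.get("LCG_GFAL_INFOSYS", "")):
        print(f"{host}:{port}")
```

The order of the list matters: the first entry is the preferred BDII and later entries act only as fallbacks.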
Steve.
>
> John
>
> --
> Dr John Bland, M.Phys.(Hons), Ph.D. (Liverpool)
> Email: [log in to unmask]
> Phone: 0151 256 7055, Mobile: 07794 935 213
> Web : http://www.third-bird.co.uk/photography/
> "Happy Happy Joy Joy Joy!" - Stimpy
>
--
Steve Traylen