On Tue, 11 Mar 2003, Stephen Burke wrote:
> On Thu, 6 Mar 2003, Lawrence Lowe wrote:
> > Hi all, we at Birmingham have still got an amber dot on the
> > new map, even though a lot of things look right (to me).
>
> I think there's a more general point here. What we really need is
> some monitoring which not only tells you that a site is not working,
> but tells you why! Detailed testbed monitoring is something which is
> sorely needed at the moment; there are a fair number of web pages
> which give some kind of view of the system, but nothing which really
> enables problems to be diagnosed.
Yes, although that's quite hard to do for job submission problems. Once
a site becomes accessible via the RBs, it's easier to put a series of, say,
SE tests into a script and get diagnostic output. That is, it's easier to query
an SE about a file when you can actually run a job at that site than to
query the pool account lock files when you can't even get in to run a job.
So I think while we're still getting basic job submission working for all
sites, the current way (me mailing sites back and forth, one by one and
suggesting things to try) is the only way.
(Good news today is that Glasgow and Edinburgh are now working for Globus
with the new map.)
Cheers,
Andrew