Hi *,

Fotis, I agree with you that we have a ways to go with support, but I
disagree with you about the utility of the SFT.  Some time ago, the only
real diagnostics we had were a) Stephen Burke's monitoring activities
and b) failing jobs.  Often several hundred jobs went down the tube
before anyone noticed something was wrong.  At first people contacted
the site and waited, sometimes for days, in the meantime putting
handcrafted requirements statements in their JDLs to exclude bad sites.  When people
got tired of this, experiments started to run their own BDIIs and
exclude sites sometimes without even telling the site they had been
excluded.

Now at least we have a couple of places we can look to see what people
think is happening at our sites.  I too have been bitten by a faulty SFT
report, but on the other hand I was able to use the SFT to discover
several bugs in our prototype ELC site a couple of months ago, so we were
able to come into production with an almost bug-free site.  The
occasional erroneous removal of my site is a minor annoyance compared
with the benefit for the system as a whole and the time of many users
and EIS support people that would be lost if we didn't have the SFT.

What I normally do in the morning is check the local queues when I come
in (we have a couple of local pages to monitor farm occupancy).  If things
look active, I don't worry and go on to other work.  If things look very
slow or empty, I look around for an obvious local cause (these are
usually caught anyway by local monitoring scripts), then I head off to
take a look at the GIIS monitor and SFT reports.  There are very few
problems that escape this web, and if things are all OK, it only takes a
few minutes to look at.  Given that LCG has only been in production for
a bit over a year, and that we are still developing the system, I think
we are doing pretty well.
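
For what it's worth, the routine above could be sketched roughly as
follows.  This is only an illustration: the SiteStatus fields, the
morning_check function and the 20% occupancy threshold are all made up
here and don't correspond to any real LCG tool or page.

```python
from dataclasses import dataclass, field


@dataclass
class SiteStatus:
    """Hypothetical snapshot of what the morning check looks at."""
    queue_occupancy: float                             # fraction of farm slots in use, 0.0 to 1.0
    local_alarms: list = field(default_factory=list)   # messages from local monitoring scripts
    giis_ok: bool = True                               # site looks healthy in the GIIS monitor
    sft_failures: list = field(default_factory=list)   # names of failed SFT tests


def morning_check(status, busy_threshold=0.2):
    """Return the things to look into, in the order described above.

    busy_threshold is an assumed cut-off for "queues look active".
    """
    # Queues look active: don't worry, go on to other work.
    if status.queue_occupancy >= busy_threshold:
        return []

    todo = []
    # Slow or empty: first look for an obvious local cause.
    todo.extend(f"local: {msg}" for msg in status.local_alarms)
    # Then check the GIIS monitor and the SFT reports.
    if not status.giis_ok:
        todo.append("GIIS monitor reports a problem")
    todo.extend(f"SFT failed: {name}" for name in status.sft_failures)
    return todo
```

An empty list means either the farm is busy or nothing turned up in any
of the three places, which is the "only takes a few minutes" case.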

When the Grid has a 'paperclip' it will be time to quit and do something
more interesting!

One might even think you just volunteered to write an automatic
notification system based on the SFT results ;-)

        J "see y'all in Athens" T

Fotis Georgatos wrote:
> Hi Stephen,

> The way problem tracking is currently done is nearly humorous... read this:

> I find SFTs excellent as reference material for debugging, but not more.