Hi Jeremy
The plots are very interesting, but I suspect that sites will find it
very hard to comment on specific periods of less than 100% running
unless we get access to Steve's logs. Would this be possible? At the
moment we can only see about 10 hours' worth.
I'm well aware of times when Steve's proxies were timing out at
Glasgow because we'd had ~2000 ATLAS jobs submitted by one user, and
we only do very weak fair-sharing among users within the ATLAS queue.
Clearly there's nothing wrong with the site in this case; in fact, to
the user in question we looked fantastic. The sharing of resources
within a VO at a site is really a problem for the VO, not the site.
Finally, I see that there are clear periods of higher-level failures,
where all sites drop to 0 (e.g., just before 10-02-2007). Probably a
better plot for sites to comment on would be a graph showing each
site's reliability _and_ the GridPP average together - then we can
concentrate on the periods when our sites fall below the average.
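As a rough sketch of the comparison I have in mind (not anything that
exists in Steve's test framework), the Python below takes per-site
reliability as a list of success fractions per time bin, computes the
all-site average for each bin, and lists the bins where a given site
falls below it. The function name, the data layout, the site names
"SiteB" and "SiteC", and all the numbers are made up for illustration.
Bins where every site is at 0 are skipped, on the assumption that they
reflect a higher-level failure rather than a site problem.

```python
from statistics import mean


def below_average_periods(reliability, site):
    """Return indices of time bins where `site` is below the all-site mean.

    `reliability` maps a site name to a list of per-bin success fractions
    (0.0 to 1.0). All lists are assumed to have the same length.
    """
    flagged = []
    for i in range(len(reliability[site])):
        average = mean(series[i] for series in reliability.values())
        # An average of 0 means every site failed in this bin, which points
        # to a higher-level failure, so do not hold it against the site.
        if average == 0:
            continue
        if reliability[site][i] < average:
            flagged.append(i)
    return flagged


# Invented numbers, purely for illustration.
example = {
    "Glasgow": [1.0, 0.5, 0.0, 0.9],
    "SiteB": [1.0, 0.9, 0.0, 0.7],
    "SiteC": [0.8, 1.0, 0.0, 0.8],
}

print(below_average_periods(example, "Glasgow"))
```

With real data, the flagged bins would be the ones worth annotating on
the wiki; everything else is either fine or common to all sites.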
Cheers
Graeme
On 6 Mar 2007, at 22:41, Coles, J (Jeremy) wrote:
> Dear site admins
>
> You will probably be aware that for the last 2 months Steve Lloyd has
> been running a series of test jobs against GridPP sites. There are three
> types as explained here:
> http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php?action=info. You'll
> note that only the analysis job is ATLAS specific so the results are
> pretty indicative of a general user's view.
>
> The deployment team via the Tier-2 coordinators and storage group have
> been addressing problems uncovered with many of your sites. However, it
> seems that there are many reasons the tests may fail including when a
> tested site is full of ATLAS production jobs. Nevertheless one might
> expect to see a little more stability than is observed. Please take a
> look at this wiki page which now shows the historical results for each
> site since January: http://www.gridpp.ac.uk/wiki/SL_ATLAS_tests. The
> first plot is the combination of all sites and shows how chaotic the
> situation is to a user!
>
> Please feel free to edit the wiki with comments for your site explaining
> some of the variations if you can. We will revisit these results at the
> next UKI monthly support meeting on March 14th.
>
> Kind regards,
> Jeremy
--
Dr Graeme Stewart - http://wiki.gridpp.ac.uk/wiki/User:Graeme_stewart
GridPP DM Wiki - http://wiki.gridpp.ac.uk/wiki/Data_Management
ScotGrid - http://www.scotgrid.ac.uk/ http://scotgrid.blogspot.com/