Dear GridPP site administrators,
I have been asked by the GridPP Project Management Board (PMB, which,
as I'm sure you know, meets weekly) to report on the availability of
GridPP sites on a monthly basis, starting in August.
As a precursor to my report due on Monday 6th August, I have started
reviewing the SAM availability figures. In the same way that I provided
a view of each site's Steve Lloyd test performance against the GridPP
average, I have now uploaded results for individual site availability
against the GridPP average. You will find the graphs here:
http://www.gridpp.ac.uk/wiki/SAM_availability:_May-July_2007. We all
know that SAM still has problems, and we can see evidence of this in the
average availability line. However, one purpose of this review is to
understand what level of availability is achievable (looking across all
site results) and to focus on helping sites with relatively poor
availability.
The target availability figure for July is 85%. Based on the data
currently available, each site's name is shown in green (above target)
or red (below target), depending on whether or not it met this target
over the last month.
In addition, I have extracted the normalised KSI2K CPU hours for each
site from APEL (from the start of June until today) to give some idea of
the relative site contributions. In doing this, I discovered that a
number of sites do not have a full APEL history; the KSI2K hours figures
for these sites appear in red.
These plots will drive some of the discussion at tomorrow's UKI monthly
deployment meeting:
http://indico.cern.ch/conferenceDisplay.py?confId=19090. There are a
number of good reasons why a site may have lower availability than the
average, for example SRM instabilities after an upgrade or an overloaded
CE. I therefore invite you to add site comments to the wiki explaining
periods of poor availability and what was done to resolve any problems
encountered*. PMB members will receive these explanations along with the
data.
Many thanks for your time,
Jeremy
*P.S. I know that *some* sites already give good explanations each week
in their site reports. At the moment, extracting this information is a
very manual task, so I would appreciate the high-level explanations.