Hi Alessandra,

I thought a longer history was available on Steve's site, but as Graeme points out (and you probably found yourself), it only covers the last 10 submissions. Some periods for sites might be explainable by the site being in scheduled maintenance (the jobs don't check for this yet) or by the site being overloaded. There are some things that still need to be understood.

Graeme,

I'll try to think about other ways of presenting the individual site data so that it can quickly be compared to, say, the average of all results for a given period - that will definitely point to general problem days. For now, perhaps the RAL-PPD chart could be used as a reference, since it clearly shows the two main problems and remains relatively stable at other times. As mentioned above, though, we could do with more history information.

All,

This is really just an attempt to give a new perspective on site "availability" for a general user, so please do look at the results (http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php) often, together with SAM. Our objective is to get the overall pass rates up - even today there is an obvious improvement.

One other thing (already discussed by the deployment team on Tuesday): please feel free to mail me your suggestions on how to improve the GridView availability interface (shown here: http://gridview.cern.ch/GRIDVIEW/same_index.php), as I am speaking to the developers tomorrow. Already on the list are getting rid of the abbreviations and providing clearer explanations of the tests used. It looks likely that figures from this tool will be among the first used by the WLCG Management Board to judge site availability. For your site, select "Tier-2 site availability", then select your site from the (annoying) list, select the daily report and then the time frame. Finally, click "display graphs". Let me know if you are unable to find your site in the Tier-2 list - we already see that some are missing.
Also let me know if the data looks completely wrong from your perspective (i.e. do you think the trend is correct?).

Thanks,
Jeremy

> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Alessandra Forti
> Sent: 07 March 2007 09:55
> To: [log in to unmask]
> Subject: Re: What does your site look like to an average (ATLAS) user?
>
> Hi Jeremy,
>
> I don't think it is possible to comment on the history and the
> variations without more information.
>
> cheers
> alessandra
>
> Coles, J (Jeremy) wrote:
> > Dear site admins
> >
> > You will probably be aware that for the last 2 months Steve Lloyd has
> > been running a series of test jobs against GridPP sites. There are three
> > types, as explained here:
> > http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php?action=info. You'll
> > note that only the analysis job is ATLAS-specific, so the results are
> > pretty indicative of a general user's view.
> >
> > The deployment team, via the Tier-2 coordinators and the storage group,
> > has been addressing the problems uncovered at many of your sites.
> > However, it seems that there are many reasons the tests may fail,
> > including when a tested site is full of ATLAS production jobs.
> > Nevertheless, one might expect to see a little more stability than is
> > observed. Please take a look at this wiki page, which now shows the
> > historical results for each site since January:
> > http://www.gridpp.ac.uk/wiki/SL_ATLAS_tests. The first plot is the
> > combination of all sites and shows how chaotic the situation is for a
> > user!
> >
> > Please feel free to edit the wiki with comments for your site explaining
> > some of the variations if you can. We will revisit these results at the
> > next UKI monthly support meeting on March 14th.
> >
> > Kind regards,
> > Jeremy
>
> --
> Alessandra Forti
> NorthGrid Technical Coordinator
> University of Manchester