Dear All
We have acted on the suggestions received so far:
1) Each site plot on the wiki
(http://www.gridpp.ac.uk/wiki/SL_ATLAS_tests#Results_11th_January_2007_to_5th_March_2007)
now has the average result from all sites superimposed on it (white
line). This helps to show periods when there was a common problem, as
the average dips during such periods. For generally stable periods the
average is less useful.
2) Steve has made the logs for the full history of the tests available
(thanks Steve), so that you can investigate more fully the periods when
your site did less well. Here is the link:
http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php?action=alllogs
So once again, please could you take a look at your site's performance
and, where appropriate, add comments to the wiki explaining events in
your site's history. Sharing this will help us determine the best way to
get all sites working better. So far only Bristol has risen to the
challenge (thanks Winnie). Ideally we should be able to understand every
dip in the plot for every site.
Thanks for your help,
Jeremy
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Coles, J (Jeremy)
> Sent: 07 March 2007 13:05
> To: [log in to unmask]
> Subject: Re: What does your site look like to an average (ATLAS) user?
>
> Hi Alessandra
>
> I thought a longer history was on Steve's site, but as Graeme points
> out (and as you probably found yourself), it covers only the last 10
> submissions.
>
> Some poor periods at sites might be explained by the site being in
> scheduled maintenance (the jobs don't check for this yet) or being
> overloaded.
>
> There are some things that do need to be understood.
>
> Graeme
>
> I'll try to think of other ways of presenting the individual site
> data so that it can quickly be compared with, say, the average of all
> results for a given period; that will definitely point to general
> problem days. For now, perhaps the RAL-PPD chart could be used as a
> reference, since it clearly shows the two main problems and remains
> relatively stable at other times. As mentioned above, though, we could
> do with more history information.
>
> All
>
> This is really just an attempt to give a new perspective on site
> "availability" for a general user, so please do look at the results
> (http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php) often, together
> with SAM. Our objective is to get the overall pass rates up; even
> today there is an obvious improvement.
>
> One other thing (already discussed by the deployment team on Tuesday):
> please feel free to mail me your suggestions on how to improve the
> GridView availability interface (shown here:
> http://gridview.cern.ch/GRIDVIEW/same_index.php), as I am speaking to
> the developers tomorrow. Already on the list are removing the
> abbreviations and giving clearer explanations of the tests used. It
> looks likely that figures from this tool will be among the first to be
> used by the WLCG Management Board to judge site availability. To view
> your site, select "Tier-2 site availability", then select your site
> from the (annoying) list, select the daily report and then the time
> frame. Finally, click display graphs. Let me know if you are unable to
> find your site in the Tier-2 list; we already see that some are
> missing. Also let me know if the data looks completely wrong from your
> perspective (i.e. do you think the trend is correct?).
>
> Thanks,
> Jeremy
>
>
>
> > -----Original Message-----
> > From: Testbed Support for GridPP member institutes [mailto:TB-
> > [log in to unmask]] On Behalf Of Alessandra Forti
> > Sent: 07 March 2007 09:55
> > To: [log in to unmask]
> > Subject: Re: What does your site look like to an average (ATLAS) user?
> >
> > Hi Jeremy,
> >
> > I don't think it is possible to comment on the history and the
> > variations without more information.
> >
> > cheers
> > alessandra
> >
> > Coles, J (Jeremy) wrote:
> > > Dear site admins
> > >
> > > You will probably be aware that for the last 2 months Steve Lloyd
> > > has been running a series of test jobs against GridPP sites. There
> > > are three types, as explained here:
> > > http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php?action=info.
> > > You'll note that only the analysis job is ATLAS-specific, so the
> > > results are pretty indicative of a general user's view.
> > >
> > > The deployment team, via the Tier-2 coordinators and the storage
> > > group, has been addressing problems uncovered at many of your sites.
> > > However, there seem to be many reasons why the tests may fail,
> > > including when a tested site is full of ATLAS production jobs.
> > > Nevertheless, one might expect to see a little more stability than
> > > is observed. Please take a look at this wiki page, which now shows
> > > the historical results for each site since January:
> > > http://www.gridpp.ac.uk/wiki/SL_ATLAS_tests. The first plot combines
> > > all sites and shows how chaotic the situation looks to a user!
> > >
> > > Please feel free to edit the wiki with comments for your site
> > > explaining some of the variations, if you can. We will revisit these
> > > results at the next UKI monthly support meeting on March 14th.
> > >
> > > Kind regards,
> > > Jeremy
> >
> > --
> > Alessandra Forti
> > NorthGrid Technical Coordinator
> > University of Manchester