Hi Alessandra et al,
These tests have been a little unreliable in the past, but I think Chris's explanation of how they represent jobs from smaller VOs shows that they can provide useful data.
I am curious as to why some sites attract more of these jobs than others.
RALPP, Manchester and Oxford account for 85% of the jobs.
Is this because we have more CEs? Oxford has 3, RALPP has 3, and Manchester has 3 but only 2 in production.
I then wondered why, although Oxford and RALPP are doing well (97% success), we still fail sometimes.
All the errors from our sites are down to a failed lcg-cp, which Kashif believes is caused by a timeout from the top-level BDII at RAL. Chris W has already opened a ticket about this.
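For anyone who wants to test the timeout theory while the ticket is open: the lcg-* data-management clients take their top-level BDII from the LCG_GFAL_INFOSYS environment variable, so a site can point them at a different (or local) top BDII with no other change. A minimal sketch (the hostname below is a placeholder, not a real endpoint):

```shell
# Point the lcg-* tools at an alternative top-level BDII before retrying
# the transfer. Substitute your preferred top BDII for the placeholder;
# 2170 is the standard BDII port.
export LCG_GFAL_INFOSYS=your-top-bdii.example.org:2170

# Confirm the setting before re-running the failing lcg-cp
echo "$LCG_GFAL_INFOSYS"
```

If the failures disappear against a different top BDII, that would support the timeout explanation.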
However, the errors at Manchester seem to be down to a CVMFS issue on wn2206180, as the error message is:
Trying to source:
/cvmfs/atlas.cern.ch/repo/sw/software/x86_64-slc5-gcc43-opt/17.6.0/cmtsite/asetup.sh AtlasOffline 17.6.0
Failed to find asetup.sh
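If that is the failure mode, a first check on the suspect node is simply whether the file is visible at all. A sketch (the path is taken verbatim from the error above; the probe function itself is made up for illustration):

```shell
# Hypothetical probe for the CVMFS symptom above: is the ATLAS setup
# script actually visible on this worker node? Path copied from the
# error message in this thread.
SETUP=/cvmfs/atlas.cern.ch/repo/sw/software/x86_64-slc5-gcc43-opt/17.6.0/cmtsite/asetup.sh

check_asetup() {
    if [ -f "$1" ]; then
        echo "OK: asetup.sh found"
    else
        echo "FAIL: asetup.sh missing"
    fi
}

check_asetup "$SETUP"
```

A FAIL here would point at the CVMFS mount on that node rather than the job itself.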
Alessandra, can you check whether this is the case? If so, your score would probably go to 100%. The question is why you don't see the lcg-cp error. Are you using your own top-level BDII?
Thanks Pete
--
----------------------------------------------------------------------
Peter Gronbech GridPP Project Manager Tel No. : 01865 273389
Fax No. : 01865 273418
Department of Particle Physics,
University of Oxford,
Keble Road, Oxford OX1 3RH, UK E-mail : [log in to unmask]
----------------------------------------------------------------------
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Alessandra Forti
> Sent: 22 January 2013 08:27
> To: [log in to unmask]
> Subject: Re: "Average SLL untargeted ATLAS test performance (UK test)" in
> Quarter reports
>
> Hi Jeremy,
>
> all the nagios tests are submitted with WMS. I used the atlas nagios
> tests for a year in the quarterly reports because [1] wasn't working for
> two quarters a year ago. The situation has now inverted.
>
> If [1] is not a good metric it should be removed or replaced with other
> tests.
>
> cheers
> alessandra
>
> [1] http://pprc.qmul.ac.uk/~lloyd/gridpp/uktest.html
>
> On 21/01/2013 22:32, Jeremy Coles wrote:
> > Hi Elena,
> >
> > As Chris mentions it records how a user job submitted via the WMS fares
> across the UK sites. I think the figure you quote is based on
> http://pprc.qmul.ac.uk/~lloyd/gridpp/uktest.html. So Sheffield may be 82%
> not 87%, which is Lancaster's figure (or I may have the wrong table!). We agreed a
> while ago that the number was not a useful metric upon which to measure
> site performance (so it was greyed out in the T2 quarterly reports) because if
> your site is busy with other work the figure will drop as jobs that do get
> queued might not complete successfully in a given time. However it is still
> useful for us to have a view on how such general user jobs distribute across
> sites and their success or failure so the collection of the data continues. This
> data is not used in the accounting algorithm!
> >
> > Jeremy
> >
> >
> >
> > On 21 Jan 2013, at 18:04, Christopher J. Walker wrote:
> >
> >> On 21/01/13 17:24, Elena Korolkova wrote:
> >>> Hello
> >>>
> >>> Sheffield has had availability above 95% in the nagios tests and ATLAS analysis
> for every month of 2012 (and in 2010 and 2011 as well).
> >>> In quarter reports there is a parameter "Average SLL untargeted ATLAS
> test performance (UK test)" which is 87% for Sheffield for the last Quarter.
> >>> I'm wondering what this parameter reflects.
> >>>
> >> It tests how well WMS jobs run at the site. To do this, it submits jobs
> >> to the WMS targeting the UK, but not any particular site. Jobs that the
> >> WMS brokers to a site will then end up in this statistic.
> >>
> >> Whilst ATLAS now have other mechanisms, this should reflect how other
> >> VOs that rely on the WMS see your site if jobs get brokered there.
> >>
> >> To get a VO's perception, one needs to convolve that with the number of jobs
> >> that actually get brokered to a site, but nonetheless poor performance
> >> here is probably one of the reasons t2k.org are suffering at the moment.
> >>
> >>> AFAIR there was already discussion on this topic but "Average SLL
> >>> untargeted ATLAS test performance (UK test)" is still in Quarter
> >>> reports.
> >> Wearing my "other VOs" hat, I think that whilst ATLAS might not be the
> >> right VO, it's probably as good as any other - and until we recommend
> >> another method to small VOs, we should keep this metric.
> >>
> >> Wearing my QMUL hat, I can only apologise - we need to get better - the
> >> rather rushed EMI transition probably caused some of the problems, and
> >> took our eye off the others.
> >>
> >> Chris
>
>
> --
> Facts aren't facts if they come from the wrong people. (Paul Krugman)