We have done ~20 HammerCloud tests of various flavours in the UK in
the last 6 months, plus all the STEP tests. We had one 6-hour problem
with the pilot code two weeks ago. It was a bug and it was fixed. In
general, PanDA infrastructure availability is probably >99%, so if
anything we were a little unlucky to catch this. Please, we are not
"debug[ing] ATLAS pilot job framework".
In contrast, even the better sites are having a hard time reaching a
95% job success rate, so on the site side we still have a way to go.
My feeling is that the tests are useful for the sites - they are
primarily for the sites to prepare themselves for data. They do not
require people to be on "full alert" or to work out of normal hours.
In fact, they are meant to make analysis support routine. Sites will
eventually reach a stage where they fully understand their
infrastructure; at that point you probably do not learn so much from
the tests, but continuing to participate should not be a problem - in
fact it keeps re-validating the site, which is reassuring.
I haven't heard that people are finding the tests unhelpful or
distracting, but please say if you think so.
Cheers
Graeme
On Tue, Aug 4, 2009 at 11:26, Coles, J (Jeremy) <[log in to unmask]> wrote:
> Hi John
>
> We'll review the situation at today's deployment team meeting. It is a
> pity that the Panda server has not coped well in keeping things stable
> for the testing period, but there are periods of stability. The obvious
> danger is that many of the enthusiastic sysadmins out there lose
> interest if the ATLAS submission problems dominate.
>
> Any of the sysadmins out there reading this can comment if they feel
> their time is being wasted and the approach needs to change. Suggestions
> are always welcomed. Let's also see how things go today/tomorrow.
>
> Regards,
> Jeremy
>
>
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Gordon, JC (John)
> Sent: 04 August 2009 10:07
> To: [log in to unmask]
> Subject: Re: HammerCloud - today at 10am onwards.
>
> Just a thought. Always happy to be contradicted by those more engaged.
> From the last test, purely anecdotally, many of the issues raised on the
> list seemed to be due to the pilot factories etc. This seemed to be a
> waste of the alertness of the sysadmins.
>
> John
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Coles, J (Jeremy)
>> Sent: 04 August 2009 09:44
>> To: [log in to unmask]
>> Subject: Re: HammerCloud - today at 10am onwards.
>>
>> Hi John
>>
>> Others will probably reply but my understanding was that the
>> HammerCloud jobs are just part of the testing infrastructure. The
>> actual testing is of the sites and their bottlenecks, not of the pilot
>> job framework. Therefore I think it is good if as many sites as
>> possible remain involved - it is more about finding capacity/throughput
>> limits than testing whether a site is satisfactory, and the subtle
>> changes in the job mix may bring out new issues each time.
>>
>> Jeremy
>>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes
>> [mailto:[log in to unmask]] On Behalf Of Gordon, JC (John)
>> Sent: 04 August 2009 09:32
>> To: [log in to unmask]
>> Subject: Re: HammerCloud - today at 10am onwards.
>>
>> Might it be an idea for sites which were satisfactory last time to
>> ignore it and then look back and see what happened? You're not going
>> to be on full alert at all hours during data taking and it shouldn't
>> need the whole UK to debug ATLAS pilot job framework.
>>
>> Just a thought,
>>
>> John
>>
>> > -----Original Message-----
>> > From: Testbed Support for GridPP member institutes [mailto:TB-
>> > [log in to unmask]] On Behalf Of Sam Skipsey
>> > Sent: 04 August 2009 09:15
>> > To: [log in to unmask]
>> > Subject: HammerCloud - today at 10am onwards.
>> >
>> > Hi all,
>> >
>> > Just to remind you that we have another HammerCloud at 10am.
>> >
>> > Business as usual. (Indeed, we can assume that these will be fairly
>> > regular for the next couple of weeks, I guess.)
>> >
>> > Sam
>> --
>> Scanned by iCritical.
>
--
Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
Department of Physics and Astronomy, University of Glasgow, Scotland
DEATH TO MEETINGS!