Thanks.
The problem with site optimisation of DPM buffers is that there is generally
no ideal setting: small buffers saturate the pool IOPS and big buffers
saturate the network. At the moment, using RFIO for ATLAS AOD files (even
with the new ordered files, AFAIK) hits fundamental scaling limits on disk
IOPS or network bandwidth, and these kick in at very low numbers of jobs.
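A rough sketch of that trade-off, as a toy model only. The function name, the
job count, the chunk size and the buffer sizes are all assumptions made up for
illustration, not measurements from any site:

```python
import math

def storage_load(jobs, chunks_per_sec, chunk_bytes, buffer_bytes):
    """Return (requests_per_sec, bytes_per_sec) summed over all jobs.

    Toy model (an assumption, not how RFIO is implemented): each job reads
    `chunks_per_sec` scattered chunks of `chunk_bytes`; every chunk costs
    ceil(chunk_bytes / buffer_bytes) requests, and each request moves one
    full buffer across the network.
    """
    requests_per_chunk = math.ceil(chunk_bytes / buffer_bytes)
    requests = jobs * chunks_per_sec * requests_per_chunk
    return requests, requests * buffer_bytes

KIB = 1024
# Illustrative numbers only: 20 jobs, 10 scattered 64 KiB reads per second each.
for buffer_bytes in (4 * KIB, 64 * KIB, 1024 * KIB):
    reqs, bw = storage_load(jobs=20, chunks_per_sec=10,
                            chunk_bytes=64 * KIB, buffer_bytes=buffer_bytes)
    print(f"{buffer_bytes // KIB:5d} KiB buffer: {reqs:5d} req/s, {bw / 1e6:7.1f} MB/s")
```

In this model the request rate falls as the buffer grows but the bytes moved
per second rise, so neither end of the range is free.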
If individual users are going to be submitting any number of jobs via GANGA
and using RFIO then DPM sites are going to swamp their storage, which
impacts not just those users but also jobs running via pilots and
potentially other VOs.
Being able to specify the file stager (or whatever the preferred access method
is at the time) per site for GANGA, in ToA or some other central place, would
be very useful.
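As a minimal sketch of what such a central per-site setting could look like:
the table, the function, the site name and the "DIRECT" default are all
hypothetical placeholders, and only the 'FILE_STAGER' value comes from
Johannes' mail quoted below. This is not an existing GANGA or ToA interface.

```python
# Hypothetical central table: site name -> input access mode.
# "EXAMPLE-DPM-SITE" and "DIRECT" are placeholders invented for this sketch;
# only "FILE_STAGER" appears in this thread.
SITE_ACCESS_MODE = {
    "EXAMPLE-DPM-SITE": "FILE_STAGER",
}

def access_mode_for(site, table=SITE_ACCESS_MODE, default="DIRECT"):
    """Return the access mode a site has asked for, or the default."""
    return table.get(site, default)

print(access_mode_for("EXAMPLE-DPM-SITE"))
print(access_mode_for("SOME-OTHER-SITE"))
```

The returned value would presumably be what ends up in j.inputdata.type for
jobs sent to that site; how that is wired into GANGA is not shown here.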
John
On 09/03/2010 10:26, Mark Slater wrote:
> Forwarding Johannes' mail due to a bounce!
>
> Thanks,
>
> Mark
>
>
> ---------- Forwarded message ----------
> Date: Tue, 09 Mar 2010 11:21:07 +0100
> From: Johannes Elmsheuser <[log in to unmask]>
> To: Mark Slater <[log in to unmask]>
> Cc: Peter Love <[log in to unmask]>, [log in to unmask],
> Daniel van der Ster <[log in to unmask]>
> Subject: Re: Taming ATLAS user jobs
>
> Hi,
>
> (adding Dan)
>
> this is a site specific SE optimization problem for DPM - there is nothing
> from the Ganga side to optimize for DPM - the read-ahead buffers need to
> be tuned by the site - we need either to have a better Athena/POOL/EDM
> which prevents the back-and-forth reading (available in the latest
> athena releases) or switch to the copy-mode by using
> j.inputdata.type='FILE_STAGER' - we could switch on this option on
> demand on site-level.
>
> Cheers, Johannes
>
>
> On 03/09/2010 11:04 AM, Mark Slater wrote:
>> Hi John,
>>
>> I agree with Peter that these are almost certainly WMS jobs from Ganga.
>> However, I'm not sure why the protocol that's used is causing problems
>> as I thought Johannes Elmsheuser (GangaAtlas dev, CC'ed) fixed this in
>> recent releases. Johannes: could you comment on why jobs at Liverpool
>> would be causing these problems?? This is definitely your area rather
>> than mine!
>>
>> Thanks,
>>
>> Mark
>>
>>
>> On Tue, 9 Mar 2010, Peter Love wrote:
>>
>>> You're essentially talking about jobs via WMS, users are probably
>>> using ganga with LCG backend. Stop advertising that queue for atlas
>>> and that should do the trick. Pilots won't be affected.
>>>
>>> Peter
>>>
>>> On 9 March 2010 09:28, John Bland <[log in to unmask]> wrote:
>>>> Hi,
>>>>
>>>> We've been seeing a few ATLAS users submitting analysis jobs to our site
>>>> directly, rather than via the pilot systems.
>>>>
>>>> The problem with this is that the users may not necessarily know which
>>>> access method to use at any particular site. The jobs we've seen have
>>>> been using direct RFIO access and even a small number of jobs are either
>>>> saturating the IOPS on the pools (small buffers) or saturating switches
>>>> (big buffers).
>>>>
>>>> This is with only a relatively small number of jobs, a site full of them
>>>> would simply grind to a complete halt for days and be an enormous mess
>>>> for all concerned.
>>>>
>>>> Is there any way of getting around this other than ticketing every single
>>>> user that does it (or fixing ATLAS AOD read patterns ;0)?
>>>>
>>>> John
>>>>
>>>> --
>>>> John Bland [log in to unmask]
>>>> System Administrator office: 220
>>>> High Energy Physics Division tel (int): 42911
>>>> Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
>>>> University of Liverpool http://www.liv.ac.uk/physics/hep/
>>>> "I canna change the laws of physics, Captain!"
>>>>
>>>
>
>
--
John Bland [log in to unmask]
System Administrator office: 220
High Energy Physics Division tel (int): 42911
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
University of Liverpool http://www.liv.ac.uk/physics/hep/
"I canna change the laws of physics, Captain!"