On 25 August 2011 00:58, Ewan MacMahon <[log in to unmask]> wrote:
>> -----Original Message-----
>> From: Alastair Dewhurst [mailto:[log in to unmask]]
>> Sent: 24 August 2011 15:16
>>
>> I know there has been some ongoing discussion about the lack of data being
>> placed on sites' datadisk by ATLAS.
>>
> Indeed. From the discussions so far I feel confident in saying that
> we have at least one problem, possibly more. It's clearly a complicated
> matter though, and I think we're some way off having a good common
> understanding of the situation.
>
> For now I'd like to just pick out two bits of this email to focus on:
>
>> As I have already mentioned a few times, if you fail HammerCloud tests,
>> your site will be set broker-off and you won't get any data.
>>
> I don't understand the motivation for this. An idle resource, whether
> that's network bandwidth or empty disk space, is a wasted resource.
> I cannot see any advantage to ever deliberately wasting resources. It
> may be that a site has reliability problems today, but it still makes
> sense to start pushing data to them so that you can run jobs there
> tomorrow. There's no need to even try to decide whether it is a better
> bet to send data to site 'A' or to site 'B' when you can simply send
> it to both of them. The more sites have copies of the data, the more
> CPUs are available as candidates to subsequently run analysis work
> on it.
In addition to this, I've certainly seen the blacklisting for
data/compute interaction cause problems for users: I know of several
people who've complained to me that if their jobs complete but the
site is blacklisted shortly afterwards, they have trouble getting the
data themselves, even though DDM knows about it. Of course, that's in
the opposite direction of data flow to the topic of this email, but...
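
Coming back to Ewan's 'send it to both of them' point: the difference
between the current broker-off behaviour and that approach is small
enough to sketch in a few lines. This is purely illustrative Python -
the site names, free-space figures and the replicate() helper are all
made up, and it is not how PanDA/DDM brokerage is actually implemented:

    # Toy sketch only -- hypothetical sites and a stand-in replicate() call.

    sites = [
        {"name": "SITE-A", "broker_off": True,  "free_tb": 40},
        {"name": "SITE-B", "broker_off": False, "free_tb": 25},
    ]

    def replicate(dataset, site_name):
        """Stand-in for whatever actually queues a DDM transfer."""
        print("queue transfer of %s -> %s" % (dataset, site_name))

    def place_filtered(dataset, sites):
        # Current-style behaviour: a site that is broker-off (e.g. after
        # failing HammerCloud) gets nothing, so its disk and CPUs sit idle.
        for s in sites:
            if not s["broker_off"] and s["free_tb"] > 0:
                replicate(dataset, s["name"])

    def place_everywhere(dataset, sites):
        # The alternative: any site with free space gets a copy, so it can
        # run analysis the moment it comes back.
        for s in sites:
            if s["free_tb"] > 0:
                replicate(dataset, s["name"])

With the first version SITE-A never gets data, so it has nothing to run
analysis on once it recovers; with the second it does.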
>
>> On 24 Aug 2011, at 14:04, Alexei Klimentov wrote:
>>
>>
>> I forwarded your e-mail to the ATLAS CREM chair. Borut agreed to discuss
>> it, and if CREM approves my proposal then T2s will have 1-2 planned
>> AOD copies.
>>
> Does this refer to a partial return to the old planned data placement
> model, rather than PD2P-driven placement? If so, I'm not sure, based
> on my understanding of the data usage patterns, that that sounds like
> a good idea. My understanding, based essentially on Graeme's GridPP 26
> talk, is that a lot of data that was placed in a planned fashion was
> never used. PD2P seems to be a much more sensible system for a model in
> which the Tier 2 disk is primarily cache, not storage.
>
> As far as I can see the key problem is that PD2P is not very aggressive
> about getting new, hot, interesting datasets replicated. I'd have thought
> it would make sense to get copies of those 'hot' datasets on to as many
> Tier 2 disks as possible, as quickly as possible, so as to make as
> many CPUs available for analysis as possible.
Speed, in particular, is of the essence, from what I recall of the rate
of "cooling" of datasets as measured by ATLAS.
Sam
>
> Ewan
>