PS: in the LFC, obviously, the TURL registered is the one of the replica
that was first copied to the site. However, I don't think they are using
that, because when I remove the pool the jobs are served the other replicas.
cheers
alessandra
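The distribution test described in this thread (requesting the same multi-replica file many times with lcg-cp or rfcp and noting which replica is delivered) can be sketched as a small simulation. This is not DPM code. The replica names and both selection policies below are hypothetical stand-ins, used only to show what a "static" choice and an even (random) choice would each look like in the tally.

```python
import random
from collections import Counter

# Hypothetical replica locations for one logical file (illustrative names only).
REPLICAS = ["disk01:/fs1", "disk02:/fs2"]


def pick_static(replicas):
    """The suspected 'static' behaviour: always serve the first replica listed."""
    return replicas[0]


def pick_random(replicas, rng):
    """An even spread: pick any replica with equal probability."""
    return rng.choice(replicas)


def tally(picker, n=1000):
    """Request the file n times and count which replica served each request."""
    return Counter(picker() for _ in range(n))


rng = random.Random(42)  # fixed seed so the run is repeatable
static_counts = tally(lambda: pick_static(REPLICAS))
random_counts = tally(lambda: pick_random(REPLICAS, rng))
print("static:", dict(static_counts))
print("random:", dict(random_counts))
```

A static policy puts all 1000 requests on one node, while a random one splits them roughly in half, which is the pattern the lcg-cp and rfcp tests reported.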
Wahid Bhimji wrote:
> I have also now done a test of local rfcp copying, and this also gave
> me an equal distribution across the replicas.
>
> So I think that the static choice is either an "urban myth" or the
> jobs in question are doing something "other".
>
> I did confirm, though, that if a file is missing it just reports "no
> such file" rather than heading to the other replica (unless you try
> again a couple of times).
> That is non-ideal behavior, as you say (though I think most VO software
> would give it another shot or two?)
>
> Wahid
>
>
> Wahid Bhimji wrote:
>> Well, I just did a little test, trying lcg-cp on a file that had
>> multiple replicas, and it delivered either one of them in roughly
>> equal proportions.
>>
>> So it seems it is not static, actually, at least for that way of
>> requesting a file...
>>
>> I will test some more...
>>
>> Wahid
>>
>>
>> Sam Skipsey wrote:
>>> On 4 February 2010 13:55, Alessandra Forti wrote:
>>>
>>>> Hi Sam,
>>>>
>>>>
>>>>> Yeah, Wahid and I noticed the second "feature" (static replica
>>>>> fetching) just recently; we're checking if it affects the sense of
>>>>> hotdisk file replication as an approach.
>>>>>
>>>> If the choice is static, it does affect the sense of hotdisk. It
>>>> means DPM always serves the replica from the same node, with a
>>>> number of consequences:
>>>>
>>>> 1) replicas are a waste of space because they never really get used
>>>> 2) the initial node can get overloaded if too many jobs get the
>>>> files at the same time
>>>> 3) if the pool node is not available, jobs fail because there is no
>>>> failover
>>>>
>>>>
>>>
>>> Quite. We're not sure if the choice really is static, though, at the
>>> moment.
>>>
>>> Sam
>>>
>>>
>>>> cheers
>>>> alessandra
>>>>
>>>>
>>>> Sam Skipsey wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> The failover requirement is something I've already forwarded to
>>>>> Jean-Philippe, though.
>>>>> We have a significant chunk of suggestions all concerning "making
>>>>> replication good" now!
>>>>>
>>>>> Sam
>>>>>
>>>>> On 4 February 2010 11:10, Alessandra Forti wrote:
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have another request: there is no replica failover mechanism.
>>>>>> When a file has multiple replicas on different pool nodes/fs,
>>>>>> DPM stops at the first one in its list, even if that pool node/fs
>>>>>> cannot serve the file. It would be more efficient if it tried the
>>>>>> other replicas before returning an error. The choice of the first
>>>>>> pool doesn't seem random either; it looks like DPM goes in order,
>>>>>> starting from the pool that got the file first, but I have to
>>>>>> investigate this better.
>>>>>>
>>>>>> cheers
>>>>>> alessandra
>>>>>>
>>>>>>
>>>>>> Sam Skipsey wrote:
>>>>>>
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> As we discussed in the last storage meeting, the current DPM
>>>>>>> developers have told us that they're open to comments and
>>>>>>> suggestions about what to improve in DPM first.
>>>>>>>
>>>>>>> The items that they're planning to work on currently are:
>>>>>>>
>>>>>>> improved rfio performance (including improved replication and drain)
>>>>>>>
>>>>>>> limiting I/O per disk and per filesystem
>>>>>>>
>>>>>>> longer term:
>>>>>>>
>>>>>>> quotas, accounting, possibly light-weight DPM, and NFS 4.1 have all
>>>>>>> been mentioned.
>>>>>>>
>>>>>>>
>>>>>>> It would be useful if people replied to this thread with either an
>>>>>>> item from the above list that they particularly prize as high
>>>>>>> priority, AND/OR a new feature or bug fix not mentioned in this list
>>>>>>> that they think should be on it.
>>>>>>>
>>>>>>> I'll appropriately collate, bake and serve the resulting
>>>>>>> suggestion-cake to the developers once we have a reasonable number
>>>>>>> of replies.
>>>>>>>
>>>>>>>
>>>>>>> Sam
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> The most effective way to do it, is to do it. (Amelia Earhart)
>>>>>> Northgrid Tier2 Technical Coordinator
>>>>>> http://www.hep.manchester.ac.uk/computing/tier2
>>>>>>
>>>>>>
>>>>>>
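The replica failover requested in this thread (try the other replicas before returning an error) can be sketched as below. This is a minimal illustration under stated assumptions, not DPM's implementation: `open_replica` and `ReplicaUnavailable` are hypothetical placeholders for whatever call actually reads a replica from a pool node/fs. Shuffling the replica order also spreads load across nodes, which addresses the overload concern raised earlier.

```python
import random
from typing import Callable, Optional, Sequence


class ReplicaUnavailable(Exception):
    """Hypothetical error raised when a pool node/fs cannot serve a replica."""


def open_with_failover(
    replicas: Sequence[str],
    open_replica: Callable[[str], bytes],
    rng: Optional[random.Random] = None,
) -> bytes:
    """Try each replica in random order and return the first successful read.

    An error reaches the caller only after every replica has failed.
    """
    if rng is None:
        rng = random.Random()
    order = list(replicas)
    rng.shuffle(order)  # spread load instead of always hitting the first node
    errors = []
    for replica in order:
        try:
            return open_replica(replica)
        except ReplicaUnavailable as exc:
            errors.append(f"{replica}: {exc}")  # remember why, then fail over
    raise FileNotFoundError("no replica could be served: " + "; ".join(errors))


# Usage with a hypothetical fake reader in which the first pool node is down.
def fake_open(replica: str) -> bytes:
    if replica.startswith("disk01"):
        raise ReplicaUnavailable("pool node down")
    return b"data from " + replica.encode()


print(open_with_failover(["disk01:/fs1", "disk02:/fs2"], fake_open))
# prints b'data from disk02:/fs2' whichever order the shuffle picks
```

The caller sees an error only when no replica at all can be served, rather than when the first one in the list happens to be unavailable.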