On 17 February 2010 11:44, John Bland <[log in to unmask]> wrote:
> On 17/02/2010 11:32, Alessandra Forti wrote:
>>
>> Hi John,
>>>
>>> Big files may make data management more efficient but what's the point
>>> in efficiently managing data you can't efficiently analyse (what we're
>>> ultimately here to do)?
>>
>> Data management is the backbone of all this. A site can have the
>> fastest local access in the world, but if the data doesn't arrive, or
>> arrives incomplete, it's not of much use.
>
> Obviously there has to be a compromise between the two requirements,
> otherwise we'd just have one stonking big file.
>
> Is the current file size such a compromise, or has it been decided
> purely from a data management point of view?
>
>>> If LHC data can be ordered such that it can be read *linearly* from
>>> the file all of the above becomes unnecessary.
>> isn't this what reordering is supposed to do? or have I missed something?
>
> That's what I thought it was supposed to do. If it has done so then I'm
> puzzled as to why it doesn't translate to a similar increase in efficiency
> when using RFIO. Our local tests show near enough the same efficiency when
> accessing unordered or ordered files over RFIO but we haven't tested many
> datasets yet.
>
That's why I asked Wahid about the amount of data sent vs. requested
for the various methods, and why others asked about the rfio
configuration settings. It's quite possible that this is an rfio tuning
issue (or just a network connectivity one, given what I know of ECDF's
network topology).

Sam
> John
>
> --
> John Bland [log in to unmask]
> System Administrator office: 220
> High Energy Physics Division tel (int): 42911
> Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
> University of Liverpool http://www.liv.ac.uk/physics/hep/
> "I canna change the laws of physics, Captain!"
>