On 5 July 2010 14:34, Alastair Dewhurst <[log in to unmask]> wrote:
> Hi
>
> Alessandra: These would be T1/2 -> T2 data transfers requested by users. It
> could in future be extended to T3 sites. Does that answer your query?
>
> Sam: There are two reasons. Firstly, users need to copy their data, once it
> is produced (on scratchdisk), somewhere safer if they want to keep it
> (scratchdisk deletes data automatically after 30 days). Putting it all
> together at their home institute seems reasonable.
Indeed, I know about the expiry time for scratchdisk - in fact, it
will probably be less than 30 days in future. I'm not convinced that
the correct "semi-archival" destination is always the home institute,
however. I suppose it depends on how likely they are to want to use
the data in future, and in what way.
> Secondly, at some point
> they will need to run over their data locally to fine-tune their
> cuts/analysis and produce plots. You are right that if they have produced a
> large amount of data they should use tools such as prun, which will
> allow them to run ROOT/other scripts on their data, but at some point
> it will be much faster to do this locally on one machine.
Indeed. However, the trend in the wider world is for *more* remote
distribution of data, not more consolidation. It would be simpler to
provide one way of doing fine tuning - via prun, etc - that scales to
large datasets (which seem likely to become the common case) than two
methods, one of which is only usable locally and will break for users
who graduate to more data.
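To be concrete about what "fine tuning via prun" looks like: the tuning
step is usually just a small script run over the dataset's files, and
that script can run where the data already is. Here is a minimal,
hypothetical sketch of such a script (assuming PyROOT; the tree name,
branch name and cut value are invented placeholders), which could be
shipped to the grid with prun rather than run over a locally
downloaded copy:

# tune_cuts.py - hedged sketch of a grid-friendly cut-tuning script.
# Assumes PyROOT on the worker node; "physics" and "jet_pt" are
# placeholder tree/branch names, not real ATLAS ones.
import sys
import ROOT

def main():
    # Jobs of this kind typically receive their input files as a
    # comma-separated list; here we just take it from the command line.
    input_files = sys.argv[1].split(",")

    hist = ROOT.TH1F("jet_pt", "leading jet pT", 100, 0.0, 500.0)
    for name in input_files:
        f = ROOT.TFile.Open(name)
        if not f or f.IsZombie():
            print("could not open %s, skipping" % name)
            continue
        tree = f.Get("physics")
        if not tree:
            f.Close()
            continue
        for event in tree:
            # The hypothetical cut being tuned.
            if event.jet_pt > 25.0:
                hist.Fill(event.jet_pt)
        f.Close()

    out = ROOT.TFile("tuned_hists.root", "RECREATE")
    hist.Write()
    out.Close()

if __name__ == "__main__":
    main()

The point is that nothing in a script like that cares whether the files
sit on a laptop or on a storage element the job runs next to.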
> Where we draw the
> line is a matter of debate but if a user can fit all their data on their
> laptop, some of them will try it. I think Roger Jones said in a talk that:
> Users are expected to download up to 10GB; if they download ~100GB a day their
> rate will be throttled, and if they download ~1TB a day they will get
> throttled! So the case we are looking at here is the heavy user (possibly
> because of a group production role).
>
Right. And heavy users should, for the good of everyone else, do their
work on large datasets in a sensible, civilised manner - not by
forcing it all into one local area.
> I also think that for most people the total size of their files is quite
> small (~a few GB); however, with real data they are producing a lot of files,
> even if each is only of the order of a few kB. It's not always
> possible for the user to merge their files on the grid before downloading
> them, and in this case a dq2-get command is still inefficient compared to a
> DaTRI request.
Small file transfers are inefficient via lcg-cp anyway, if they're on
the order of kB per file. (But this is outside the scope of the
original topic.)
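(While I'm on the aside: the "did everything arrive, exactly once"
check at least needn't take longer than the download itself. A minimal
sketch, assuming the user has a plain-text list of the filenames the
dataset should contain - which is an assumption on my part, not
something dq2-get hands you:)

# check_download.py - hedged sketch of a post-download sanity check.
# Compares what landed in a download directory against a plain-text
# list of expected filenames; flags missing, duplicated and unexpected
# files. The list file and directory layout are assumptions.
import os
import sys
from collections import Counter

def check_download(expected_list, download_dir):
    with open(expected_list) as fh:
        expected = set(line.strip() for line in fh if line.strip())

    # Count every filename found under the download area so that
    # accidental duplicates (e.g. retries into subdirectories) show up.
    found = Counter()
    for _root, _dirs, files in os.walk(download_dir):
        for name in files:
            found[name] += 1

    missing = sorted(expected - set(found))
    duplicated = sorted(n for n, c in found.items() if c > 1)
    unexpected = sorted(set(found) - expected)

    print("expected %d, found %d distinct files" % (len(expected), len(found)))
    print("missing:    %s" % (missing or "none"))
    print("duplicated: %s" % (duplicated or "none"))
    print("unexpected: %s" % (unexpected or "none"))
    return not missing and not duplicated

if __name__ == "__main__":
    ok = check_download(sys.argv[1], sys.argv[2])
    sys.exit(0 if ok else 1)

Checksums rather than filenames would be the proper check, of course,
but the shape is the same.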
> If you check
> http://www.hep.lancs.ac.uk/~love/ukdata/token/ATLASLOCALGROUPDISK/
> it's clear that there is currently a lot of unused space.
>
> When I said:
> "The user would need access to the local mass storage"
> I actually meant to add: via the local transport protocol. While I
> assumed this was possible, I thought it best to ask in case
> there was something I had overlooked.
>
No, I think that's fairly unobjectionable.
That said, I still think it would be *preferable* for the user to tune
their data using their local CE, even if all the data is local and
accessible via laptop etc.
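For completeness, here is a rough sketch of what that local-protocol
access could look like at a dCache site, using PyROOT through a dcap
door. The door hostname and everything below the space token path are
invented, and this assumes ROOT was built with the dCache plugin:

# open_via_dcap.py - hedged sketch of reading a file over dcap.
# "dcap-door.example.ac.uk" and the dataset/file names are made up;
# only the /pnfs/pp.rl.ac.uk/.../atlasscratchdisk/ prefix comes from
# the thread.
import ROOT

url = ("dcap://dcap-door.example.ac.uk/pnfs/pp.rl.ac.uk/data/atlas/"
       "atlasscratchdisk/user10.SomeUser.mydataset/ntuple._00001.root")

f = ROOT.TFile.Open(url)
if f and not f.IsZombie():
    print("opened %s" % url)
    print("keys: %s" % [k.GetName() for k in f.GetListOfKeys()])
    f.Close()
else:
    print("could not open %s" % url)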
Sam
>
> Stephen: With regards to the space token, this is something I am discussing
> with ADC. In the US, where they have many universities that aren't Tier 2
> sites, they set up a single space token, similar to our localgroupdisk, to
> which they can then copy whatever data they want. The site, rather than the
> US cloud, is responsible for deleting stuff. Because a lot of our
> universities are Tier 2s, we could choose to use the existing scratchdisk or
> localgroupdisk if we wanted, or set up a new space token. There are
> advantages and disadvantages in each, but it mostly comes down to who is in
> charge of deleting stuff on the space token.
>
> Alastair
>
>
>
> On 5 Jul 2010, at 13:30, Sam Skipsey wrote:
>
>> On 5 July 2010 12:26, Alastair Dewhurst <[log in to unmask]> wrote:
>>>
>>> Hi
>>>
>>> The following is an idea in the early stage of development; thoughts
>>> would be welcome.
>>>
>>> Currently if ATLAS users wish to get data that has been produced on the
>>> grid they use dq2-get (this basically just looks up where ATLAS thinks
>>> the file is and does an lcg-cp command). This is fine if the user has a
>>> small amount of data and a small number of files. However, it is becoming
>>> apparent that users are producing quite large sets of output files
>>> (~1000 files and ~100GB). This is only likely to get worse as we get more
>>> real data. While dq2-get is fine for small amounts of data, it becomes
>>> rather slow and unreliable on this scale. It can take users days to
>>> download their datasets and even longer to check to make sure that all
>>> the files were downloaded correctly (and not duplicated!).
>>
>> Which is why the general grid data model is about moving compute to
>> data whenever possible, rather than the converse.
>> What, precisely, is the use case that *requires* users to move their
>> data all to one location, effectively de-gridifying it?
>> A lot of sites aren't going to be able to support many local users
>> doing this and transferring considerable amounts of stuff to their
>> localgroupdisk and scratchdisk tokens.
>>
>>>
>>> The suggestion is to tell users who want to get large amounts of user
>>> data to submit a DaTRI (data transfer) request to copy the data to the
>>> Tier 2 site they actually work at. If the site were to allow it, they
>>> could then access the files directly from the storage element. For
>>> example, at RALPP I could request that my dataset be moved to
>>> scratchdisk and then, once it was there, access it by looking in:
>>> /pnfs/pp.rl.ac.uk/data/atlas/atlasscratchdisk/
>>> I would use the local dCache protocol to copy it out.
>>>
>>> The DaTRI request has the advantage that it can be scheduled, should be
>>> more efficient than a dq2-get command, and will automatically retry
>>> failures. Of course, for this to work the user would need to be able to
>>> access the local mass storage, and it would be understandable if site
>>> admins didn't want this.
>>
>> Well... they'll always have access to the local mass storage via the
>> GridFTP transport at least, surely? It's a Grid-accessible
>> resource, and thus you can talk to it (at least) via anything that
>> speaks SRM...
>>
>> I suspect that what you mean is "access the local mass storage via its
>> local transport protocol"; this almost certainly shouldn't be a
>> significant problem for the reason Stephen mentioned already.
>>
>> I wonder, though, whether in most cases the problem would be better
>> solved in a more scalable manner by simply... distributing the work
>> over the grid?
>>
>> Sam
>>
>>> However, I am also aware that at some sites local users already have
>>> limited access to the storage elements, to give them somewhere to store
>>> their data offline.
>>>
>>> Comments?
>>>
>>> Alastair
>>>
>>>
>
>