Burke, S (Stephen) wrote:
> Testbed Support for GridPP member institutes
>>On Behalf Of Graeme Stewart said:
>>Thanks for the tip! I'm actually a bit confused as to the point of
>>CLOSE_SE for reading files - other than that reading will be
>>assumed to be reasonably fast because of the "close" property.
> That is basically the point, it goes with the idea that you should move
> jobs to where the files are. Like many things it's only ever been
> defined in a vague way, the EDG concept was supposed to be to introduce
> dynamic network monitoring to optimise file access but it never really
> happened, although a lot of the pieces were there.
>>If a job gets passed
>>a SURL of whatever type, sfn:// or srm://, then it must use
>>gridftp or srmGet to actually obtain the file for its use.
> I'm not quite sure what point you're making.
Possibly only my ignorance of how job submission works in detail...
> The idea is that a job
> specifies its input files as LFNs, and the broker tries to send it to a
> site where as many files as possible are "close". The job then reads the
> file by whatever protocol it wants (in practice it generally does a
> gridftp copy to the local disk on the WN).
Which it does by resolving an LFN to a SURL on a CLOSE_SE, right? My
point is really that CLOSE_SE is a very crude way to do this. In
practice Edinburgh is probably "close enough" to Glasgow that jobs can
get files from Edinburgh's SE quite quickly enough, and can be scheduled
to run here when the queue's short, whereas from Moscow things are
likely to be very slow indeed. However I suppose it's unlikely that any
of the netmon EDG components will actually get deployed now, so we're
probably stuck with this.
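
To make the matching concrete, here is a minimal sketch of the ranking
described above: count, for each site, how many of the job's input LFNs
have a replica on an SE published as "close" to that site. This is not
the real broker code. The function name, the two lookup tables and the
hostnames are all made up for illustration, and "close" stays a plain
yes/no flag here, which is exactly the crudeness in question.

```python
def rank_sites(input_lfns, replicas, close_ses):
    """Rank sites by how many input files have a replica on a close SE.

    input_lfns: list of LFN strings the job wants to read
    replicas:   dict LFN -> set of SE hostnames holding a replica
                (hypothetical stand-in for a replica catalogue lookup)
    close_ses:  dict site name -> set of SE hostnames published as close
                (hypothetical stand-in for the information system)
    """
    scores = {}
    for site, ses in close_ses.items():
        # A file counts as "close" if any of its replicas sits on a close SE.
        scores[site] = sum(
            1 for lfn in input_lfns if replicas.get(lfn, set()) & ses
        )
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative data only: hostnames and LFNs are invented.
example_replicas = {
    "lfn:a": {"se.gla.example"},
    "lfn:b": {"se.ed.example"},
    "lfn:c": {"se.gla.example", "se.ed.example"},
    "lfn:d": {"se.gla.example"},
}
example_close = {
    "glasgow": {"se.gla.example"},
    "edinburgh": {"se.ed.example"},
    "moscow": {"se.msk.example"},
}
ranking = rank_sites(["lfn:a", "lfn:b", "lfn:c", "lfn:d"],
                     example_replicas, example_close)
```

With a boolean "close" flag Edinburgh and Moscow look equally remote
from Glasgow; a network-cost figure in place of the set intersection is
what the netmon work would presumably have supplied.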
>>I've heard tale of "CLOSE" meaning rfio or nfs access (reading the
>>recent thread on LCG-ROLLOUT), but in an SRM this would seem to make
>>little sense - you have to go through the SRM protocol to convert the
>>SURL into a TURL.
> Again, I'm not sure what point you're making. Each SE publishes the
> protocols it supports
That was the bit I was unclear on...
> , and the JDL specifies which protocol(s) the job
> wants to use to read the files. The broker is supposed to match the two
> - although in practice I'm not sure it respects the semantics properly,
> e.g. if the job only asks for rfio the job should be rejected if it
> isn't possible to find a site where *all* the input files are local, and
> I don't think it does (but I might be misremembering). When the job
> runs, it calls the replica manager getTURL function, which constructs
> the TURL for a classic SE or uses the SRM protocol for an SRM. An SRM
> can give an rfio TURL to any job, but it will only work for jobs on WNs
> at the same site.
If rfio is firewalled off. DPM's rfio is gsi-enabled, so it could, in
theory, be used as an alternative to gridftp. Although I think the SRM
daemon only passes out gsiftp TURLs right now.
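
The strict matching semantics Stephen describes could be sketched like
this. Again this is only an illustration, not what the broker actually
does: the function and table names are invented, and treating rfio and
file as usable only from a close SE is an assumption taken from the
discussion above (it would not hold if gsi-enabled rfio were reachable
from off-site).

```python
# Assumption: these protocols only work from WNs at the SE's own site.
LOCAL_ONLY = {"rfio", "file"}


def site_can_run(job_protocols, input_lfns, replicas, close_ses, se_protocols):
    """Return True if every input file is readable from this site.

    job_protocols: set of protocols the job asks for in its JDL
    input_lfns:    list of LFN strings
    replicas:      dict LFN -> set of SE hostnames holding a replica
    close_ses:     set of SE hostnames close to the candidate site
    se_protocols:  dict SE hostname -> set of protocols it publishes
    """
    for lfn in input_lfns:
        readable = False
        for se in replicas.get(lfn, ()):
            # Protocols both requested by the job and offered by the SE.
            usable = set(job_protocols) & se_protocols.get(se, set())
            if se not in close_ses:
                # Remote SE: local-only protocols cannot be used.
                usable -= LOCAL_ONLY
            if usable:
                readable = True
                break
        if not readable:
            # One unreadable file is enough to reject the site.
            return False
    return True
```

So a job asking only for rfio is rejected by any site where even one
input file has no close replica, while a job asking for rfio and gsiftp
can fall back to a remote gridftp copy.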
>>I'm currently trawling through the experiments' latest TDRs to try and
>>find out what the idea is here. This will become a real issue as a DPM
>>volatile storage area will start its garbage collector to delete the
>>oldest unpinned files once the usage goes over some defined threshold.
> I doubt that anyone has thought much about it.
Well, the experiments did request volatile storage in the baseline
services report, so this is an issue they'll have to address. I'm making
enquiries of the ATLAS folks now.
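
For anyone unsure what the garbage collection amounts to, a rough sketch
of the behaviour as described above: once usage goes over a threshold,
delete the oldest unpinned files until usage is back under it. This is
not DPM's implementation. The record layout, the 90% default, the use of
last access time for "oldest", and stopping as soon as usage drops back
to the threshold are all assumptions made for illustration.

```python
def collect_garbage(files, capacity, threshold=0.9):
    """Pick files to delete from a volatile pool that is over threshold.

    files:     list of dicts with keys "name", "size", "atime", "pinned"
               (hypothetical record layout)
    capacity:  total pool size, in the same units as "size"
    threshold: fraction of capacity above which collection starts
               (assumed value; the real threshold is site-configured)
    Returns the names of the files selected for deletion, oldest first.
    """
    used = sum(f["size"] for f in files)
    limit = threshold * capacity
    deleted = []
    # Oldest first, taking "oldest" to mean least recently accessed.
    for f in sorted(files, key=lambda f: f["atime"]):
        if used <= limit:
            break  # back under the threshold, stop deleting
        if f["pinned"]:
            continue  # pinned files are never candidates
        used -= f["size"]
        deleted.append(f["name"])
    return deleted
```

The point for the experiments is that an unpinned replica in such a pool
can vanish whenever the pool fills, so a job cannot assume a file it was
matched against is still there when it runs.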
> Personally I think we should try to tackle things like this
> systematically and do them properly,
I sense that it is believed that it's far too late to do things properly.
Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
GridPP DM Wiki http://www.physics.gla.ac.uk/gridpp/datamanagement/