Testbed Support for GridPP member institutes
> Graeme Stewart said:
> Thanks for the tip! I'm actually a bit confused as to the point of
> CLOSE_SE for reading files - other than that reading will be
> assumed to be reasonably fast because of the "close" property.
That is basically the point; it goes with the idea that you should move
jobs to where the files are. Like many things, it has only ever been
defined in a vague way. The EDG concept was to introduce dynamic network
monitoring to optimise file access, but it never really happened,
although a lot of the pieces were there.
> If a job gets passed a SURL of whatever type, sfn:// or srm://, then
> it must use gridftp or srmGet to actually obtain the file for its use.
I'm not quite sure what point you're making. The idea is that a job
specifies its input files as LFNs, and the broker tries to send it to a
site where as many files as possible are "close". The job then reads the
file by whatever protocol it wants (in practice it generally does a
gridftp copy to the local disk on the WN).
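To make the "as many files as possible are close" idea concrete, here is
a minimal sketch of the ranking. It is not the broker's actual code: the
host names, the catalogue layout and the function rank_sites are all
invented for illustration, and a real broker weighs other things too.

```python
# Hypothetical replica catalogue: LFN -> set of SEs holding a replica.
replicas = {
    "lfn:/grid/atlas/a.root": {"se.glasgow.example", "se.ral.example"},
    "lfn:/grid/atlas/b.root": {"se.ral.example"},
    "lfn:/grid/atlas/c.root": {"se.cern.example"},
}

# Hypothetical close-SE bindings: CE -> set of SEs it publishes as close.
close_ses = {
    "ce.glasgow.example": {"se.glasgow.example"},
    "ce.ral.example": {"se.ral.example"},
    "ce.cern.example": {"se.cern.example"},
}


def rank_sites(input_lfns, replicas, close_ses):
    """Return (ce, number_of_close_input_files) pairs, best site first."""
    scores = []
    for ce, ses in close_ses.items():
        # A file counts as close if at least one replica sits on a close SE.
        n_close = sum(1 for lfn in input_lfns if replicas.get(lfn, set()) & ses)
        scores.append((ce, n_close))
    # Highest count first; ties broken by name so the order is deterministic.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


ranking = rank_sites(list(replicas), replicas, close_ses)
for ce, n_close in ranking:
    print(ce, n_close)
```

Note that the best site here still has only two of the three input files
close, so the job would have to fetch the third one remotely.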
> I've heard tale of "CLOSE" meaning rfio or nfs access (reading the
> recent thread on LCG-ROLLOUT), but in an SRM this would seem to make
> little sense - you have to go through the SRM protocol to convert the
> SURL into a TURL.
Again, I'm not sure what point you're making. Each SE publishes the
protocols it supports, and the JDL specifies which protocol(s) the job
wants to use to read the files. The broker is supposed to match the two
- although in practice I'm not sure it respects the semantics properly,
e.g. if the job only asks for rfio the job should be rejected if it
isn't possible to find a site where *all* the input files are local, and
I don't think it does (but I might be misremembering). When the job
runs, it calls the replica manager getTURL function, which constructs
the TURL for a classic SE or uses the SRM protocol for an SRM. An SRM
can give an rfio TURL to any job, but it will only work for jobs on WNs
at the same site.
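The strict semantics described above could be sketched roughly as
follows. This illustrates the intended behaviour, not what the broker
actually does: the host names, the data layout, the function
eligible_sites and the LOCAL_ONLY set are assumptions made for the
example.

```python
# Assumption: rfio TURLs are only usable from WNs at the SE's own site.
LOCAL_ONLY = {"rfio"}

# Hypothetical replica catalogue: LFN -> set of SEs holding a replica.
replicas = {
    "lfn:a": {"se.glasgow.example", "se.ral.example"},
    "lfn:b": {"se.ral.example"},
    "lfn:c": {"se.cern.example"},
}
# Hypothetical published information: protocols per SE, close SEs per CE.
se_protocols = {
    "se.glasgow.example": {"gsiftp", "rfio"},
    "se.ral.example": {"gsiftp", "rfio"},
    "se.cern.example": {"gsiftp", "rfio"},
}
close_ses = {
    "ce.glasgow.example": {"se.glasgow.example"},
    "ce.ral.example": {"se.ral.example"},
    "ce.cern.example": {"se.cern.example"},
}


def eligible_sites(input_lfns, job_protocols, replicas, close_ses, se_protocols):
    """Return the CEs from which every input file is readable with at
    least one of the protocols the job asked for."""
    eligible = []
    for ce, close in close_ses.items():
        def readable(lfn):
            for se in replicas.get(lfn, ()):
                usable = job_protocols & se_protocols.get(se, set())
                if se not in close:
                    # Site-local protocols do not work from a remote WN.
                    usable = usable - LOCAL_ONLY
                if usable:
                    return True
            return False

        if all(readable(lfn) for lfn in input_lfns):
            eligible.append(ce)
    return sorted(eligible)


rfio_two_files = eligible_sites(["lfn:a", "lfn:b"], {"rfio"},
                                replicas, close_ses, se_protocols)
rfio_all_files = eligible_sites(["lfn:a", "lfn:b", "lfn:c"], {"rfio"},
                                replicas, close_ses, se_protocols)
gsiftp_all_files = eligible_sites(["lfn:a", "lfn:b", "lfn:c"], {"gsiftp"},
                                  replicas, close_ses, se_protocols)
```

With these made-up inputs, an rfio-only job needing all three files has
no eligible site and would be rejected, while the same job asking for
gsiftp could run anywhere.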
> So shouldn't a job be able to read _any_ file from an SE in this case?
> Or is there some resource broker criteria being involved here, so a job
> won't reach a site if its input files are not "close"?
The second - in fact the broker doesn't guarantee that all files are
close, but it tries to get as many files as possible to be close.
> I'm currently trawling through the experiments' latest TDRs to try and
> find out what the idea is here. This will become a real issue as a DPM
> volatile storage area will start its garbage collector to delete the
> oldest unpinned files once the usage goes over some defined threshold.
I doubt that anyone has thought much about it. Like many things this was
only ever half-designed, and parts of the design have decayed because
they assumed things which are no longer, or never were, true. The people
who invented the information system schema and the SRM protocol put in
placeholders for volatile files, but since there was no implementation
the middleware developers have never bothered with it, and it hasn't
been raised as an issue for experiments apart from people complaining to
ATLAS about SEs getting full.
Personally I think we should try to tackle things like this
systematically and do them properly, but what tends to happen is that
they get ignored until there's a crisis, at which point we get a quick
fix which in the long run causes as many problems as it solves ...
Stephen