On 29 Sep 2011, at 07:42, John Gordon wrote:
> I agree with the Mancunians. I don't know where you got the idea Stuart that GRAM worked with big clusters. Perhaps if the big cluster runs a single MPI job with another one or two in the queue but it was number of jobs that killed it.
"designed for fewer clusters" != "designed for many jobs", and definitely not "good with many jobs". I never claimed it was able to _fill_ a large cluster, just that the user experience was designed around the idea of few accessible clusters.
But yeah, I should probably have included the performance gripes too. And then there's all the cleverness of the GASS, which turned out to be rather overcomplicated for what we're actually doing, to the point of being probably more of a negative. It's instructive to note that the GASS paper only compares GASS to other POSIX-appearing filesystems, rather than including a comparison to the 'just copy the file in' model.
> I believe that the EMI strategy of not presenting a GRAM (or even a BES) interface is wrong as there are indeed many use cases for it, if it were scalable. The LCG-CE (and other components) are now Globus Toolkit 4 Pre-Web Services version, i.e. the latest version compatible with GT2.
GT5 has been out for approaching two years, longer if you include the betas, and GRAM5 is directly compatible with the GT2 tools; so no, GT4 Pre-WS is not the latest version. Although the latest LCG-CE (3.1.40-0) was only out for a few months after the GT5 release, so LCG-CE uses the latest version that was available to it.
Nomenclature point, given that this all started with requests for info:
GRAM == Globus Gatekeeper (technically, the Gatekeeper is the bit that talks to the outside world, whilst GRAM is the whole service, including a few other bits, but treat them as synonyms in the first instance)
GASS is the Globus file cache and staging system (more or less).
BES is the Open Grid Forum's standard, the Basic Execution Service. It's an attempt to have a reasonably standard base for how one would submit a job to a CE, and it's based on JSDL and web services.
EMI (read Blazas) note that the BES holds a model of computation that is unique to itself, and not the same as the GLUE model. Therefore the endpoints have to translate between the two, GLUE for meta-scheduling and reporting, and the BES one for job execution. It would be much easier, and less error prone, to write software that only used a single model, and as we already use GLUE, that's the one that should be used. There are also gripes about how you have to extend JSDL to actually describe a 'typical' job, and the JSDL extensions weren't included in the standard, so
They are planning a very BES-like interface that does the above. That's not going to be compatible with BES ...
On GRAM, I believe EMI generally hold a "Well, if you want GRAM, run GRAM!" philosophy, possibly with a wink and a nudge in the direction of IGE.
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Alessandra Forti
>> Sent: 29 September 2011 00:28
>> To: [log in to unmask]
>> Subject: Re: Early Adopters/Staged Rollout in the UK
>>
>> In the US they like to use it because it's a US product and they get
>> funding for that.
>>
>> In HEP it is not liked because it needed a lot of hacks to make it work
>> on a big scale. If it hadn't been for CERN and the work they put to keep
>> going the globus based lcg-CE it wouldn't have lasted that long.
>>
>>
>> On 28/09/2011 19:11, Stuart Purdie wrote:
>>> Two, rather separate points:
>>>
>>> On 28 Sep 2011, at 14:43, Daniela Bauer wrote:
>>>
>>>> So far I had one volunteer (Raul) admitting to installing EMI software
>>>> and be willing to submit a report about it who wasn't on the list
>>>> before (to save you from clicking, here it is)
>>>>
>>>> UKI-SCOTGRID-GLASGOW CREAM EMI 1.0
>>> Also run a Developer Special EMI DPM headnode. I suspect that this is
>> the prototype EMI Preview case too. Most important detail: we're unlikely
>> to ever be official SR people for DPM up here.
>>>
>>> ... oh yeah, EMI Preview - think the Early Early Adopters. Eventually
>> we'll get to the point where the developers actually run the software
>> themselves, which might speed up the process of bug catching, but for the
>> moment the talk is of one stage closer to the developers...
>>>
>>> Separately:
>>>
>>>> Also EMI has come back to me, asking about IGE (globus) - apparently
>>>> the UK are avid globus users, but I have to admit, this is the first I
>>>> hear of it (IGE that is, somehow I was under the impression that
>>>> globus is integrated in various bits of middleware and that was that).
>>> Let me take a wander through the past, to give some background that (I
>> hope) will be useful in placing these things in context.
>>>
>>> Globus is 'officially' a Toolkit, which can be used to build production
>> infrastructure. It does, however, contain a number of components that can
>> be deployed as-is, although it makes a few assumptions about things if you
>> do that (e.g. no VOMS server support). Most of the current production grid
>> infrastructure is based on Globus Toolkit 2 (GT2), and the lcg-CE is mostly
>> hacked up GT2 Gatekeeper (think CE), to work with VOMS and other things.
>> GT3 and GT4 were the Web Service wilderness years, and the problem was that
>> few people wanted webservice stuff at the time, and those that did got
>> burned when GT4 decided to use a totally incompatible state model, and also
>> a totally incompatible wire format from GT3; and did it round about the
>> point in time when GT3 stuff was just about mature enough to seriously
>> deploy (i.e. just before the documentation was usably complete). Thus very
>> little of GT3 or GT4 ever reached a production Grid. GT3 did introduce
>> OGSA, the Open Grid Services Architecture, through which much time has been
>> wasted waffling about Service Orientated Architecture [0].
>>>
>>> With GT5 they went back to the GT2 stuff, and backported all the various
>> improvements over the years, and worked from there. It includes an updated
>> Gatekeeper.
>>>
>>> People like the Globus Gatekeeper for, as far as I can see, three key
>> reasons.
>>>
>>> 1. It is simple to deploy - one provider, one install, done.
>>> 2. It is simple to use - You gridFTP files from A to B, and you submit
>> your job to a cluster, and it runs there. That's it.
>>> 3. It has a lot of mind share - for better or worse Ian Foster invented
>> the term Grid [1], and as he runs the Globus project, there is a strong
>> connection that Globus == Grid [2]
>>>
>>> The biggest reason (as far as I can tell) that the Globus Gatekeeper
>> isn't so used for HEP over here is most strongly related to point 2, and
>> weakly to 3 (i.e. the old Not Invented Here syndrome). The key reason,
>> however, is that the Gatekeeper is designed for larger clusters - the
>> typical cluster size in the USA is 5000 to 80 000 cores. With an
>> expectation of larger, and hence fewer, clusters, there are some things that
>> are different. Firstly - metascheduling - not in the normal tools. You
>> pick a cluster, and send it there, manually. Although not relevant now
>> (because of the growth of pilot jobs, moving this concern into experiment
>> frameworks) data-dependent metascheduling was a big thrust of CE
>> development work in EDG/EGEE.
>>>
>>> So: IGE
>>>
>>> IGE is taking the GT5 components, and modifying them to fit within a
>> European Grid Context - i.e. VOMS, Argus, LCMAPS/LCAS on the security
>> side, and I think the intent is to get the gLite WMS to work with an IGE
>> Gatekeeper [3], and so on. It will also act as n-th level support (for
>> some value of n) for people using it in Europe.
>>>
>>> The main intent of this is to have a set of services to kill off the
>> older VDT/raw Globus stuff, that will co-exist with other Grid systems, for
>> those that want to use pretty much basic Globus.
>>>
>>>
>>> As has been mentioned, the NGS (specifically: Dave Wallom, as Technical
>> director) is a fan of Globus. With the retirement of the National Four
>> Clusters, we did see a noticeable increase in use of 'the' NGS WMS, we did
>> see an increase in use of the gLite stack, but the general recommendation
>> from NGS has (thus far) always been toward Globus.
>>>
>>> I do not know what the NGS's plans are, re IGE. It would be a natural
>> technology for them to adopt, on paper; but there may well be other factors
>> I'm not aware of.
>>>
>>>
>>> Frankly, my opinion (with a GridPP hat on)? IGE is irrelevant to us
>> unless and until WLCG / HEP experiments ask for it. It may have value for
>> NGS type users, but I'm of the opinion that they can ask us, rather than
>> the other way round (new technology should be driven from the User to the
>> operations team, and not the other way round).
>>>
>>>
>>> [0] And even more effort wasted. All the promise that people attribute
>> to SOA, other than a simple "A Service is a Library, with multi-language
>> bindings", require most of the stuff normally called Semantic Web - and
>> _that_ currently can only work within the contexts of well defined
>> Ontologies.
>>> [1] Even though Miron Livny had been doing it for years by then, with
>> Condor.
>>> [2] Despite the fact that Grid is a stupid name, and Globus fits less of
>> the key aspects than many other things.
>>> [3] From memory, may not be accurate.