On Mon, Sep 11, 2006 at 09:39:54AM +0100 or thereabouts, Stephen Childs wrote:
> Gordon, JC (John) wrote:
> >Thanks Steve. This isn't really an operational matter for the
> >Monday(Wednesday) meeting. It has been an SA1 requirement for at least a
> >couple of years. At HEPiX last October, Francesco finally understood
> >what was required and why and it has been on JRA1 workplan since then.
> >Something was delivered but I never heard how the testing went.
>
> The slides are available here:
>
> http://hepix.caspur.it/spring2006/TALKS/4apr.prelz.dir/index.html
>
> They're slightly depressing reading I'm afraid, as they seem to have just
> gone for the quick bodge option to shut the sysadmins up. For me there are
> two separate issues that should be fixed separately:
>
> 1) The standard job submission parameters (walltime, cputime, etc.)
> supported by Globus RSL and every LRMS under the sun _should_ be parsed
> out from the JDL ClassAds on the WMS and the appropriate variables set in
> the RSL. If we're using RSL as a layer of indirection we should take
> advantage of it fully.
>
> 2) There is a potentially unlimited number of other useful parameters that
> users could set in the requirements of the JDL. For example, you could
> imagine that software requirements (e.g. MPICH) could be used by a local
> batch system to load the appropriate software modules for the job and
> potentially tweak the submission script accordingly (e.g. by doing an
> mpiexec). For this there would need to be some agreement on naming of the
> requirements.
>
> To step back a bit, I find it depressing that no-one in JRA1 seems to be
> thinking about these real-world issues: it applies across the board from
> MPI support to passing batch system requirements to short jobs, etc.
> Is there anything we can do to improve this?

One thing we can do is encourage competition to the WMS. Using
CondorG on its own you can set these RSL parameters. ATLAS already
puts memory requirements in their jobs.
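
To make the RSL point concrete, here is a minimal sketch of the kind of
translation being asked for: job limits turned into a GRAM RSL fragment.
The JDL-side names (WallTime, CpuTime, Memory) and the build_rsl helper are
made up for illustration and are not taken from the WMS or from JDL;
maxWallTime, maxCpuTime and maxMemory are the usual GRAM RSL attributes
(minutes and megabytes).

```python
# Hypothetical mapping from JDL-style requirement names (assumed for this
# sketch) to GRAM RSL attributes.
JDL_TO_RSL = {
    "WallTime": "maxWallTime",  # minutes
    "CpuTime": "maxCpuTime",    # minutes
    "Memory": "maxMemory",      # megabytes
}


def build_rsl(requirements):
    """Return an RSL fragment for the known keys in `requirements`.

    Unknown keys are ignored; values are coerced to integers.
    """
    parts = []
    for name, rsl_attr in JDL_TO_RSL.items():
        if name in requirements:
            parts.append("({}={})".format(rsl_attr, int(requirements[name])))
    return "".join(parts)


if __name__ == "__main__":
    # A job asking for 60 minutes of walltime and 512 MB of memory.
    print(build_rsl({"WallTime": 60, "Memory": 512}))
```

With CondorG the resulting string would be handed to the submit file's
RSL setting; check the exact keyword for your Condor version.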

The other related thing is that during EGEE 1 there was some obligation for
SA1 (deployment and ops) to take the software that JRA1 produced. With
EGEE 2 this is no longer the case, and the new SA1 (operations) and
SA3 (integration) can take software from elsewhere.

For instance, the other viable RB, which fusion are already using, is being
talked about: http://www.gridway.org/, which places GGF's DRMAA interface
in front of a collection of GRAM resources.

There is meant to be a Spanish RB as well, but this could be the same one
given that fusion is centered in Spain.

Steve
>
> Stephen
> --
> Dr. Stephen Childs,
> Research Fellow, EGEE Project, phone: +353-1-8961797
> Computer Architecture Group, email: Stephen.Childs @ cs.tcd.ie
> Trinity College Dublin, Ireland web: http://www.cs.tcd.ie/Stephen.Childs
--
Steve Traylen