On Sat, 2006-08-12 at 11:52 +0100, Burke, S (Stephen) wrote:
> > In any case, I feel a bit like my original question was unanswered:
> > isn't it *necessary* for a CE to be able to support 4000 simultaneous
> > jobs (whether running or queued)? How do the large computing centres
> > such as CERN, Bologna, FZK, and RAL handle this?
There are several problems at present:
1) GT2, as used for job submission at the moment, normally creates one
Perl process per submitted job every minute to poll the state of the
job in the batch system. This doesn't scale.
2) The Condor-derived WMS disables this normal behaviour and instead uses
the Fork interface to start up its own job-monitoring agent.
This reduces the number of processes: you now get only one
continuously-running Perl process per user per RB.
However, exposing the Fork interface to external users, as required by
this approach, is a serious security risk. This is the current default.
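The difference in scale between the two schemes above can be sketched as a
back-of-envelope calculation (the job, user, and RB counts here are
hypothetical, chosen only to match the 4000-job figure in the question):

```python
# Illustrative sketch, not measurements: contrast the process counts of
# the two monitoring schemes described above.

def gt2_poll_processes(jobs: int) -> int:
    # GT2 jobmanager: roughly one polling Perl process per submitted job.
    return jobs

def grid_monitor_processes(users: int, rbs: int) -> int:
    # Condor-derived grid monitor: one long-lived agent per user per RB.
    return users * rbs

# Hypothetical load: 4000 jobs submitted by 50 users via 10 RBs.
jobs, users, rbs = 4000, 50, 10
print(gt2_poll_processes(jobs))            # 4000 polling processes
print(grid_monitor_processes(users, rbs))  # 500 monitor processes
```

Even with generous assumptions, the per-job polling model produces an
order of magnitude more processes on the CE than the per-user-per-RB
monitor, which is why the latter was adopted despite the Fork exposure.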
> In the medium term we're moving away from the current CE architecture
> anyway, see e.g.
>
> http://grid.pd.infn.it/cream/field.php
Oh, good grief! EGEE are going to invent yet another job-submission
system?
Why not simply reuse Condor (Condor-C) or GridSAM (JSDL) or something
similar and be done with it?
Cheers,
David
--
David McBride <[log in to unmask]>
Department of Computing, Imperial College, London