On Thu, 20 Sep 2007, Brew, CAJ (Chris) wrote:
> There are no doubt better solutions but that struck me as by far the
> simplest - otherwise we either have to put some grid software on the
> LRMS host (I was hoping to keep that off there) or make the CEs know
> about each other and declare one the "primary" CE that publishes the
> cluster info. But that has the problem that if that CE goes down then
> the RB/WMS probably won't properly deal with the remaining CEs.
When multiple CEs share the same WNs, their common information could be
calculated once directly by the site BDII. In fact, it could determine
all of the information to be published for the CEs. It will need the
batch system client software for that, but that should not be a problem.
This means that none of the CEs need to be declared primary, nor would
extra rpms have to be installed on the batch server.
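
As a rough sketch of the idea only (every name below is hypothetical, and
this is not actual BDII or yaim code): the site BDII would query the batch
system once and reuse the result for each CE that fronts the same WNs.

```python
from typing import Callable, Dict, List


def publish_shared_cluster(
    ce_hosts: List[str],
    query_batch_system: Callable[[], Dict[str, int]],
) -> Dict[str, Dict[str, object]]:
    """Build one record per CE from a single batch-system query.

    `query_batch_system` is a hypothetical stand-in for whatever the
    batch system client software would report (CPU counts, free slots, ...).
    """
    # Query the batch system exactly once; all CEs share the same WNs.
    shared = query_batch_system()
    # Each CE gets its own copy of the shared values plus its host name,
    # so no CE has to be declared "primary".
    return {ce: {**shared, "ce_host": ce} for ce in ce_hosts}


if __name__ == "__main__":
    # Made-up values, for illustration only.
    records = publish_shared_cluster(
        ["ce1.example.org", "ce2.example.org"],
        lambda: {"total_cpus": 100, "free_cpus": 40},
    )
    for host, record in records.items():
        print(host, record)
```

If one CE goes down, the records for the remaining CEs would be unaffected,
since none of them depends on another CE for the cluster info.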
> The former is possibly the best solution but I can see it being a
> problem for sites that are basically a grid wrapper for a shared
> resource. They probably won't be too amenable to just installing a few
> rpms on the batch server. That would mean yet another yaim node type
> (LRMS-cluster-publisher? by default the same as the site BDII?) but it
Probably just a configuration option for the site BDII.
> still needs to get the VO tag info from the CE (needs the
> globus-gatekeeper service to manage them).
The GridFTP server.
> How common are multiple CEs anyway, maybe this should just be a wiki
> page of instructions rather than an official yaim option? It may be
> simpler in the specific than the general.
If we are to stay with the LCG-CE for a while longer, and the number of
parallel jobs keeps increasing, many sites may have to deploy multiple
CEs to keep the load on each one manageable...