Hi,
my 2 cents ;-)
> But isn't that exactly what e.g. the CERN CEs are doing?
> For example, ce101 and ce102 are fronting the same WNs.
Plus some more identical CEs, because a single CE is not able to cope
with the workload. If this service could be distributed over a
load-balanced cluster of (stateless) machines, this problem would not
exist, right?
If only one CE is supposed to publish these values per subcluster,
which one should be picked? What happens if this CE goes down (*)?
Also, the number of cores (aka CPUs) in this environment is not at all
static but very dynamic. It changes every day because machines come and
go for various reasons. Is the number of cores/CPUs a useful number at
all, if the resources are in fact shared (as they are at CERN) with
local users? (*)
I'm surprised to hear statements like "misuse of the schema" when this
is the only way for us to actually survive the load. Please believe me
that it is not easy to maintain a CE cluster of 24 machines, and if we
have to start making individual machines special in some way, the
system will become unmaintainable very soon. As a matter of fact, our
job throughput right now is 130k jobs per day, with about 50k jobs in
the system, and increasing.
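To give a rough sense of scale, here is a back-of-the-envelope sketch in
Python using only the figures above. The even split across the 24 CEs is
an assumption made for illustration (real load is never perfectly
balanced), and the function names are made up for this example.

```python
# Figures quoted in this mail; everything else is illustrative.
JOBS_PER_DAY = 130_000      # current job throughput
CE_MACHINES = 24            # machines in the CE cluster
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_ce_per_day(jobs_per_day: int, ce_count: int) -> float:
    """Average daily jobs per CE, ASSUMING a perfectly even split."""
    return jobs_per_day / ce_count


def jobs_per_second(jobs_per_day: int) -> float:
    """Average submission rate over the whole CE cluster."""
    return jobs_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    per_ce = jobs_per_ce_per_day(JOBS_PER_DAY, CE_MACHINES)
    rate = jobs_per_second(JOBS_PER_DAY)
    print(f"about {per_ce:.0f} jobs per CE per day")
    print(f"about {rate:.2f} jobs per second across the cluster")
```

These are averages only; they say nothing about peaks, which is where a
single CE gets into trouble first.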
Practical suggestions for a solution are very welcome.
Cheers,
Ulrich
(*) One could indeed set up a non-existent "fake" CE per cluster in the
BDII which would publish these numbers. I do not really like this idea
because it is just another ugly hack to get around a limitation of the
system, isn't it?