Hi Steve,
Steve Traylen wrote:
> And from your point of view as well, you have an allocation based on
> wall time here. You will be using that allocation while waiting. At
> the moment this is a non issue since LHCb is still the only one submitting
> enough fast enough. You are currently using 30 times your allocation
> anyway at RAL. If another group can submit fast enough then you will
> be squashed down.
Wow. How long does "it" (and what is "it" anyway?) track relative VO
usage? It seems to me it should be a sliding window of a few months --
otherwise LHCb will be penalised for i) being the main source of jobs
and job-based debugging info for LCG; and ii) utilising otherwise
completely idle resources. I hope that once this system properly goes
into effect LHCb's previous LCG usage will not be held against us.
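For concreteness, the sliding-window accounting I have in mind could look something like the sketch below. Everything here is hypothetical -- the record format, the window length, and the priority formula are my assumptions, not how the actual scheduler works:

```python
from datetime import datetime, timedelta

def windowed_usage(records, vo, window_days=90, now=None):
    """Sum a VO's wall-clock usage (hours) over a sliding window.

    records: list of (vo_name, end_time, wall_hours) tuples.
    Only jobs ending inside the window count, so old usage
    eventually stops being held against the VO.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=window_days)
    return sum(hours for name, end, hours in records
               if name == vo and end >= cutoff)

def fair_share_priority(records, vo, quota_hours, window_days=90, now=None):
    """Toy priority: 1.0 when a VO has used nothing recently,
    decaying towards 0 as windowed usage exceeds its quota."""
    used = windowed_usage(records, vo, window_days, now)
    return quota_hours / (quota_hours + used)
```

With a 90-day window, a burst of usage from months ago simply falls out of the sum, which is the behaviour I'm arguing for above.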
Of course, you are right that it is not desirable to hold resources
without using them, thus wasting our "VO CPU quota". But for the reasons
I've mentioned, we need the transfers to work, so we have to "hold out
till the last second" -- until either the transfer succeeds or we run
out of queue time.
> Writing a sweeper to replicate files not at CERN to CERN would seem
> as sensible thing to do?
We are actually already replicating all data to two places (AFAIK) --
one T1 and one T0 (OK, CERN). I think our system is just not mature
enough yet to do "transfer to AT LEAST one of the following two sites",
but I'm sure that if this problem persists then "we" (Andrei) will come
up with such a system. Right now this is done semi-manually when the
data/job output is verified.
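The "at least one of the following sites" logic I mean is essentially a fallback loop. A minimal sketch, where `copy_func` is a stand-in for whatever transfer tool is actually used (its interface is my assumption):

```python
def replicate_to_any(copy_func, local_file, candidate_sites):
    """Try each candidate destination in turn; succeed as soon as
    any single copy works, so the job is not blocked on one site.

    copy_func(local_file, site) -> bool is a hypothetical interface
    for the real transfer command; it returns True on success.
    """
    failures = []
    for site in candidate_sites:
        try:
            if copy_func(local_file, site):
                return site  # at least one replica now exists
        except Exception as exc:
            failures.append((site, exc))
    raise RuntimeError("no replica made; failures: %r" % failures)
```

A sweeper could then run this over any files whose only replica is away from CERN, with CERN last (or first) in the candidate list.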
Cheers,
Ian.
--
Ian Stokes-Rees [log in to unmask]
Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes