Hi Simon,
I'll give you what I know about it.
On 12/02/2019 15:58, George, Simon wrote:
> advice on whether HTCondorCE is considered stable and ready for
> production,
It's stable and in production here. I'm very happy with it, but it'll
break now I've said that.
>
> whether the documentation is good enough for a sysadmin (not a Grid
> expert) to install and maintain it,
I'm afraid the documentation was not "good enough" for that.
> and where that documentation is.
The bits I needed are written in my own hieroglyphics in an A4 notebook
lying on my desk. I'm trying to find the time to translate them into a
more useful form, and to combine them with the useful bits that came
from elsewhere. The heavy lifting for HTCondor-CE is done by a CERN
puppet module that actually does do useful things. I'm working on
collating all of that.
> I thought I heard that the accounting was not working yet for example?
>
Accounting is now working. I documented that bit properly. See here:
https://twiki.cern.ch/twiki/bin/view/LCG/HtCondorCeAccounting
> We were previously thinking of ARC CE because it seemed the most
> popular but also upon investigation seems complicated to deploy.
>
We use both. HTCondor-CE will replace ARC here (though just saying that
doesn't explain why, I know).
> It's not just the CE but also the accounting (apel),
>
Done, as I say.
> and sbdii,
>
BDII works (after a fashion) but it needs "a good going over".
> Argus - although the latter two may also be needed for storage?
>
ARGUS took a while; I will definitely include a section on that.
> If the CE is a single point of failure should we not run at least two?
>
It depends on how you build them. My lazy philosophy is that I have a
tested build system, and a CE has no real state. If it breaks badly,
I'll build a new one.
So, just one to start with. If you have plenty of free time, do two:
have a standby, or even load balance them and all that fancy stuff. I
don't bother.
> Then setting up the vast number of pool accounts and other WN
> configuration.
>
AFAIK, there's no way around that, yet.
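Just to give a feel for the scale, the sort of thing you end up
scripting is a loop like the rough Python sketch below. The VO names,
counts, starting UID and groups are made-up placeholders, not our actual
scheme or a recommendation:

    #!/usr/bin/env python3
    # Rough sketch: print useradd commands for a block of pool accounts
    # per VO. VO names, counts, starting UID and groups are placeholders.
    vos = {"atlas": 200, "lhcb": 100, "dteam": 20}
    uid = 50000  # assumed starting UID for the pool account range
    for vo, count in vos.items():
        for i in range(1, count + 1):
            user = "%s%03d" % (vo, i)
            print("useradd -u %d -g %s -m -s /sbin/nologin %s" % (uid, vo, user))
            uid += 1

You still have to do the matching gridmapdir and WN configuration by
whatever means you normally use, of course.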
> CVMFS and squid we would need anyway for local users.
>
Yep, you'd need that. You'd need everything a normal ARC/HTCondor setup
needs (except ARC!)
This is 5 years old now, and it's so stale I can smell it, but it is
what it is:
https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster
>
> I think it would be helpful if there would be recommended (either
> GridPP or wider) options for all the services required of a Tier2 site.
>
>
We have no specific budget for supporting these various components. In a
perfect world, we'd have an R&D group whose purpose in life is to
baseline all this stuff and come up with optimal configurations for the
whole lot of it. But that costs big money and takes _years_ to set up,
even if you have a QA project manager in charge. So we do the best we
can. There are puppet modules for installing HTCondor in a grid setting
(AF gave the links in a talk at the Pitlochry GridPP meeting), and I
hope the work I'm doing results in a usable document that works in
conjunction with the CERN HTCondor-CE puppet modules etc. But there has
been no silver bullet since we moved away from YAIM. This move is going
to be a long haul, I think, yet it cannot be avoided. CREAM will
disappear, but we have a couple of years. That's the only good thing
about it.
Cheers,
Ste
--
Steve Jones [log in to unmask]
Grid System Administrator office: 220
High Energy Physics Division tel (int): 43396
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
University of Liverpool http://www.liv.ac.uk/physics/hep/