Hi Steve, all

Thanks for the update on your progress with HTCondor CEs. Until others, such as the Tier-1, are in a position to try it, I am not sure there is a pressing need for a full discussion. I agree with Alessandra's message that if we want to make progress we need to present a clear UK position (with progress) at the WLCG/HSF meeting. I have therefore created an agenda for Friday to talk about the BDII decommissioning and the new information system:
https://indico.cern.ch/event/798505/

If sites want to try creating their own version of the JSON format before the meeting, you can find examples on GitHub here:
https://github.com/gridpp/info-sys/tree/master/sites

Alastair






On 12 Feb 2019, at 15:44, Stephen Jones <[log in to unmask]> wrote:

On 12/02/2019 14:57, Alastair Dewhurst wrote:
I had originally pencilled in talking about HTCondor CE this week.  I don’t know if others have made any progress here, but the Tier-1 have been having some issues with our ARC CEs, and understanding this problem and patching them has meant no time to actually deploy an HTCondor CE.  Has anyone else had any success with HTCondor CE deployment?  Steve, do you have anything to add about the accounting development?

Here's the status of HTCondor-CE at Liverpool. I don't mind talking about it either, if you think it might be useful; it's still work in progress, though.

Right; I've got it going in production, with 752 slots. There are some issues. The BDII provider given with it counts hyper-threads as slots. In fact, we under-load our servers, since they are short on memory. Hence the BDII claims more slots than we really have. That needs a fix.
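To make the slot over-count concrete, here is a minimal sketch of the difference between counting hyper-threads and counting the slots actually configured. The node names, CPU counts and slot counts are hypothetical, made up purely for illustration; they are not Liverpool's real figures, and this is not the provider's actual code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    logical_cpus: int      # hardware threads the OS sees (hyper-threads included)
    configured_slots: int  # job slots really offered, limited here by memory


def advertised_slots(nodes):
    """Slot count if every hyper-thread is published as a slot."""
    return sum(n.logical_cpus for n in nodes)


def real_slots(nodes):
    """Slot count the batch system can actually run."""
    return sum(n.configured_slots for n in nodes)


# Hypothetical worker nodes, under-loaded because they are short on memory.
nodes = [
    Node("wn01", logical_cpus=32, configured_slots=24),
    Node("wn02", logical_cpus=32, configured_slots=24),
    Node("wn03", logical_cpus=16, configured_slots=12),
]

overcount = advertised_slots(nodes) - real_slots(nodes)
print(f"advertised={advertised_slots(nodes)} "
      f"real={real_slots(nodes)} overcount={overcount}")
```

With these made-up numbers the published figure would be 80 slots against 60 real ones, which is the kind of gap a fixed provider would need to close by publishing the configured slot count instead.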

I have a half-written document on the deployment of HTCondor-CE at Liverpool, but it's not ready to release. Matt's seen it, but more work is needed.

In general, documentation for the HTCondor-CE Accounting is now practically complete (in particular, scaling). It's just all the rest of it that is flimsy.

https://twiki.cern.ch/twiki/bin/view/LCG/HtCondorCeAccounting

Another possibility is to have a go at pushing the dead whale along the beach that is the BDII decommissioning. David Crooks mentioned to me that there is an EGI CSIRT F2F meeting next week which he is attending.  I believe he would have an opportunity to speak to people about updating the security monitoring so that it no longer needs the BDII.

I would surely like the BDII to be gone, because it's my main problem with HTCondor-CE at the moment. The CERN-written BDII puppet code has the error I mention above; also, it only gives GLUE2, which makes it useless for hooking into APEL (which uses GLUE1 to acquire the CE benchmark that all nodes are scaled to). There is a workaround for that in the accounting doc (above), and Adrian and I have tentatively explored fixing it good and proper by putting in the published benchmark as a config option. But no real progress was made on that, since the workaround is so "OK".

Sorry if this is a mish-mash. That's how things stand at the moment. But I'll say this: HTCondor-CE is top notch. It's been running 752 jobs for nearly a month and I haven't heard a peep out of it. Hardly any load, either, on the CE part. But, as I say, the installation, esp. the BDII, is a bit of a rough road for the time being, I think. It's doable, obviously, but you do need to do a bit of head-scratching...

Cheers,

Ste




--
Steve Jones                             [log in to unmask]
Grid System Administrator               office: 220
High Energy Physics Division            tel (int): 43396
Oliver Lodge Laboratory                 tel (ext): +44 (0)151 794 3396
University of Liverpool                 http://www.liv.ac.uk/physics/hep/

########################################################################

To unsubscribe from the TB-SUPPORT list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=TB-SUPPORT&A=1


