> On 7 Jul 2016, at 23:00, L Kreczko <[log in to unmask]> wrote:
>
> Dear all,
>
> Not sure if it fits topic-wise, but personally I would be interested in ways to reduce the effort needed per site.
> I understand that my thoughts might seem a bit naive, but I strongly believe that efficiency can be gained in some areas.
> Let me set the scene.
>
> Docker
> Very recently Andrew reported a proof of concept for running LHC jobs with Docker on HTCondor, which was followed up by the Nebraska T2 switching almost 50% of its resources to run this way.
> By providing the application environment (software + all dependencies + OS) in a container, this approach has the potential to eliminate some of the overhead of setting up worker nodes, and it allows small experiments (or individuals) to ship their application environment with the job (cached at the site's Docker registry proxy, of course).
> It also decouples the OS on the worker node from the OS in the container (both have to be Linux at the moment; native Docker support on Windows & Mac is in beta [1]).
> TL;DR: Simplification and more features at once with Docker & HTCondor.
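>
> A minimal sketch of what the submit side could look like, using the HTCondor Python bindings (the image name and wrapper script below are made up for illustration):
>
>     import htcondor
>
>     # Describe a docker-universe job: HTCondor pulls the image and runs
>     # the executable inside the container on the worker node.
>     sub = htcondor.Submit({
>         "universe": "docker",
>         "docker_image": "cern/slc6-base:latest",  # illustrative base image
>         "executable": "run_analysis.sh",          # hypothetical wrapper script
>         "output": "job.out",
>         "error": "job.err",
>         "log": "job.log",
>     })
>
>     # Queue one job with the local schedd
>     schedd = htcondor.Schedd()
>     with schedd.transaction() as txn:
>         sub.queue(txn)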
The problem here is still cvmfs. It either requires cvmfs running on the host outside the container (which means Weird HEP-Specific Stuff on shared clusters) or using Parrot, which does not seem to be as stable as it appears at first glance. Ideally, we'd run cvmfs inside the containers, but that's not ready yet (a proof of concept would require kernel patching).
I do think this is the third way to go, and we should be able to get the experiments to provide self-contained containers and VMs with almost identical contents, and run them via a container factory attached to HTCondor, or Vcycle talking to OpenStack, or Vac on bare metal. But only the last two are practical at the moment, and only with VMs, because of the cvmfs issue in containers.
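To make the host-mount workaround concrete: a minimal sketch with the Docker SDK for Python, assuming the host has a cvmfs client installed and the cms.cern.ch repository already mounted under /cvmfs (the image and command are illustrative):

    import docker

    client = docker.from_env()

    # Bind-mount a cvmfs repository that the *host* has already mounted
    # (e.g. triggered via autofs or cvmfs_config probe), read-only, at
    # the same path inside the container. The container itself needs no
    # cvmfs client at all.
    output = client.containers.run(
        "cern/slc6-base:latest",          # illustrative image
        "ls /cvmfs/cms.cern.ch",          # illustrative command
        volumes={"/cvmfs/cms.cern.ch":
                 {"bind": "/cvmfs/cms.cern.ch", "mode": "ro"}},
        remove=True,
    )
    print(output.decode())

Note that each repository has to be bind-mounted individually: bind-mounting the autofs-managed /cvmfs root does not propagate mounts made after the container starts.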
Cheers
Andrew
>
> Meta sites
> This topic has been around for a while, mostly pursued by Chris (Brew). I can definitely see the benefit of it, even if it starts slowly.
> Small sites could hop onto well-maintained infrastructure (e.g. Mesos self-healing services or ELK) to gain capabilities that are impossible to provide with 0.5 FTE or less.
> In theory it is possible for a bigger site (T1?) to provide a CE, BDII, SE, PhEDEx box, monitoring (!), Puppet master (?), etc.,
> while the smaller site provides computing and/or storage resources (WNs, disk nodes, GridFTP servers, squids).
>
> Logging & Monitoring
> Big sites have big (custom) solutions for their logging & monitoring (& accounting) {{citation needed}}. The ELK (Elasticsearch, Logstash and Kibana) stack is just one example.
> There is no way every site can afford to run everything, and experience travels slowly from site to site.
> Other than using expertise where it is available, I want to ask this:
> Is there a benefit of being able to correlate events/logs from grid services across the UK?
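>
> To make that concrete: if sites shipped their logs into a shared Elasticsearch cluster, a UK-wide correlation could be a single query. A minimal sketch (the hostname, index pattern and field names are invented):
>
>     from elasticsearch import Elasticsearch
>
>     # Hypothetical shared cluster indexing logs from all UK grid sites
>     es = Elasticsearch(["logs.gridpp.example.ac.uk"])
>
>     # Count GridFTP errors in the last hour, broken down by site
>     result = es.search(index="logstash-*", body={
>         "query": {"bool": {"must": [
>             {"match": {"service": "gridftp"}},
>             {"match": {"level": "ERROR"}},
>             {"range": {"@timestamp": {"gte": "now-1h"}}},
>         ]}},
>         "aggs": {"per_site": {"terms": {"field": "site"}}},
>     })
>
>     for bucket in result["aggregations"]["per_site"]["buckets"]:
>         print(bucket["key"], bucket["doc_count"])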
>
>
> Apologies for the wall of text, but with shrinking (wo-)manpower in the UK these thoughts have been on my mind.
>
> Cheers,
> Luke
>
>
> [1]
> http://thenewstack.io/native-docker-comes-windows-mac/
>
>
> On 7 July 2016 at 17:40, David Colling <[log in to unmask]> wrote:
> Dear All,
>
> We are due a technical meeting tomorrow. However, I have to be in an
> examiners' meeting for at least part of the morning. So Andy, can you
> chair this meeting? I will try to join as the examiners' meeting allows.
>
> Does anybody have anything specific that they would like to discuss or
> present on?
>
>
> Best,
> david
>
>
>
> --
> *********************************************************
> Dr Lukasz Kreczko
> Research Associate
> Department of Physics
> Particle Physics Group
>
> HH Wills Physics Lab
> University of Bristol
> Tyndall Avenue
> Bristol
> BS8 1TL
>
> +44 (0)117 928 8724
> [log in to unmask]
>
> A top 5 UK university with leading employers (2015)
> A top 5 UK university for research (2014 REF)
> A world top 40 university (QS Ranking 2015)
> *********************************************************
Cheers
Andrew
--
Dr Andrew McNab
University of Manchester High Energy Physics,
LHCb@CERN (Distributed Computing Coordinator),
and GridPP (LHCb + Tier-2 Evolution)
www.hep.manchester.ac.uk/u/mcnab
Skype: andrew.mcnab.uk