Hi Andrew,



On 22 July 2016 at 16:06, Andrew Lahiff <[log in to unmask]> wrote:

> Hi Luke,
>
> Sorry, by "worker-node only" I thought you literally meant worker-node
> only (where these connect to a regional central manager) rather than a
> complete batch system.
>
Ah, I had not considered this possibility. How would this work for "local"
users? Would they be able to submit to the regional central manager and
have their jobs routed to the local system?

>
> Yes, I'm referring to the new version of swarm in the upcoming 1.12. I was
> thinking that after installing Docker on lots of machines you then
> essentially just need to run two commands to create your squids and worker
> nodes, where the worker node is a startd in a container configured to
> connect to e.g. a Southgrid condor instance (you would need to supply a
> credential of

If the node is a startd inside a container, do multicore jobs still work?

> course in some way so the startd is allowed to connect to the central
> managers & schedd). If you wanted a local central manager this would be a
> third "docker service create...".
>
I would be fine with a remote central manager, but the details of shares
for local users would probably add complexity.
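
If I picture it right, those two commands would presumably look something
like the sketch below (the image names, the pool password path and the
SouthGrid central manager hostname are made up for illustration, so this is
only a sketch, not a recipe):

    # purely a sketch: image names, paths and hostnames are placeholders
    docker service create --name squid --mode global \
        --publish 3128:3128 \
        example/frontier-squid

    # a condor_startd in a container on every node, pointing at a regional
    # central manager; the pool password is bind-mounted in from the host and
    # the (hypothetical) image is assumed to turn CONDOR_HOST into its config
    docker service create --name startd --mode global \
        --mount type=bind,source=/etc/condor/pool_password,target=/etc/condor/pool_password \
        --env CONDOR_HOST=condor-cm.southgrid.example.ac.uk \
        example/htcondor-startd

If that is roughly it, then the credential injection and the local shares
would be the main site-specific pieces left.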

>
> Regarding pool accounts, I still think that with condor per-slot users and
> cpu cgroups they don't have much (if any) benefit, unless you're using unix
> authentication for your storage.
>
What about glexec then? How would it work in that scenario?
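
Also, just to check I have the right picture of the per-slot user / cgroup
setup you mean: something along these lines, e.g. dropped into
/etc/condor/config.d/ on the worker node? (The account names and the cgroup
base are guesses at one possible configuration, not a tested recipe.)

    # one dedicated local account per slot instead of per-VO pool accounts
    SLOT1_USER = slotuser1
    SLOT2_USER = slotuser2
    DEDICATED_EXECUTE_ACCOUNT_REGEXP = slotuser[0-9]+
    STARTER_ALLOW_RUNAS_OWNER = False

    # per-job CPU/memory isolation via cgroups
    BASE_CGROUP = htcondor
    CGROUP_MEMORY_LIMIT_POLICY = soft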

>
> I think off-site Argus, CEs, and BDIIs would be relatively easy to do,
> although of course it would be good to avoid having to use a common set of
> pool accounts for the CEs :-)
>
Indeed. Well, let me know if you want to run any of the above for
UKI-SOUTHGRID-BRIS-HEP ;).

Cheers,
Luke

>
> Regards,
> Andrew.
>
> ________________________________
> From: Testbed Support for GridPP member institutes [
> [log in to unmask]] on behalf of L Kreczko [[log in to unmask]
> ]
> Sent: Friday, July 22, 2016 3:01 PM
> To: [log in to unmask]
> Subject: Re: meeting tomorrow?
>
> Hi Andrew,
>
> Are you referring to docker swarm? In that scenario, how would you
> orchestrate services versus worker nodes?
> For services you know what you need, while the worker node side would run
> some sort of batch system, would it not?
>
> And even in that scenario you would still need the maintenance, upgrade &
> debug knowledge on the site.
> Ideally I would like to see a worker node only site with some sort of
> batch system (and maybe storage) and local cache (squid) with the grid
> interface (CE, SE, BDII) off-site.
>
> This way the site has
>  - full control over computing & storage shares (quotas for remote users)
>  - no need for grid knowledge (certs, accounting, etc)
>
> Of course, for such a system to work, the off-site services need to be
> reachable from the site and in some cases the other way around.
> Plus a whole bunch of things not listed here (pool accounts, ...).
>
> So my questions are: Is this desirable? How do we get there?
>
> BDII and ARGUS should be easy to put off-site; the CEs should be doable
> after agreeing on pool accounts & shares.
> It should also be doable to move a DMLite SE off-site as long as it can
> talk to the file system and GridFTP servers.
>
> Cheers,
> Luke
>
> On 20 July 2016 at 18:37, Andrew Lahiff <[log in to unmask]> wrote:
> Hi Luke,
>
> I would say that you shouldn't even need to deploy some machines as worker
> nodes, some as squids and perhaps some as other services. You just need
> lots of identical machines with Docker engine only installed - Docker
> itself can now form a cluster without needing any external services
> (Consul, etcd or ZooKeeper are no longer required) so it should be possible
> for small sites or sites without much available effort to be able to take
> advantage of container orchestration, e.g. self-healing squids/other stuff,
> automated rolling upgrades, ...
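
(For reference, the two-step cluster bootstrap being described here is
presumably just the new swarm mode in 1.12, roughly as below; the manager
hostname is a placeholder.)

    # on the first machine, which becomes a manager
    docker swarm init

    # on every other machine, using the join token printed by "swarm init"
    docker swarm join --token <worker-token> swarm-manager.example.ac.uk:2377
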
>
> Regards,
> Andrew.
>
> ________________________________
> From: Testbed Support for GridPP member institutes [[log in to unmask]] on
> behalf of L Kreczko [[log in to unmask]]
> Sent: Tuesday, July 19, 2016 5:35 PM
> To: [log in to unmask]
> Subject: Re: meeting tomorrow?
>
>
>
> On 18 July 2016 at 14:14, Andrew McNab <[log in to unmask]> wrote:
>
> > On 7 Jul 2016, at 23:00, L Kreczko <[log in to unmask]> wrote:
> >
>
>
> The problem for this is cvmfs still. It either requires cvmfs running on
> the host outside the container (which means Weird HEP Specific Stuff on
> shared clusters) or using Parrot which does not seem to be as stable as it
> appears at first glance. Ideally, we’d run cvmfs inside the containers, but
> that’s not ready yet (a proof of concept would require kernel patching).
>
> Very true, I am looking forward to that moment. Once CVMFS is ready, all
> you would need from the local site would be the WNs (running HTCondor +
> docker), squids and the docker registry (proxy), would you not?
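
(In the meantime the host-mounted route would presumably look roughly like
the sketch below, with cvmfs and autofs on the worker node itself and /cvmfs
bind-mounted read-only into the job container; the image name is a
placeholder.)

    # cvmfs is installed and mounted on the host; the container sees it read-only
    # note: with autofs the repository generally has to be mounted on the host already
    docker run --rm -v /cvmfs:/cvmfs:ro \
        example/wn-payload /bin/sh -c 'ls /cvmfs/cms.cern.ch'
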
>
> I do think this is the third way to go, and we should be able to get the
> experiments to provide self-contained containers and VMs with almost
> identical contents, and run them via a container factory attached to
> HTCondor, or Vcycle talking to OpenStack, or Vac on bare metal. But just
> the last two are practical at the moment, and only with VMs because of the
> cvmfs issue in containers.
>
> Ideally as close to something used in industry as possible. This not only
> has the advantage of a big community & developer base, but knowledge about
> docker & OpenStack is also useful outside grid computing ;).
>
> Cheers,
> Luke
>
> Cheers
>
>  Andrew
>
> >
> > Meta sites
> > This topic has been around for a while, mostly pursued by Chris (Brew). I
> can definitely see the benefit of that, even if it starts slowly.
> > Small sites could hop onto well maintained infrastructure (e.g. Mesos
> self-healing services or ELK) to gain capabilities that are impossible to
> provide with 0.5 FTE or less.
> > In theory it is possible for a bigger site (T1?) to provide a CE, BDII,
> SE, phedex box, monitoring (!), Puppet master (?), etc,
> > while the smaller site provides computing and/or storage resources (WNs,
> disk nodes, GridFTP servers, squids).
> >
> > Logging & Monitoring
> > Big sites have big (custom) solutions for their logging & monitoring (&
> accounting) {{citation needed}}. The ELK (Elasticsearch, Logstash and
> Kibana) stack is just one example.
> > There is no way every site can afford to run everything, and experience
> travels slowly from site to site.
> > Other than using expertise where it is available, I want to ask this:
> > Is there a benefit of being able to correlate events/logs from grid
> services across the UK?
> >
> >
> > Apologies for the wall of text, but with shrinking (wo-)manpower in the
> UK these thoughts have been on my mind.
> >
> > Cheers,
> > Luke
> >
> >
> > [1]
> > http://thenewstack.io/native-docker-comes-windows-mac/
> >
> >
> > On 7 July 2016 at 17:40, David Colling <[log in to unmask]> wrote:
> > Dear All,
> >
> > We are due a technical meeting tomorrow. However, I have to be in an
> > examiners meeting for at least part of the morning. So Andy, can you
> > chair this meeting? I will try to join as the examiners meeting allows.
> >
> > Does anybody have anything specific that they would like to discuss or
> > present on?
> >
> >
> > Best,
> > david
> >
> >
> >
> > --
> > *********************************************************
> >   Dr Lukasz Kreczko
> >   Research Associate
> >   Department of Physics
> >   Particle Physics Group
> >
> >   University of Bristol
> >   HH Wills Physics Lab
> >   University of Bristol
> >   Tyndall Avenue
> >   Bristol
> >   BS8 1TL
> >
> >   +44 (0)117 928 8724
> >   [log in to unmask]
> >
> >   A top 5 UK university with leading employers (2015)
> >   A top 5 UK university for research (2014 REF)
> >   A world top 40 university (QS Ranking 2015)
> > *********************************************************
>
> Cheers
>
>  Andrew
>
> --
> Dr Andrew McNab
> University of Manchester High Energy Physics,
> LHCb@CERN (Distributed Computing Coordinator),
> and GridPP (LHCb + Tier-2 Evolution)
> www.hep.manchester.ac.uk/u/mcnab
> Skype: andrew.mcnab.uk



-- 
*********************************************************

  Dr Lukasz Kreczko
  Research Associate
  Department of Physics
  Particle Physics Group
  University of Bristol
  HH Wills Physics Lab
  University of Bristol
  Tyndall Avenue
  Bristol
  BS8 1TL

  +44 (0)117 928 8724
  [log in to unmask]

  A top 5 UK university with leading employers (2015)
  A top 5 UK university for research (2014 REF)
  A world top 40 university (QS Ranking 2015)
*********************************************************