Hi,
Thinking aloud here: would a containerized CE + Argus + BDII stack, submitting Docker universe jobs that run a standard container under a single 'condoruser' account, be an (almost) zero-conf gridification of an existing local batch service?
I think the ARC CE can be set up to use Argus for banning but then just map everyone to a single account (in fact, could it use the central Argus for that?); the Docker universe would provide the isolation, and the standard container would carry the middleware plus the Grid configuration.
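As a rough sketch of the moving parts (untested, and the ARC option names are from memory, so treat them as assumptions to check against the arc.conf reference): the CE maps everyone to 'condoruser', and each job becomes a Docker universe submit description along these lines, with the image name being a made-up placeholder for the "standard container":

    # arc.conf sketch (assuming the ARC6-style [mapping] block):
    #   [mapping]
    #   map_to_user = condoruser
    #   (plus whatever Argus/banning plugin hook the ARC version supports)
    #
    # HTCondor submit description the CE would generate per job:
    universe     = docker
    docker_image = gridpp/wn-env:latest    # placeholder "standard container"
    executable   = run_payload.sh
    output       = job.out
    error        = job.err
    log          = job.log
    queue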
Another option would be hosted CEs a la OSG: they run an HTCondorCE centrally that only needs SSH access to a submit host, and it supports multiple batch systems to boot.
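For reference, the OSG hosted-CE model is essentially BOSCO underneath: the central HTCondorCE submits over SSH to the site's existing batch system. A minimal sketch of that SSH-submission step, with placeholder host and account names; the "condor" argument could equally be slurm, pbs, sge or lsf, which is where the multi-batch-system support comes from:

    # Run on the central CE host: register the site's submit host over SSH.
    bosco_cluster --add condoruser@submit.example.ac.uk condor

    # Check that remote submission works end to end.
    bosco_cluster --test condoruser@submit.example.ac.uk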
Yours,
Chris.
On Tue, Feb 12, 2019 at 4:31 PM +0000, "Sam Skipsey" <[log in to unmask]> wrote:
Hi,
Comments inline :)
On Tue, 12 Feb 2019 at 15:58, George, Simon wrote:
>
> We need the HTCondor batch system anyway, so I suppose the pain is the CE.
>
> We have yet to install it though which is why I am looking at options. We have to migrate away from CREAM and Torque+Maui as well as from SL6 to CC7.
>
>
> I don't follow developments closely enough so I would appreciate advice on whether HTCondorCE is considered stable and ready for production, whether the documentation is good enough for a sysadmin (not a Grid expert) to install and maintain it, and where that documentation is.
>
> I thought I heard that the accounting was not working yet for example?
I think Ste's post, which you saw, mostly addresses this [it's also
something we discussed a bit in the GridPP Ops meeting today]
>
> We were previously thinking of ARC CE because it seemed the most popular, but on investigation it also seems complicated to deploy.
>
>
> It's not just the CE but also the accounting (APEL), the site BDII, and Argus - although the latter two may also be needed for storage?
DPMs, at least, don't need an Argus [although they can talk to one for
banning]: there are no user mappings in a DPM; we identify users directly
by the DN and VOMS roles of the certificate presented. dCache experts can
correct me if I am wrong, but I think it, and StoRM, do need to do
some kind of user mapping?
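For concreteness: the user mapping in question is classically a grid-mapfile entry, or its LCMAPS/Argus equivalent, along these lines; the DN is invented, and the leading dot means "lease a free atlas pool account". DPM skips this step entirely:

    # grid-mapfile sketch: certificate DN -> local account
    "/C=UK/O=eScience/OU=SomeSite/CN=some user" .atlas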
>
> If the CE is a single point of failure should we not run at least two?
The CE is a single point of failure for jobs arriving at your site
from outside, and for their being "registered as complete" to the
outside. A CE failing doesn't prevent jobs which are already running at
the site from continuing [and, in some cases, CEs can pick up the
results of jobs which completed whilst they were down, although that
depends on how the job wrappers work]. So, yes, they're a SPoF, but they
can also "go down" for an hour without causing utter destruction.
>
> Then setting up the vast number of pool accounts and other WN configuration.
>
Definitely, pool accounts are awful: I wrote a presentation, many years
back now, about how awful both pool accounts and Grid proxies in
general (as used by us) are ;)
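For the uninitiated, "vast number" means literally creating hundreds of Unix accounts per VO on every WN or in your site LDAP, something like this per VO (UIDs, GIDs and names here are just a typical site convention):

    # Classic pool-account creation, repeated for each supported VO.
    groupadd -g 6000 atlas
    for i in $(seq -w 1 100); do
        useradd -u "6$i" -g atlas -m -d "/home/atlas$i" "atlas$i"
    done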
Unfortunately, I don't think there's an out-of-the-box solution for
this [it's obvious *how* you would make something which solves it: run
all jobs in disposable containers, with one remapped user account per
container for isolation, as in the sketch below - but I don't think
there's a fully baked solution which does the configuration for you],
which means that smaller sites, and people who want to do this with
less admin overhead, are still in the same situation.
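To make the "obvious how" concrete, here is an entirely illustrative sketch of the per-job isolation idea: pick a UID for the job, run the payload in a throwaway container as that UID, and discard everything afterwards. The image and paths are placeholders, and real free-UID selection would need to be less naive:

    #!/bin/bash
    # One disposable container per job, one remapped UID per container.
    JOBID="$1"
    JOB_UID=$((60000 + JOBID % 1000))    # naive "free UID" selection

    docker run --rm \
        --user "${JOB_UID}:${JOB_UID}" \
        --volume "/scratch/job-${JOBID}:/work" \
        centos:7 /work/run_payload.sh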
This is a bit waffley, but I agree that there's probably not an
optimal solution yet for your use case.
[The other alternative is to go the long way around and submit to VAC
via the GridPP DIRAC; but that means your local users all need grid
certificates, plus some kind of "local VO" to map them within]
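Where "grid certificates plus a local VO" means each user doing something like the following before any submission; the gridpp catch-all VO is used here as the example:

    # Once: obtain a UK eScience certificate and register with the VO.
    # Per session: create a VOMS proxy that DIRAC can use.
    voms-proxy-init --voms gridpp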
Sam
> CVMFS and squid we would need anyway for local users.
>
>
> I think it would be helpful if there were recommended options (either from GridPP or more widely) for all the services required of a Tier-2 site.
>
>
> Thanks for your thoughts.
>
> Simon
>
>
> ________________________________
> From: Testbed Support for GridPP member institutes on behalf of Sam Skipsey <[log in to unmask]>
> Sent: 12 February 2019 15:19
> To: [log in to unmask]
> Subject: Re: VAC with VM condor VM to support local batch jobs?
>
> So, isn't that simple-lightweight system supposed to be an HTCondorCE,
> if you believe its cheerleaders [assuming you use HTCondor as your
> batch system]?
>
> (Similarly, in general, "running things in predesigned virtual
> machines" is a thing which HTCondor can do - and has been able to do
> for a decade now - if what you like is the "uniform, single image"
> aspect of VAC. Obviously, it's not as "hands-off" as VAC is - but,
> again, that's part of the administrative/control tradeoff which VAC
> sort of relies on.)
>
> If you're using something which is not a HTCondor batch system, then
> obviously HTCondorCEs are less useful! The question is: what's the
> pain point for you: the CE, or the batch system?
>
> Sam
>
> P.S. I don't think "GridPP" as a monolithic body has a uniform recommendation.
>
> On Tue, 12 Feb 2019 at 14:19, George, Simon wrote:
> >
> > Thanks, I see your point Sam.
> >
> > What seems to be missing is a simple, lightweight site option for all the grid services needed to interface to a local batch system.
> >
> > VAIB is a great solution except that we still want a local batch system. I guess most sites need that?
> >
> > I'm not clear what GridPP is recommending, because I thought it was VAIB.
> >
> >
> >
> > ________________________________
> > From: Testbed Support for GridPP member institutes on behalf of Sam Skipsey <[log in to unmask]>
> > Sent: 12 February 2019 09:54
> > To: [log in to unmask]
> > Subject: Re: VAC with VM condor VM to support local batch jobs?
> >
> > So, my belief was that the entire point of VAC (originally) was to
> > avoid needing a local batch system if you just want to support Grid
> > jobs [ that is, that the VAC vms would precisely not support local job
> > scheduling, because you can simplify a local site if all the
> > "scheduling" happens at the remote end - in, for example, a DIRAC
> > instance, or other VO-owned job-submission-to-pilots framework]. In
> > general, I've always argued that if you have a need to support local
> > users and run jobs locally, this justifies having an actual batch
> > system.
> >
> > Sam
> >
> > On Mon, 4 Feb 2019 at 10:40, George, Simon wrote:
> > >
> > > And if not, how else do VAC sites make their resources available to local users?
> > >
> > > ________________________________
> > > From: Testbed Support for GridPP member institutes on behalf of George, Simon
> > > Sent: 04 February 2019 10:38
> > > To: [log in to unmask]
> > > Subject: Re: VAC with VM condor VM to support local batch jobs?
> > >
> > > I had no answers to this, so can I just check that this means literally no-one has tried or thought about this?
> > > Thanks,
> > > Simon
> > >
> > > ________________________________
> > > From: George, Simon
> > > Sent: 28 January 2019 11:07
> > > To: [log in to unmask]
> > > Subject: VAC with VM condor VM to support local batch jobs?
> > >
> > >
> > > I'd like to know if anyone who runs VAC has tried using the VMcondor VM to get local batch jobs onto their VAC nodes?
> > >
> > > Or if not are you thinking about it?
> > >
> > > Thanks,
> > >
> > > Simon
> > >
> > >