On 01/05/2019 12:31, Chris Brew - UKRI STFC wrote:
> Okay; I’ve got more than a thousand 4-core LSST jobs sitting in the queue (though they are not running, so they don’t show up in that graph, I guess).
>
> SKA too uses multicore to get more memory. At the moment the DIRAC core developers don't seem to pick up on clues that requesting memory might be needed.
>
> Well, I suppose it does work for LHCb as is. Could we argue to get this onto the IRIS DIRAC development road map? I think it is really going to be more of an issue in future.
>
> Requesting the memory can still leave cores empty without that being accounted for (e.g. a single-core job asking for 16 GB on a 16-core/64 GB node effectively blocks three more cores that never show up as used); this is a very old discussion that is also ongoing in ATLAS. That said, several jobs don't use the memory they request, so it is possible the kernel can handle it.
>
> I wasn’t even worrying about accounting for them correctly yet (although, as you hint, “charging” by CPU time does not punish this behaviour, so tacitly encourages it). I just want to schedule them, so poor old LSST can get some of my cycles.
It's not a matter of "encouraging"; it's a matter of being able to point
at a number when they ask you why not all your CPUs are used. Counting
how long a job keeps cores occupied, rather than counting the CPU time,
is also the reason we moved to wall time for accounting.
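
For a concrete look at the difference, the stock HTCondor job-history
attributes are enough; an illustrative query (RemoteWallClockTime,
RemoteUserCpu etc. are standard job ClassAd attributes, the field
selection here is arbitrary):

    # CPU seconds actually used  : RemoteUserCpu + RemoteSysCpu
    # core-seconds the job held  : RequestCpus * RemoteWallClockTime
    # Charging the second number makes idle-but-reserved cores visible
    # in the accounting instead of letting them disappear.
    condor_history -limit 20 \
        -af Owner RequestCpus RemoteUserCpu RemoteSysCpu RemoteWallClockTime
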
cheers
alessandra
>
> Chris.
>
> From: Testbed institutes <[log in to unmask]> on behalf of "[log in to unmask]" <[log in to unmask]>
> Reply-To: Testbed institutes <[log in to unmask]>
> Date: Wednesday, 1 May 2019 at 12:21
> To: Testbed institutes <[log in to unmask]>
> Subject: Re: 4 core LSST jobs
>
> If anything, DIRAC isn't sending enough jobs for some reason. Manchester hasn't run anything (or almost nothing) for over 24h, and the other week it also got no jobs for more than 2 days. And the system doesn't seem to run more than ~650 jobs concurrently across all sites.
>
> On 01/05/2019 12:11, Alessandra Forti wrote:
> Hi Chris,
>
> On 01/05/2019 10:59, Chris Brew - UKRI STFC wrote:
>
> Ian has just reminded me that Daniela told me at GridPP that these jobs are only requesting 4 cores to get extra memory, so we could also apply a job transform to reduce their requested CPUs back down to one while leaving the memory request high.
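>
> A sketch of what such a transform could look like on the Condor side (native transform syntax, HTCondor 8.9 or later; the owner string used to match LSST jobs and the memory value are placeholders for however LSST maps at your site):
>
>     # Rewrite 4-core LSST jobs to 1 core, keeping their memory high.
>     JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) LsstSingleCore
>     JOB_TRANSFORM_LsstSingleCore @=end
>        # Match only LSST jobs asking for 4 cores ("lsst001" is hypothetical)
>        REQUIREMENTS Owner == "lsst001" && RequestCpus == 4
>        SET RequestCpus 1
>        # Keep ~4 cores' worth of RAM (value illustrative, in MB)
>        SET RequestMemory 8000
>     @end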
>
> Could we also ensure that correct handling of requests for different amounts of memory is made a high-priority feature request for DIRAC, either for the core developers or for the UK IRIS-funded work? LZ have had issues with this and now LSST. The one-size-fits-all memory request just about worked for LHCb but does not for the multiple communities supported on the GridPP DIRAC.
> SKA too uses multicore to get more memory. At the moment the DIRAC core developers don't seem to pick up on clues that requesting memory might be needed.
>
>
> If it's true that LSST needs a single core but more memory, we would have run orders of magnitude more jobs if they had come in correctly requesting the resources they need.
> Requesting the memory can still leave cores empty without that being accounted for (e.g. a single-core job asking for 16 GB on a 16-core/64 GB node effectively blocks three more cores that never show up as used); this is a very old discussion that is also ongoing in ATLAS. That said, several jobs don't use the memory they request, so it is possible the kernel can handle it.
>
> To answer your first question: LSST isn't flooding the sites with 4-core jobs.
>
> cheers
> alessandra
>
>
> Yours,
> Chris.
>
> On 01/05/2019, 10:39, "Testbed Support for GridPP member institutes on behalf of Chris Brew - UKRI STFC" <[log in to unmask] on behalf of [log in to unmask]> wrote:
>
> Hi All,
>
> How are other ARC-CE/Condor sites handling the LSST 4-core jobs?
>
> At the moment they are being treated like single-core jobs here, because I’ve not configured LSST for multicore, and consequently they are getting very few job starts. However, I think that if I enable multicore for LSST, the 4-core jobs will fill draining slots before the other VOs’ 8-core jobs and grab all the resources.
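>
> On the draining side, one trick from other multicore deployments is to size drains for the biggest jobs rather than whole nodes; a condor_defrag sketch, assuming partitionable slots, with thresholds that would need tuning (it does not by itself stop 4-core jobs claiming the freed cores, so it is only half an answer):
>
>     # condor_defrag: treat a node as "whole" once 8 cores are free,
>     # so draining stops at the largest job size, not the full machine.
>     DAEMON_LIST = $(DAEMON_LIST) DEFRAG
>     DEFRAG_WHOLE_MACHINE_EXPR = Cpus >= 8 && Offline =!= True
>     # Keep the drain rate modest; tune to the size of the farm.
>     DEFRAG_MAX_CONCURRENT_DRAINING = 4
>     DEFRAG_DRAINING_MACHINES_PER_HOUR = 2.0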
>
> I could put a hard cap on the number of running LSST jobs, but does anyone have a better way of handling multicore jobs with differing CPU requests?
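>
> One alternative to a hard-coded cap is an accounting-group quota on the Condor side; a sketch, assuming LSST jobs can be mapped into a dedicated group (the group name and numbers are illustrative):
>
>     # Cap LSST with a fixed-size accounting group.
>     GROUP_NAMES = $(GROUP_NAMES) group_lsst
>     # With partitionable slots and SLOT_WEIGHT = Cpus this is ~200 cores.
>     GROUP_QUOTA_group_lsst = 200
>     # Don't let the group spill past its quota into idle capacity.
>     GROUP_ACCEPT_SURPLUS_group_lsst = False
>
> The jobs would still need their AccountingGroup attribute set, e.g. with the same job-transform mechanism as for the CPU rewrite.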
>
> Thanks,
> Chris.
>
> --
> Dr Chris Brew
> Scientific Computing Manager
> Particle Physics Department
> UKRI - STFC - Rutherford Appleton Laboratory
> Harwell Oxford,
> Didcot
> OX11 0QX
> +44 1235 446326
>
>
> --
>
> Respect is a rational process. \\//
>
> For Ur-Fascism, disagreement is treason. (U. Eco)
>
--
Respect is a rational process. \\//
For Ur-Fascism, disagreement is treason. (U. Eco)
########################################################################
To unsubscribe from the TB-SUPPORT list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=TB-SUPPORT&A=1