The original regex does actually work correctly for HTCondor (I just checked), but I agree it can probably be improved.
Regards,
Andrew.
________________________________________
From: Stephen Jones [[log in to unmask]]
Sent: Monday, September 12, 2016 4:34 PM
To: Marcus Ebert; Lahiff, Andrew (STFC,RAL,PPD)
Cc: Testbed Support for GridPP member institutes
Subject: Re: reported pending jobs when running LHCb and ATLAS jobs on the same site
Hi Marcus, (Andrew),
I finally figured out why _you_ had to use $pieces[0] in the
glue-generator.pl patch. The fault was quite subtle: our original regex
was poor.
> my @pieces = split(/\s+/, $line);
That pattern returns an empty string as the first field when the line
starts with whitespace (!), which is not really right. I should have used
this (see below)
> my @pieces = split(" " , $line);
And then $pieces[0] is the right way to pick out the number.
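For illustration, here is a minimal Perl sketch of the difference. It
assumes an input line that starts with whitespace, followed by a job count
and a VO name; the sample line and the variable names are made up for this
example and are not taken from glue-generator.pl.

```perl
use strict;
use warnings;

# Hypothetical input line: leading whitespace, a job count, then a VO name.
# (The exact file format is an assumption for this sketch.)
my $line = "   400 atlas";

# A regex pattern keeps the empty field created by the leading whitespace,
# so the count ends up at index 1.
my @by_regex = split(/\s+/, $line);    # ("", "400", "atlas")

# The special single-space string pattern skips leading whitespace first,
# so the count ends up at index 0.
my @by_space = split(" ", $line);      # ("400", "atlas")

print "regex: [$by_regex[0]] [$by_regex[1]]\n";
print "space: [$by_space[0]] [$by_space[1]]\n";
```

If the line has no leading whitespace (numbers first, as in Marcus's
files), both forms put the count at index 0, which would explain why
$pieces[0] was needed there even with the old pattern.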
So thanks. It worked here, as it was, but it was not perfect (the
ARC/Condor document now shows the Patched patch!)
Anyway, I expect Andrew Lahiff is aware of this problem, so does it
give good results at RAL?
Ste
On 09/10/2016 10:45 AM, Marcus Ebert wrote:
> Hi,
>
> looking through the wiki example helped to figure out which parts need
> to be set.
> I implemented something similar now and needed to change what is in
> the example
> return $pieces[1];
> to
> $pieces[0];
> since I have in my output files first the numbers and then the VO name.
> Putting the file which is read in the wiki could help to understand
> more easily the structure of the code if someone else comes across
> this in the future.
>
> Cheers,
> Marcus
>
> On Thu, 8 Sep 2016, sjones wrote:
>
>> Yes. I put it on the wiki, along with the other patches. I don't know
>> if it's up to date.
>>
>> (Note to self: make Example Build current...)
>>
>> https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown
>>
>>
>> Ste
>>
>> On 2016-09-08 13:19, Andrew Lahiff wrote:
>>> We (RAL) support both LHCb and ATLAS, as well as ALICE and CMS, and
>>> don't have this problem. A long time ago we made a small modification
>>> to one of the information system provider scripts so that our ARC CEs
>>> generate per-VO job stats correctly. By default ARC CEs only provide
>>> total job stats.
>>>
>>> Regards,
>>> Andrew.
>>>
>>> ________________________________________
>>> From: Testbed Support for GridPP member institutes
>>> [[log in to unmask]] on behalf of Marcus Ebert
>>> [[log in to unmask]]
>>> Sent: Thursday, September 08, 2016 11:47 AM
>>> To: [log in to unmask]
>>> Subject: Re: reported pending jobs when running LHCb and ATLAS jobs on
>>> the same site
>>>
>>> On Thu, 8 Sep 2016, Love, Peter wrote:
>>>
>>> > This queue was configured in our test factories but no longer needed
>>> > there. I've removed it so it should drop to 200 pending. I'd suggest
>>> Thanks Peter!
>>>
>>> > LHCb only query their jobs and not let atlas jobs affect their
>>> > submission rate.
>>> >
>>> Well, as far as I understand, LHCb is not actively querying pending jobs
>>> but uses what the ARC CE reports globally, which right now is the
>>> number of pending Grid jobs in our case.
>>> I guess if ATLAS has its own mechanisms to get the number of pending jobs
>>> and doesn't care about what the ARC CE reports for pending jobs, then it
>>> could be changed to report only LHCb jobs?
>>> Or we could set up separate CEs for LHCb and ATLAS.
>>>
>>> But before putting in any new setup or config change, it would be good to
>>> get some information about how other sites handle this or have solved it
>>> before.
>>>
>>>
>>> Cheers,
>>> Marcus
>>>
>>>
>>> > Cheers,
>>> > Peter
>>> >
>>> > > On 8 Sep 2016, at 11:29, Marcus Ebert <[log in to unmask]> wrote:
>>> > >
>>> > > Hi Peter,
>>> > >
>>> > > It is the new SL7 queue.
>>> > > http://apfmon.lancs.ac.uk/q/UKI-SCOTGRID-ECDF_SL7
>>> > >
>>> > > There are 400 jobs pending in the queue right now.
>>> > >
>>> > > Cheers,
>>> > > Marcus
>>> > >
>>> > > On Thu, 8 Sep 2016, Love, Peter wrote:
>>> > >
>>> > > > Marcus, which queue are you referring to?
>>> > > > http://apfmon.lancs.ac.uk/query/ecdf/
>>> > > >
>>> > > > We do in fact throttle based on pending jobs but I'll check the
>>> > > > details.
>>> > > >
>>> > > > Cheers,
>>> > > > Peter
>>> > > >
>>> > > > > On 8 Sep 2016, at 11:00, Marcus Ebert <[log in to unmask]> wrote:
>>> > > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > I have a question for sites running ATLAS and LHCb jobs. What we
>>> > > > > see at ECDF right now is that there are no new LHCb jobs submitted
>>> > > > > to our site because of the large number of pending ATLAS jobs. It
>>> > > > > seems LHCb only submits new pilots when the number of pending jobs
>>> > > > > is below a threshold (for ECDF it was 10 pending pilots, now it's
>>> > > > > 30). However, ATLAS submits jobs at a rate that the number of
>>> > > > > pending jobs is in the hundreds. And since the total number of
>>> > > > > pending jobs is that large, no new LHCb jobs get submitted.
>>> > > > > Right now, we have one ARC CE which submits jobs for both VOs.
>>> > > > >
>>> > > > > Could other sites please let me know how they support LHCb and
>>> > > > > ATLAS at the same time and how they report pending/running/total
>>> > > > > number of jobs through an ARC CE that they still get LHCb jobs?
>>> > > > >
>>> > > > > Cheers,
>>> > > > > Marcus
>>> > > > >
>>> > > > > --
>>> > > > > The University of Edinburgh is a charitable body, registered in
>>> > > > > Scotland, with registration number SC005336.
>>
>>
>>
>
--
Steve Jones                     [log in to unmask]
Grid System Administrator       office: 220
High Energy Physics Division    tel (int): 43396
Oliver Lodge Laboratory         tel (ext): +44 (0)151 794 3396
University of Liverpool         http://www.liv.ac.uk/physics/hep/