JISCMail - LCG-ROLLOUT Archives

Steve Traylen wrote:

>On Fri, Jan 28, 2005 at 05:06:48PM -0000 or thereabouts, pierre girard wrote:
>
>
>>Hi Min,
>>
>>Unfortunately, this CPU count strategy does not work for the CCIN2P3-LCG2
>>site.
>>
>>
>
>I don't think it really matters to the system that the CPU count is wrong.
>It might matter for publicity, of course, but so long as the
>ETT goes up when you start queuing jobs, things will
>be reasonable.
>
>
I completely agree with that.
So, unfortunately for Min, there is currently no easy way to
estimate the real CPU count of every site.

Pierre

> Steve
>
>
>>Indeed, the number of physical CPUs that CC currently supplies to the grid
>>is 100, and it will be more than 700 very soon (hopefully
>>next week).
>>
>>With our batch system, we generally define 2 queues per physical CPU,
>>but each "queue" (called a WorkPoint) can be configured to accept
>>several classes of jobs (such as short (A), medium (G) and long
>>(T)).
>>
>>At the moment, the Glue schema does not allow us to express the subtlety
>>of our batch system's queue mechanism. As a consequence, it is impossible
>>for you to infer the total number of real CPUs from the published data. What we
>>currently publish are virtual CPUs instead of real CPUs.
>>
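To illustrate why the real CPU count cannot be recovered from per-class figures, here is a minimal sketch. It assumes (this is not confirmed in the thread) that the CPU count published for a job class equals the number of WorkPoints accepting that class. The function name and both layouts are hypothetical.

```python
from collections import Counter

def published_cpus_per_class(workpoints):
    """Return, per job class, how many WorkPoints accept that class."""
    counts = Counter()
    for accepted_classes in workpoints:
        # Count each class at most once per WorkPoint.
        counts.update(set(accepted_classes))
    return dict(counts)

# Hypothetical layout 1: 100 physical CPUs, 2 WorkPoints each, all classes accepted.
layout_1 = [("A", "G", "T")] * (100 * 2)
# Hypothetical layout 2: 200 physical CPUs, 1 WorkPoint each, all classes accepted.
layout_2 = [("A", "G", "T")] * (200 * 1)

published_1 = published_cpus_per_class(layout_1)
published_2 = published_cpus_per_class(layout_2)

# Both layouts publish 200 for every class, although the hardware differs.
print(published_1 == published_2)
```

Under this assumption, two sites with different numbers of physical CPUs publish identical figures, so a consumer of the data has nothing to distinguish them by.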
>>Stephen's proposal to add CPU counts at the Subcluster level
>>should solve our problem.
>>
>>In any case, for our specific current setup, the solution is to take the
>>maximum over the queues.
>>
>>But another solution could be to systematically sum the CPUs of each queue per
>>site. Indeed, this value has the same meaning for all the sites.
>>In my view, it reflects the number of jobs a site claims to be
>>able to run simultaneously, which I call the virtual CPUs. It is the
>>role of the site administrator to set this value correctly, in
>>line with the real capacity of his/her site.
>>
>>Once we are able to get the real CPU count, it will be
>>very interesting to compute the ratio between virtual CPUs and real ones.
>>
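The two aggregation strategies discussed here, and the virtual-to-real ratio, can be sketched as follows. This is only an illustration: the function names, queue names and CPU figures are hypothetical, and the real CPU count is assumed to be supplied from some other source, since the published data does not contain it.

```python
def virtual_cpus(queue_cpus):
    """Sum the per-queue CPU counts of a site (the 'virtual CPUs')."""
    return sum(queue_cpus.values())

def max_queue_cpus(queue_cpus):
    """Take the largest per-queue CPU count of a site."""
    return max(queue_cpus.values(), default=0)

def virtual_to_real_ratio(queue_cpus, real_cpus):
    """Ratio of virtual CPUs to real CPUs, once the real count is known."""
    if real_cpus <= 0:
        raise ValueError("real_cpus must be positive")
    return virtual_cpus(queue_cpus) / real_cpus

# Hypothetical site publishing two queues, with 32 real CPUs behind them.
site = {"short": 40, "long": 24}
print(virtual_cpus(site))               # 64
print(max_queue_cpus(site))             # 40
print(virtual_to_real_ratio(site, 32))  # 2.0
```

The sum has the same interpretation at every site, whereas the maximum is only meaningful where all queues share the same CPUs.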
>>Hope this helps ;).
>>
>>Pierre
>>
>>Min Tsai wrote:
>>
>>
>>
>>>Hi All,
>>>
>>>The fix is in for the CPU count.  Three other sites had their CPU stats
>>>change: CCIN2P3-LCG2, INFN-LNL-LCG, INFN-PADOVA.  Let me know if these
>>>numbers are inaccurate for some reason.
>>>
>>>Best Regards,
>>>Min
>>>
>>>-----Original Message-----
>>>From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On
>>>Behalf Of Min Tsai
>>>Sent: Wednesday, January 26, 2005 12:41 PM
>>>To: [log in to unmask]
>>>Subject: Re: [LCG-ROLLOUT] TotalCPU count on the GOC Mon
>>>
>>>Dear Anar,
>>>
>>>Typically the CPU stats for the queues on a single CE all refer to the same
>>>set of CPUs.  So, to avoid counting CPUs twice, Gstat adds up only the CPU
>>>stats of the first queue it encounters for each unique CE.  So in your case:
>>>
>>>It adds up:
>>>GlueCEUniqueID=lcg03.gsi.de:2119\/jobmanager-torque-alice  2 CPU
>>>GlueCEUniqueID=lcg06.gsi.de:2119\/jobmanager-lcglsf-alice  16 CPU
>>>
>>>I have not noticed a configuration like yours before, so I will make a
>>>modification to also add the CPUs from queues that have different total CPU
>>>statistics even though they reside on the same CE.  The only problem we will
>>>have is when 2 queues on a single CE have the same total CPU count even
>>>though they refer to 2 completely different clusters.
>>>
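The counting rule described above, and the proposed modification, can be sketched as follows. This is not Gstat's actual code: the function names and data layout are hypothetical, and the "dteam" figure is the approximate value quoted elsewhere in the thread.

```python
# (CE, queue, published total CPUs)
QUEUES = [
    ("lcg03.gsi.de:2119", "jobmanager-torque-alice", 2),
    ("lcg06.gsi.de:2119", "jobmanager-lcglsf-alice", 16),
    ("lcg06.gsi.de:2119", "jobmanager-lcglsf-dteam", 344),  # approximate
]

def total_first_queue_per_ce(queues):
    """Original rule: count only the first queue seen for each CE."""
    first_seen = {}
    for ce, _queue, cpus in queues:
        first_seen.setdefault(ce, cpus)
    return sum(first_seen.values())

def total_distinct_counts_per_ce(queues):
    """Modified rule: on each CE, count every distinct CPU total once."""
    distinct = {}
    for ce, _queue, cpus in queues:
        distinct.setdefault(ce, set()).add(cpus)
    return sum(sum(values) for values in distinct.values())

print(total_first_queue_per_ce(QUEUES))      # 18
print(total_distinct_counts_per_ce(QUEUES))  # 362

# Known limitation: two different clusters on one CE that happen to
# publish the same total are still counted only once.
same_size = [("ce.example.org", "q1", 8), ("ce.example.org", "q2", 8)]
print(total_distinct_counts_per_ce(same_size))  # 8
```

The first function reproduces the 18 CPUs reported for GSI, and the last example shows the collision case mentioned as the remaining problem.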
>>>I hope this will correct the CPU problem for your site.  Thank you for
>>>providing this feedback!  I will let you know once I have tested and
>>>completed this change.
>>>
>>>Cheers,
>>>Min
>>>
>>>-----Original Message-----
>>>From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On
>>>Behalf Of Anar Manafov
>>>Sent: Wednesday, January 26, 2005 11:59 AM
>>>To: [log in to unmask]
>>>Subject: [LCG-ROLLOUT] TotalCPU count on the GOC Mon
>>>
>>>Good day to ALL!
>>>
>>>I have noticed that on the monitoring page
>>>(http://goc.grid.sinica.edu.tw/gstat/lcg03.gsi.de/) we (GSI) are publishing
>>>only 18 CPUs (Total CPU). So, I wonder how this number is calculated and why
>>>not all of the queues are counted.
>>>We have 2 different CEs:
>>>Torque CE (with 2 CPUs),
>>>LSF CE (more than 300 CPUs).
>>>In LSF we have "dteam" and "alice" queues:
>>>for "alice" ~ 16 CPUs,
>>>for "dteam" ~ 344 CPUs or so. (Later on, when we finish testing
>>>our new pool-accounts algorithm, we will publish more CPUs on
>>>"alice".)
>>>
>>>So, my question is: which algorithm does the monitoring use to calculate the
>>>Total CPU amount?
>>>
>>>I would appreciate any comment on this.
>>>
>>>Thank you very much in advance.
>>>
>>>Best of luck,
>>>
>>>Anar
>>>
>>--
>>______________________
>>Pierre GIRARD
>>Grid Computing Team Member
>>IN2P3/CNRS Computing Centre - Lyon (FRANCE)
>>http://cc.in2p3.fr
>>Tel. +33 4.78.93.08.80 | Fax. +33 4.72.69.41.70 | e-mail: [log in to unmask]
>>
>>
>
>--
>Steve Traylen
>[log in to unmask]
>http://www.gridpp.ac.uk/
>
>
>

--
______________________
Pierre GIRARD
Grid Computing Team Member
IN2P3/CNRS Computing Centre - Lyon (FRANCE)
http://cc.in2p3.fr
Tel. +33 4.78.93.08.80 | Fax. +33 4.72.69.41.70 | e-mail: [log in to unmask]