Steve Traylen tells me that Glue subclusters are one of the topics his WN
WG is looking at.
John
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Bly, MJ (Martin)
> Sent: 01 November 2007 15:11
> To: [log in to unmask]
> Subject: Re: jobs using up too much memory
>
> SB:
> > Indeed. The current situation is that the advertised value should be
> > RAM per WN, and if you want the value per CPU you should divide by the
> > SMPSize attribute, which everyone is hopefully setting to the number
> > of CPUs per node. Of course what you really want to know is the RAM per
>
> That which defines a 'CPU' varies - some say it refers to a
> chip and others to a core. 'CPU' count varies between nodes:
> for example all our boxes have two chips but some have dual
> cores and I expect we may have quad cores to deal with. (And
> don't get me started on hyperthreading chips...). Is SMPSize
> supposed to be defined as n-chips or n-cores?
> How does the schema deal with single, dual and quad chip
> boxes in the same batch system?
>
> Martin.
> --
> Martin Bly
> RAL Tier1 Fabric Team
>
> > -----Original Message-----
> > From: Testbed Support for GridPP member institutes
> > [mailto:[log in to unmask]] On Behalf Of Burke, S (Stephen)
> > Sent: 01 November 2007 14:58
> > To: [log in to unmask]
> > Subject: Re: jobs using up too much memory
> >
> > Testbed Support for GridPP member institutes
> > [mailto:[log in to unmask]] On Behalf Of Duncan Rand said:
> > > Isn't there still some confusion surrounding the term
> > > GlueHostMainMemoryRAMSize - is it the RAM per node or the RAM per
> > > core?
> >
> > Indeed. The current situation is that the advertised value should be
> > RAM per WN, and if you want the value per CPU you should divide by the
> > SMPSize attribute, which everyone is hopefully setting to the number
> > of CPUs per node. Of course what you really want to know is the RAM
> > per job, but that's a little harder to define. See GGUS ticket #26298
> > for more discussion, although that's currently closed as unsolved
> > because the discussion has moved into Steve Traylen's new WN working
> > group
> > (http://egee-intranet.web.cern.ch/egee-intranet/NA1/TCG/wgs/wn.htm).
> >
> > > The real problem is that the job's requirements are not passed to
> > > the scheduler. If they were it would be able to operate as intended
> > > and manage node memory properly.
> >
> > That's also part of what Steve T's group is looking at.
> >
> > Stephen
> >
>
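
For reference, the per-CPU arithmetic described in the thread (advertised
RAM per WN divided by SMPSize) can be sketched roughly as below. This is
only an illustration: the function name and the sample node are made up,
the MB unit is an assumption, and whether SMPSize counts chips or cores
is exactly the open question raised above.

```python
def ram_per_cpu(ram_size_mb, smp_size):
    """Per-CPU RAM derived from the per-WN value, as described in the thread.

    ram_size_mb -- value advertised as GlueHostMainMemoryRAMSize (RAM per WN;
                   MB is assumed here)
    smp_size    -- value advertised as SMPSize (CPUs per node, however the
                   site chooses to count them)
    """
    if smp_size <= 0:
        raise ValueError("SMPSize must be a positive integer")
    return ram_size_mb / smp_size


# Hypothetical node: 8192 MB of RAM, two dual-core chips.
print(ram_per_cpu(8192, 2))  # SMPSize counted as chips -> 4096.0
print(ram_per_cpu(8192, 4))  # SMPSize counted as cores -> 2048.0
```

The same box yields a factor of two difference depending on the
convention, which is why the n-chips versus n-cores definition matters.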