Hi,
just to add a few words to this interesting Hyperthreading thread:
last week at the LCG operational workshop my working group (operational
fabric) conducted a small informal survey of who's using HT and who's not.
It turned out that of about 15 sites represented in the WG, only 2 had
set up their farms to use HT, and of those 2 sites only 1 was actually using
HT in a grid farm (the other used HT for nodes that were only processing
local jobs).
There are several considerations that come into play with HT in a grid
environment, some of which are:
- some applications may benefit from HT, others may not. If you are running
a generic resource center, you may well have clashes over who benefits
from it. We have seen cases where some experiments explicitly asked for HT
to be turned on because, they said, it would benefit their (the
experiment's) applications. But if a second user (experiment) comes up with
a conflicting request, then you have an administrative problem that is not
easily solved, also because
- it turned out that one of the major problems all centers had was how to
get factual data on whether HT is really beneficial to a given application.
It was reported that experiments seem not very interested in benchmarking
their applications (they likely have other priorities), which generates a
lot of "it seems that", "somebody told me that", "rumour has it that", and
so on, with regard to HT benefits/disadvantages. Moreover, with the current
development cycles, plus the number of VOs that generic resource centers
support/will support, a fundamental question in this respect is whether this
benchmarking is meaningful at all: for any given application in any given
experiment, for example, one version might use a threading model that
benefits from HT, the next (or previous) one might change the threading
model (or the overall design, for that matter), and that might make HT
totally unsuitable in terms of performance. Other factors, such as which
kernel version is running, make benchmarking even more of a moving
target.
- even for LHC experiments alone, one needs to take some care about where to
enable HT. For example, your nodes should be equipped with enough
memory to avoid resource contention, and properly configured with regard to
the batch system. For instance: if an application from experiment X requires
1GB of RAM, and you turn HT on on a 2-CPU WN, you'd better make sure that, on
the one hand, you accept more than 2 jobs on that node [to avoid having 2
jobs running on the same physical CPU - this is referred to in the article
mentioned by Ian as "HT-aware passive and active load-balancing"] and, on
the other hand, you have enough RAM to avoid swapping.
- the only instance where there was a consensus on turning HT on was with IO
servers.
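
To put some numbers on the memory point above, here is a minimal sketch in
Python. The function name, the 512 MB set aside for the OS and the RAM sizes
are illustrative assumptions of mine, not figures from the survey; it also
assumes at most one job per logical CPU and looks at RAM only.

```python
def job_slots(physical_cpus, ht_enabled, ram_mb, job_ram_mb, os_reserve_mb=512):
    """Return how many job slots a worker node can offer without swapping.

    Hypothetical helper: one job per logical CPU, RAM is the only other limit.
    """
    if job_ram_mb <= 0:
        raise ValueError("job_ram_mb must be positive")
    # With HT on, each physical CPU shows up as two logical CPUs.
    logical_cpus = physical_cpus * (2 if ht_enabled else 1)
    # RAM left for jobs after reserving some for the OS and daemons
    # (the 512 MB default is an assumption, tune it for your nodes).
    usable_mb = max(ram_mb - os_reserve_mb, 0)
    slots_by_ram = usable_mb // job_ram_mb
    # The node can only take as many jobs as both limits allow.
    return min(logical_cpus, slots_by_ram)


# 2-CPU WN, jobs needing 1 GB each.
print(job_slots(2, False, 2560, 1024))  # HT off, 2.5 GB RAM
print(job_slots(2, True, 2560, 1024))   # HT on, same RAM: memory is the limit
print(job_slots(2, True, 4608, 1024))   # HT on, 4.5 GB RAM
```

With these assumed figures, turning HT on without adding memory does not buy
any extra slots: the node is limited by RAM, not by the number of logical
CPUs.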
As part of the activities of the operational fabric WG that were discussed
last week, we plan to put together these and other "hands-on" considerations
in a document (ideally a "recipes" or "how-to" document, or similar).
Please do keep on contributing your experiences, then, so that they may be
taken into due account.
Thanks,
Davide
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Ian Stokes-Rees
> Sent: Thursday, November 11, 2004 18:12
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] 64-bit processors and software
>
> Hi,
>
> Ake wrote:
> >>1. Hyper-threading should be turned off everywhere.
> >
> > I don't agree on that. We have it turned on but don't advertise it so
> > that jobs usually believe that it's not there, leaving the "second" cpu
> > for system processes.
>
> And here is a DeveloperWorks report which shows the speed-up (or
> slow-down) for various operations with or without HT:
>
> http://www-128.ibm.com/developerworks/linux/library/l-htl/
>
> It comes out generally in favour of HT, but, as I said, I was surprised
> not to find anyone here at Supercomputing who will vouch for HT for
> scientific computing clusters.
>
> What advantage do you get by saying there is a second processor which
> you only use for maintenance operations? Couldn't you just run an
> additional process on the one real processor?
>
> >>3. Intel Itaniums currently have PCI-X support, which gives them some
> >>improved device access, however
> >>
> >>4. The next generation of AMD 64-bit processors will support PCI-X
> >
> > Do you really mean PCI-X? Most all server boards have had PCI-X support
> > since at least a year ago. All our Opterons have it.
>
> Sorry, I think that should have been PCI Express, which, I think, is not
> in AMD 64-bit processors (yet:
> http://www.theregister.co.uk/2004/10/19/amd_pci-e_launch/ ) and which I
> understand is primarily for visualisation (faster graphics card
> connections than AGP).
>
> Cheers,
>
> Ian.
>
> --
> Ian Stokes-Rees [log in to unmask]
> Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes