On Sunday 23 February 2003 09:13, Jeffrey B. Layton wrote:
> Tim Prince wrote:
> >On Sunday 23 February 2003 04:41, Jeffrey B. Layton wrote:
> >>Tim Prince wrote:
> >>>In my experience,
> >>>'mpirun -np 2' on a single CPU P4 increases throughput by about 10% from
> >>>-np 1, but that gain doesn't hold up for scaling to a large cluster with
> >>>simple interconnects.
> >>
> >>Tim,
> >>
> >> My experience agrees with yours. When I ran the NAS Parallel
> >>Benchmarks on a Xeon cluster with even plain Fast Ethernet, it
> >>was always faster to turn off HT (we did it at the kernel level and
> >>at the BIOS level). These results are also supported by the following:
> >>
> >>http://computational-battery.org/Maskinvare/Hyperthreading.html
> >
> >Apparently, those results were obtained with an early Xeon model with
> > small cache. Not that I dare to judge the issue, but results like this,
> > where simply turning on HT hurts performance, have sometimes been traced
> > to errors in the BIOS.
>
> Sorry, I didn't look too closely at the URL. However, our
> results were on a Xeon 2.4 cluster that was tuned (correct
> BIOS); we checked by running many benchmarks, including
> the STREAM benchmarks, to look for memory setting errors.
> I haven't heard of problems in the BIOS causing the difference
> between results with and without HT. Do you have any
> references on this?
>
The tests described at the URL you gave look quite interesting. I'd like
to see continuing checks on MM5 to see what can be done with up-to-date
configurations.
On the Xeon 2.4, bad HT performance has been seen when a BIOS older than the
production version was installed; with a production system, most benchmarks
should perform about the same (often 1% better) with HT enabled, when running
1 thread per physical CPU. If it's much worse than that, the BIOS should be
checked. In principle, no Xeon 2.4 production system should have such a
problem, unless someone has tinkered with the BIOS, or is not using an
appropriate OS kernel.
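
To make "1 thread per physical CPU" concrete, here is a minimal sketch that
counts logical and physical CPUs from /proc/cpuinfo-style text. It assumes
the kernel reports "processor" and "physical id" fields, which not every
kernel does; the function names count_cpus and suggested_np are made up for
this illustration and are not part of any MPI distribution.

```python
def count_cpus(cpuinfo_text):
    """Return (logical, physical) CPU counts from /proc/cpuinfo-style text."""
    logical = 0
    physical_ids = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            # Assumption: HT siblings share the same "physical id" value.
            physical_ids.add(value)
    # If the kernel reports no "physical id", treat every logical CPU
    # as a physical one.
    physical = len(physical_ids) if physical_ids else logical
    return logical, physical


def suggested_np(cpuinfo_text):
    """Hypothetical helper: one MPI process per physical CPU on this node."""
    _, physical = count_cpus(cpuinfo_text)
    return physical
```

On a Linux node this could be called as
suggested_np(open("/proc/cpuinfo").read()) and the result passed to
'mpirun -np'. If logical equals twice physical, HT is presumably enabled.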
With HT on, I'm not getting as good performance on Win2K as on XP Pro or Linux
with a 2.4.18 kernel, but the only reason for running Win2K Server on an MPI
cluster is to get past the limit of -np 8 for an all-XP system.
The older 1.7 GHz and 1.8 GHz Xeon models, which had no HT at first, were quite
a struggle when attempting to coordinate BIOS and CPU upgrades and keep both
Linux and Windows running. In fact, I had one expensive system board die on
me when running a BIOS and CPU combination not supported by the OEM. Hardly
anyone wants to document such problems with unsupported systems.
Enough of these implementation problems; I've taken Fortran MPI applications
which were built for Linux and simply recompiled them for Windows, and they
worked, within the limitations on -np of the Windows OS and MPI chosen. All
the Windows dependencies can and should be hidden in the MPI implementation,
aside from the annoying differences in linkage conventions (2 or 3 different
underscoring conventions in Linux, and 2 linker symbol case conventions in
Windows).
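
As a rough illustration of the linkage differences, the sketch below lists
the linker symbols a Fortran routine name might map to under the common
conventions: no underscore, one trailing underscore, g77-style two
underscores when the name already contains one, and upper case. Which of
these a given compiler emits depends on the compiler and its flags, so
treat the list as an assumption to check against 'nm' output, not as a
description of any particular toolchain. The name candidate_symbols is made
up for this example.

```python
def candidate_symbols(name):
    """List plausible linker symbols for a Fortran routine name."""
    lower = name.lower()
    upper = name.upper()
    # g77-style rule: append two underscores if the name already has one,
    # otherwise a single underscore.
    second = lower + ("__" if "_" in lower else "_")
    candidates = [lower, lower + "_", second, upper]
    # Drop duplicates while keeping the order above.
    unique = []
    for symbol in candidates:
        if symbol not in unique:
            unique.append(symbol)
    return unique
```

For example, candidate_symbols("mpi_init") yields mpi_init, mpi_init_,
mpi_init__ and MPI_INIT, which is the set a C wrapper would have to cover
to be callable from Fortran on both platforms.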
--
Tim Prince