Hi Frank,
The following are some recommendations for increasing the processing
speed of XDS. You can find them (and add to them!) at
http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Performance .
Only item 7 is specific to a cluster.
In order of decreasing effect:
1. XDS scales well (i.e. the wallclock time for data processing goes
down when the number of available cores is increased) in the COLSPOT,
IDXREF, INTEGRATE and CORRECT steps when using the
MAXIMUM_NUMBER_OF_PROCESSORS keyword. This triggers program-level
parallelization, using OpenMP threads.
2. The program scales very well in the COLSPOT and INTEGRATE steps
when using the MAXIMUM_NUMBER_OF_JOBS keyword. This triggers
shell-level parallelization.
3. Combining both keywords gives the highest performance in my
experience (see [[1]] for an example). As a rough guide, I'd choose the two
values to be approximately equal, with an even number for
MAXIMUM_NUMBER_OF_PROCESSORS because that fits common hardware
better.
4. Some overcommitting of resources (i.e.
MAXIMUM_NUMBER_OF_PROCESSORS * MAXIMUM_NUMBER_OF_JOBS > number of cores)
is beneficial; you'll have to experiment with these two parameters.
5. The next thing to consider is DELPHI together with
OSCILLATION_RANGE: ideally DELPHI is an integer multiple of
MAXIMUM_NUMBER_OF_PROCESSORS * OSCILLATION_RANGE, i.e. the number of
frames spanned by DELPHI (DELPHI / OSCILLATION_RANGE) is a multiple of
the number of threads, which balances the load across the threads. For
this purpose you may want to change (raise) the value of DELPHI (the
default is 5 degrees). If you are doing fine-slicing, thread imbalance is
not an issue, but for users who want to collect 1° frames (which I think
is not the best way nowadays...) it should be a consideration.
6. Performance-wise, I/O also plays a role: as soon as you run 24 or so
processes, a single Gigabit Ethernet connection may become the limiting
factor. On the other hand, shell-level parallelization smooths the load.
7. With the MAXIMUM_NUMBER_OF_JOBS keyword, XDS can use several
machines. This requires some setup, as explained at the bottom of
http://www.mpimf-heidelberg.mpg.de/~kabsch/xds/html_doc/downloading.html .
8. Hyperthreading (SMT), if available on Intel CPUs, is beneficial.
A "virtual" core delivers only about 20% of the performance of a
"physical" core, but it comes at no cost: you just have to switch it on
in the machine's BIOS.
9. The 64-bit binaries are generally a bit faster than the 32-bit
binaries (but that's not specific to XDS).
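The rules of thumb in items 3, 4, 5 and 8 can be turned into a small calculation. The Python sketch below is a hypothetical helper, not part of XDS: the function names, the square-root split of the thread budget and the overcommit factor of 1.5 are illustrative assumptions, and its output is only a starting point for your own timing runs. It picks roughly equal values for MAXIMUM_NUMBER_OF_JOBS and an even MAXIMUM_NUMBER_OF_PROCESSORS whose product exceeds the core count, raises DELPHI to the smallest integer multiple of MAXIMUM_NUMBER_OF_PROCESSORS * OSCILLATION_RANGE that is at least the 5-degree default, and estimates the hyperthreading gain from the ~20% figure. The printed lines assume the usual KEYWORD=value form of XDS.INP.

```python
import math
from fractions import Fraction

# Figures taken from the list above; the overcommit default below is a
# hypothetical starting point, to be tuned by timing real runs.
SMT_GAIN = 0.2          # a "virtual" core gives roughly 20% of a physical one
DEFAULT_DELPHI = 5.0    # XDS default DELPHI, in degrees


def suggest_jobs_and_processors(cores, overcommit=1.5):
    """Hypothetical heuristic: return (jobs, processors) that are roughly
    equal, with processors even and jobs * processors >= cores * overcommit."""
    target = cores * overcommit
    # Even number near sqrt(target), but at least 2.
    processors = max(2, 2 * round(math.sqrt(target) / 2))
    jobs = math.ceil(target / processors)
    return jobs, processors


def balanced_delphi(processors, oscillation_range, minimum=DEFAULT_DELPHI):
    """Smallest DELPHI >= minimum that is an integer multiple of
    processors * oscillation_range. Fractions avoid float rounding drift."""
    unit = processors * Fraction(str(oscillation_range))
    multiples = math.ceil(Fraction(str(minimum)) / unit)
    return float(multiples * unit)


def effective_cores(physical, smt=True):
    """Physical-core equivalents, counting each hyperthread as ~20% of a core."""
    return physical * (1 + SMT_GAIN) if smt else float(physical)


def xds_inp_lines(cores, oscillation_range, overcommit=1.5):
    """Format the suggested settings as XDS.INP-style KEYWORD=value lines."""
    jobs, procs = suggest_jobs_and_processors(cores, overcommit)
    delphi = balanced_delphi(procs, oscillation_range)
    return [
        f"MAXIMUM_NUMBER_OF_JOBS={jobs}",
        f"MAXIMUM_NUMBER_OF_PROCESSORS={procs}",
        f"DELPHI={delphi:g}",
    ]


if __name__ == "__main__":
    # Example: a node with 16 hardware threads (8 physical cores + SMT), 1° frames.
    for line in xds_inp_lines(cores=16, oscillation_range=1.0):
        print(line)
    print("physical-core equivalents:", effective_cores(8))
```

For a 16-thread node and 1° frames this suggests 6 jobs, 4 processors and DELPHI=8 (8 frames per DELPHI range, i.e. 2 per thread). With the ~20% figure, 8 physical cores with hyperthreading amount to roughly 9.6 physical-core equivalents, which would be consistent with the diminishing returns beyond the physical core count described in the question below.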
HTH,
Kay
On 04/19/2011 02:06 PM, Frank Murphy wrote:
> Dear All,
>
> Here at NE-CAT, we make extensive use of XDS in a parallel environment. We are looking to purchase some new hardware, so I am soliciting your opinions.
>
> Our current cluster is made up of 16 nodes, each with 2 processors that have four cores, running at 2.2 GHz (I believe). We run with hyperthreading on, so 8 physical and 16 virtual cores per node.
>
> Our benchmarking with XDS (see https://rapd.nec.aps.anl.gov/wiki/RAPD_NecatStats for an example) shows diminishing returns from increasing MAXIMUM_NUMBER_OF_PROCESSORS beyond the number of physical cores, and we are wondering whether this is due to the test, the processor, the RAM, or XDS. In short, will going to two six-core processors speed up processing with MAXIMUM_NUMBER_OF_PROCESSORS set as high as 12?
>
> Please do not feel the need to constrain the discussion to XDS, as we use our cluster for pretty much all the common crystallographic tasks.
>
> Thanks in advance,
>
> Frank Murphy
> Beamline Scientist, NE-CAT
--
Kay Diederichs http://strucbio.biologie.uni-konstanz.de
Tel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz