Dear Darren,
(I was writing this email when Stephen Smith replied. The content is
essentially the same, but I decided to send this email anyway.)
I do not write program code for FSL, but the explanation below comes
from my own experience of writing program code, and if my assumptions
(listed below) are correct, your result comes as no surprise.
First, my assumptions:
(0) The FSL team has not written the code for parallel processing. This
is important: if they had parallelized the code, then the result would
really be a surprise! (Footnote: judging from Smith's email, I think
they have not parallelized the code.)
(1) You are using the FSL program as it is, i.e., with no attempt to
parallelize the code yourself.
The full answer to your question is complex, because it depends on the
design of the supercomputer you are using.
To answer your question, going by the line
64 CPUs: 98.4% idle, 1.6% usr, 0.0% ker, 0.0% wait, 0.0% xbrk,
and the breakdown table:
Yes, it does look like 62 CPUs are idle, and either:
(a) one processor is working frantically (film_gl, 99.86%) while
another was given a meagre and insignificant job (top, 0.83%), or
(b) most probably, only one processor is working, meaning the 63rd
processor is also idle. (1.6% of 64 CPUs is almost exactly one CPU.)
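As a quick check of that reading: the idle figure that top reports is an
average over all 64 CPUs, so it can be converted back into a number of
busy CPUs. The sketch below only does that arithmetic; the function name
is my own and purely illustrative.

```python
def busy_cpus(idle_percent, n_cpus):
    """Convert a machine-wide idle percentage into the equivalent
    number of fully busy CPUs."""
    return (100.0 - idle_percent) / 100.0 * n_cpus


# Figures taken from the "64 CPUs: 98.4% idle" line of the top output.
print(round(busy_cpus(98.4, 64), 2))
```

The result is about 1.02, i.e. roughly one CPU's worth of work on the
whole machine.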
The litmus test to me is the processing time. Roughly speaking, for a
truly parallelized program, you would expect the processing time to be
about 1/62 of that on a single-processor system. (It's not 1/64 because
of parallelization overhead, which I have generously taken to be
equivalent to two processors working full time.)
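To make that rule of thumb concrete, here is a minimal sketch of the
estimate. It assumes the work divides perfectly across the processors,
which real programs rarely manage; the function name and the 62-minute
job are made up for illustration.

```python
def expected_time(single_cpu_time, n_cpus, overhead_cpus=2):
    """Idealised run time when the work is shared evenly by n_cpus,
    with overhead_cpus worth of capacity lost to parallelization."""
    effective = n_cpus - overhead_cpus
    if effective < 1:
        raise ValueError("need at least one effective CPU")
    return single_cpu_time / effective


# Hypothetical job that takes 62 minutes on a single CPU.
print(expected_time(62.0, 64))
```

If the time you actually measure is close to the single-CPU figure
rather than to this estimate, the program is not running in parallel.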
I suspect you will find that the processing time is equivalent to that
of running on a single-processor system with the same type of processor.
In that case, yes, using a 64-processor system does not speed up your
processing, as only the equivalent of one processor will be working
for you at any one time.
This is not unlike the situation I have here: I have a twin-processor
system, and for most programs, if I run only one instance, I expect one
processor to be sitting idle.
The reason is that FSL is not written for parallel processing. (As a
matter of fact, neither is BAMM nor the vanilla flavour of SPM.) To
harness all processors on a single task, in most cases the program code
must contain explicit instructions on how to do it; that is, the
programmer has to explicitly code the parallelization into the program.
As parallel programs are not very easy to write, and parallel computers
are not that common, most programmers, including me, would not have
bothered. It is a problem of too much work for too little benefit.
A simple solution, exactly what I do with my twin-processor system, is
to push 64 film_gl processes through the supercomputer in parallel, one
per dataset. In this case, I am pretty confident that all 64 processors
will be working frantically for you. Having said that, why not just ask
your colleagues to lend you their single-processor computers instead of
booking time on a supercomputer?
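As a rough illustration of pushing many independent processes through
the machine at once, here is a minimal Python sketch. It is not part of
FSL; run_all is a name I made up, and the placeholder jobs merely start
a Python interpreter that exits at once. In real use each entry would be
the complete command line you already use for one dataset.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_parallel):
    """Run each command as its own OS process, at most max_parallel
    at a time, and return the exit codes in the order given."""
    def run_one(cmd):
        # Every call starts a separate process, which the operating
        # system is free to schedule on any idle CPU.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


# Placeholder jobs (an assumption for the demo, not real analysis runs).
jobs = [[sys.executable, "-c", "pass"] for _ in range(4)]
print(run_all(jobs, max_parallel=4))
```

The threads here do no computing themselves; each one only waits for its
child process, so the real work is spread over as many CPUs as there are
processes running.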
There is another solution, suggested to me by Liverpool University
Computing Services when I went up to MARIAC for a job interview. It is
applicable because FSL has a command-line interface. It is more
difficult with SPM in batch mode, but it is still possible. Assume your
complete analysis requires the programs to be run in the sequence
A, B, C, D, E, F, and that each program outputs its data as files with
unique names. Then it is a relatively simple task to turn the sequence
into a pipeline by having every program wait for its inputs and execute
its task as soon as all of them are available. With the example
sequence, initially only program A will be working, processing dataset1
and churning out result1. As soon as program A completes, program B
will read and process result1 while program A starts processing
dataset2, and so on.
The idea is to let each program occupy one CPU, and to achieve parallel
processing by starting on the next dataset before the current one is
completed. However, to fully utilize 64 processors, you would need 64
programs. Also, because this is a pipeline, the 6 processors in the
example will only kick in one after another, meaning the processing
will not be as fast as it would be with a truly parallelized program.
This means it may not be worthwhile programming this pipeline if you
only have 12 datasets, and certainly not if you have fewer than 6
datasets.
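To show the shape of such a pipeline, here is a minimal Python sketch
using one thread and one queue per stage. Everything in it is my own
stand-in: the six stages are trivial functions that append a letter,
whereas in real use each stage would launch the corresponding external
program on the file it receives. The step count at the end assumes every
stage takes the same time per dataset, which is a simplification.

```python
import queue
import threading

DONE = object()  # sentinel telling a stage there is no more input


def run_pipeline(stages, datasets):
    """Pass every dataset through all stages in order, one thread per stage."""
    # links[i] feeds stage i; the last link collects finished results.
    links = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()      # wait until an input is available
            if item is DONE:
                outbox.put(DONE)    # pass the shutdown signal downstream
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, links[i], links[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for d in datasets:
        links[0].put(d)
    links[0].put(DONE)

    results = []
    while True:
        item = links[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def pipeline_steps(n_datasets, n_stages):
    """Time steps needed if every stage takes one step per dataset."""
    return n_datasets + n_stages - 1


# Six stand-in stages "A".."F"; each just appends its letter to the input.
stages = [lambda s, name=name: s + name for name in "ABCDEF"]
print(run_pipeline(stages, ["d1-", "d2-"]))
print(pipeline_steps(12, 6), "steps pipelined vs", 12 * 6, "one after another")
```

Under the equal-step assumption, 12 datasets through 6 stages need 17
steps instead of 72, a speed-up of a little over 4 rather than 6, and
with only 6 datasets it is 11 steps instead of 36. The gain approaches
the number of stages only when there are many more datasets than stages.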
Hope this helps,
Cinly
Darren Schreiber wrote:
> In an attempt to speed things up, I tried using FSL on a supercomputer
> we have on campus. Here is a view from "top":
>
>
> IRIX64 inire 6.5 IP35 load averages: 1.00 0.71 0.32 03:09:35
> 184 processes: 180 sleeping, 2 zombie, 2 running
> 64 CPUs: 98.4% idle, 1.6% usr, 0.0% ker, 0.0% wait, 0.0% xbrk, 0.0% intr
> Memory: 32G max, 31G avail, 20G free, 4096M swap, 4096M free swap
>
>    PID   PGRP USERNAME PRI  SIZE   RES STATE   TIME WCPU%  CPU% COMMAND
> 122536 122279 dschreib  20  122M  117M run/36  2:00  11.7 99.86 film_gl
> 122555 122555 dschreib  20 2288K 1344K run/32  0:00   0.4  0.83 top
>
>
> What I find interesting is that I am leaving the 64 processors 98%
> idle, while the CPU% used by film is 99.86%.
>
> Is this because FSL is working hard on one processor, but leaving the
> others inactive? Is there anything I can do here to speed things up?
>
> As it is, it looks like my happy little laptop can get the first level
> analyses done in about the same amount of time.
>
> Darren