> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Sam Skipsey
>
> 2009/8/7 RAUL H C LOPES <[log in to unmask]>:
> > Interesting: the decrease in efficiency even when there were cores
> > available.
> >
>
> Well, we knew this was the case anyway - Glasgow's initial data
> suggested something around the 3+ cores area.
> Remember, it's i/o contention against the (single) hard disk in the
> WN that's the issue here - it looks like the large seeks each job
> makes within its own data file interact very badly, so the disk
> spends most of its time seeking (and the processes spend most of
> their time waiting). Ewan had some observations at Oxford that backed
> this interpretation up, I believe.
>
Yup. Nothing prettily graphable, but just by directly observing the nodes
and the jobs as they run with things like vmstat and strace you can see
it pretty clearly. This isn't a shock though - we know that ROOT files
tend to produce a very jumpy access pattern, and any eight processes
doing intensive access to different files on a single bog-standard disk
are going to make it spend most of its time seeking.
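For what it's worth, this is the kind of thing "directly observing" means in practice: a minimal sketch, assuming Linux procps vmstat, where a high "b" (processes blocked on I/O) count together with a high "wa" (iowait) percentage is the signature of a seek-bound disk. The one-liner is illustrative, and the sample vmstat line below is made up for demonstration, not a real measurement from a WN.

```shell
# On a live node you'd run something like:
#   vmstat 1 5 | awk 'NR>2 {print "blocked procs:", $2, "| iowait:", $16"%"}'
# and watch the "b" and "wa" columns climb while the jobs run.

# Deterministic illustration using a canned (made-up) vmstat sample:
sample="procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  7      0  81234  10240 512000    0    0  9500   120  800 1500  5  3 10 82  0"

# Field 2 is "b" (blocked on I/O), field 16 is "wa" (iowait %).
echo "$sample" | awk 'NR==3 {print "blocked procs:", $2, "| iowait:", $16"%"}'
```

Seven of eight jobs blocked and 82% iowait would be exactly the pattern described above: the disk busy seeking, the processes busy waiting.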
The question is what to do about it, and so far we seem to have:
- Don't run many of these jobs on one node,
- SSDs (ha; as if),
- Go back to direct rfio access to the SEs, but get it right this time.
And it's not entirely clear whether the last of those is actually possible.
Ewan