Dear All,
At QMUL we have a load problem with the experimental shared area.
The farm is running around 900 jobs, and the NFS server serving the
experimental area is overloaded.
As a result, LHCb jobs sit for a long time on the worker nodes (WNs)
waiting for data (mainly libraries).
We would like to know how this is handled at RAL and Manchester, where the
farm size is similar. We were thinking of setting up a set of PBS slots
with read-write access for the sgm (software manager) account. The other
nodes would just have a copy on local disk, or access the area through
several NFS servers.
I think the problem with giving read-write access to only a small set of
WNs is that LHCb submits a lot of jobs via a single user who is sgm. Most
of those jobs do not write to the experimental software area, but they
would still queue up waiting for those WNs to become free.
We would be keen to hear about your experience on this topic.
Cheers, Olivier.
--
- O. van der Aa - Imperial College London -
- LT2 Technical Coordinator -
- tel: +442075947810, +442071005426 -
- fax: +442078238830 -
- http://surl.se/agtu -