Hi all,
We at LIP-Coimbra manage a hybrid cluster (local + grid usage). At the moment one
of our local users is trying to submit a large number of jobs (>3000), and the
queuing system grinds to a halt as a result. We have seen everything from extremely
slow response times on all commands (showq, qstat, qsub, etc.) to torque and/or
maui crashing.
We are running:
torque-2.3.0-snap.200801151629.2cri.slc4
maui-server-3.2.6p20-snap.1182974819.8.slc4
maui-3.2.6p20-snap.1182974819.8.slc4
maui-client-3.2.6p20-snap.1182974819.8.slc4
torque-server-2.3.0-snap.200801151629.2cri.slc4
torque-client-2.3.0-snap.200801151629.2cri.slc4
on a dedicated machine (separate from the CE).
Is this normal for lcg-pbs? Is there something we could change? This is certainly not
normal/usual on our other clusters with stock torque/maui...
Cheers,
Miguel Afonso Oliveira