Gonçalo Borges wrote:
> Hi,
>
> rb02.lip.pt is experiencing some problems. A user is flooding the
> machine with continuous job submissions, and the average load of the
> machine is around 20.
Is that actually a problem? Some of our RBs often have high loads:
http://lxb2007.cern.ch/monitoring/monitoring.html
> The edg-wl-workload daemon is consuming a lot of CPU and making life
> difficult for other users (and for us).
>
> [root@rb02 root]# /opt/condor/bin/condor_q
> (...)
> 1406 jobs; 292 idle, 997 running, 117 held
>
>
> [root@rb02 root]# /opt/condor/bin/condor_q -long | grep -i UserSubjectName | grep Carrillo | wc -l
> 1145
>
> The user is not doing anything wrong... He is just submitting jobs from
> his UI, where our RB is configured... So I cannot simply ask him not to
> do it... The middleware should take care of such situations by
> distributing the load across several RBs and not just one... Nevertheless,
The middleware does allow that, but the user may still bombard your RB...
> being practical, is there something I can do from the RB point of view,
> some optimization or something else?!
Consider setting up an extra RB. The UI can be manually configured to
submit to a set of RBs, picking a random choice per job.
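
To give a rough idea of what a random choice per job does to the load, here is a minimal Python sketch. It is not the UI's actual selection code, and it says nothing about the UI configuration syntax. Only rb02.lip.pt comes from this thread; the other hostnames and the function names pick_rb and spread_jobs are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical broker list: only rb02.lip.pt appears in this thread,
# the other two hostnames are placeholders.
RESOURCE_BROKERS = ["rb02.lip.pt", "rb03.example.org", "rb04.example.org"]


def pick_rb(brokers, rng=random):
    """Return one broker chosen uniformly at random for a single job."""
    if not brokers:
        raise ValueError("no resource brokers configured")
    return rng.choice(brokers)


def spread_jobs(n_jobs, brokers, seed=None):
    """Count how many of n_jobs land on each broker with a random pick per job."""
    rng = random.Random(seed)
    return Counter(pick_rb(brokers, rng) for _ in range(n_jobs))


if __name__ == "__main__":
    # 1145 is the number of jobs attributed to the single user above.
    for host, count in spread_jobs(1145, RESOURCE_BROKERS).most_common():
        print(f"{host}: {count} jobs")
```

With N brokers in the set, each one should end up with roughly 1/N of a heavy user's jobs, so a second RB would be expected to take about half of the submissions that rb02 receives today.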