On Sat, 14 Jul 2007, Kyriakos Ginis wrote:
> I temporarily disabled the queue for the SEE VO (the region VO) a few
> minutes after sending the mail, because the problem was caused by a
> local user. Apparently this user runs a Monte Carlo simulation and does
> a mass job submission, which at some moments creates a very high load on
> the CEs.
Mass _submission_ through an RB or WMS should not be a problem.
Mass _cancellation_ or _cleanup_ is a problem, and that is what
seems to have happened:
> [...]
> seeXXX 24654 24399 0 22:25 ? 00:00:00 /usr/bin/perl
> /opt/globus/libexec/globus-job-manager-script.pl -m pbs -f
> /tmp/gram_cPOePN -c remove_scratchdir
> seeXXX 24655 24401 0 22:25 ? 00:00:00 /usr/bin/perl
> /opt/globus/libexec/globus-job-manager-script.pl -m pbs -f
> /tmp/gram_dBTaxf -c remove_scratchdir
> seeXXX 24656 24400 0 22:25 ? 00:00:00 /usr/bin/perl
> /opt/globus/libexec/globus-job-manager-script.pl -m pbs -f
> /tmp/gram_Ltw170 -c remove_scratchdir
> seeXXX 24657 24411 0 22:25 ? 00:00:00 /usr/bin/perl
> /opt/globus/libexec/globus-job-manager-script.pl -m pbs -f
> /tmp/gram_dvGF5t -c remove_scratchdir
> [...]
Only the gLite 3.1 WMS has a version of Condor-G that avoids flooding
the CE with such requests. Its release is expected in a few weeks.
In the meantime one occasionally needs to reboot the CE or temporarily
block some users...
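
To see which account is behind such a flood before blocking anyone, the
job-manager script processes can be counted per user and per action. The
following is only a rough sketch, not an official tool: the function names
are invented for illustration, and it assumes the usual `ps -ef` layout
(user name in the first column, one process per line) and that the action
is the argument following `-c`, as in the listing quoted above.

```python
import re
import subprocess
from collections import Counter

# Script name as it appears in the process listing quoted above.
SCRIPT = "globus-job-manager-script.pl"


def count_jobmanager_calls(ps_text):
    """Count job-manager script processes per (user, action).

    Expects `ps -ef` style text: one process per line, user name first.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        if SCRIPT not in line:
            continue
        user = line.split()[0]
        # The action is taken to be the argument after "-c",
        # e.g. "remove_scratchdir" (an assumption based on the listing).
        match = re.search(r"\s-c\s+(\S+)", line)
        action = match.group(1) if match else "unknown"
        counts[(user, action)] += 1
    return counts


def current_ps_output():
    """Return the output of `ps -ef` on the local machine (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return result.stdout


def report(counts):
    """Format the counts as text lines, busiest (user, action) pair first."""
    return [
        f"{n:6d}  {user:<12} {action}"
        for (user, action), n in counts.most_common()
    ]
```

Running `print("\n".join(report(count_jobmanager_calls(current_ps_output()))))`
on the CE would then list the busiest user/action pairs first; a large
count for `remove_scratchdir` under a single account would match the
picture above.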