Hi all,
>> 4028 jobs; 0 idle, 4 running, 4024 held
I am surprised that the cleanup of those held jobs made a difference;
many CERN WMS nodes have had many more held jobs without problems.
> Maarten sent me this script which must be in cron:
Actually I did not send _that_ exact code.
First, I suggested a different cron job in this bug:
https://savannah.cern.ch/bugs/?70401
That should avoid the steady increase of held jobs.
Second, the admin could run this script to clear _all_ held jobs:
-----------------------------------------------------------------------------
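#!/bin/sh
# Remove all held jobs owned by "glite" from the Condor-C queue.
condor=/opt/condor-c
condor_rm=$condor/bin/condor_rm
export CONDOR_CONFIG=$condor/etc/condor_config

# Snapshot the queue into a timestamped file (not deleted afterwards).
tmp=/tmp/q-`date +%y%m%d-%H%M%S`-$$.txt
$condor/bin/condor_q > "$tmp"

# Select held (H) jobs of user glite: 9-field lines with an ID like N.0.
for i in `
    awk '
        $6 == "H" && $2 == "glite" && NF == 9 && $1 ~ /^[0-9]+\.0$/ {
            print $1
        }
    ' "$tmp"
`
do
    # Try a forced removal first; if that fails, do a normal removal
    # followed by a forced one.
    $condor_rm -forcex "$i" || {
        $condor_rm "$i" && $condor_rm -forcex "$i"
    }
done < /dev/null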
-----------------------------------------------------------------------------
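To check which jobs the awk filter in the script above would select before anything is removed, one can run the same filter on a saved queue listing. The sketch below does that on made-up sample data; the column layout (ID, OWNER, SUBMITTED date and time, RUN_TIME, ST, PRI, SIZE, CMD) is an assumption inferred from the field numbers in the filter, and the function name `held_glite_jobs` is only illustrative.

```shell
#!/bin/sh
# Dry run: print the job IDs the cleanup script would remove.
# Reads condor_q-style output on stdin; does not call condor_rm.
held_glite_jobs() {
    awk '
        $6 == "H" && $2 == "glite" && NF == 9 && $1 ~ /^[0-9]+\.0$/ {
            print $1
        }
    '
}

# Hypothetical sample listing (not real output from any WMS node).
sample=' ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
 101.0   glite    3/10 12:00   0+00:00:00 H  0   0.0  JobWrapper.sh
 102.0   glite    3/10 12:01   0+01:02:03 R  0   0.0  JobWrapper.sh
 103.0   other    3/10 12:02   0+00:00:00 H  0   0.0  JobWrapper.sh
 104.0   glite    3/10 12:03   0+00:00:00 H  0   0.0  JobWrapper.sh
 105.1   glite    3/10 12:04   0+00:00:00 H  0   0.0  JobWrapper.sh
 106.0   glite    3/10 12:05   0+00:00:00 H  0   0.0  JobWrapper.sh arg1'

result=$(printf '%s\n' "$sample" | held_glite_jobs)
printf '%s\n' "$result"
```

With this sample only jobs 101.0 and 104.0 are listed: 102.0 is running, 103.0 has a different owner, 105.1 does not have an ID ending in .0, and 106.0 has an extra field, so NF is not 9. On a real node the function could be fed the saved snapshot instead, e.g. `held_glite_jobs < "$tmp"`.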