Yannick Patois wrote:
> # rpm -qa | grep -i condor
> ncm-condorconfig-1.0.2-1
> vdt_globus_jobmanager_condor-VDT1.2.2rh9_LCG-3
> condor-lcg-1.1.0-1
> condor-lcgrb-1.0.0-3
> condor-6.7.10-1
>
>
> So I believe condor 6.7.10
No. Ensure you have this in /opt:
-------------------------------------------------------------------------------
lrwxrwxrwx 1 root root 13 Feb 12 2007 condor -> condor-20.0.7
-------------------------------------------------------------------------------
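
If you want to check this from a script rather than by eye, here is a minimal sketch. It assumes only the /opt/condor path from the listing above; the function name condor_link_target is just illustrative.

```shell
# Print the target of the condor symlink.
# The default path /opt/condor is taken from the listing above;
# condor_link_target is a hypothetical helper name.
condor_link_target() {
    link="${1:-/opt/condor}"
    if [ -L "$link" ]; then
        # readlink prints the link target exactly as stored in the link
        readlink "$link"
    else
        echo "$link is not a symbolic link" >&2
        return 1
    fi
}
```

Calling condor_link_target with no argument inspects /opt/condor; a non-zero return means the path is missing or is not a symlink.
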
> Something I did that seems to have "solved" the problem (for now, let's
> hope), that I got from elsewhere:
>
> - Stopping the proxy-renewal daemon
> - cd /opt/edg/var/spool/edg-wl-renewd
> rm -f `ls | grep -E '*\.[0-9]+'`
Beware that such an "rm" may screw up many jobs!
At CERN we have not needed to do that for a long time.
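
If you do consider it anyway, it may be worth listing what the pattern would match before deleting anything. A minimal sketch, assuming the intent of the grep is "names containing a dot followed by digits"; the function name list_renewal_candidates is hypothetical:

```shell
# List spool entries whose names contain a dot followed by digits,
# WITHOUT deleting anything. The pattern is my reading of the intent of
# the quoted grep; list_renewal_candidates is an illustrative name.
list_renewal_candidates() {
    dir="${1:-/opt/edg/var/spool/edg-wl-renewd}"
    # Subshell, so the caller's working directory is left untouched.
    ( cd "$dir" && ls | grep -E '\.[0-9]+' )
}
```

The function returns non-zero when nothing matches, so the output can be reviewed first and the spool left alone if in doubt.
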
> - Starting the daemon again.
>
> Don't know why, but it seems to help.
>
>
> I also went through all daemons to see if some were stopped (some were)
> and I restarted them. But unfortunately I didn't keep track of exactly
> what I did...
It seems the Workload Manager (WM) had crashed due to a double cancellation
of the same job: as a side effect the proxy-renewal daemon (PR) can get into
an infinite loop. In that case you need to repeat the following cycle: stop
the PR, then restart both the PR and the WM. Keep doing so until the WM has
proceeded beyond the multiple cancellations: at each restart it advances by
one.
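
The cycle above could be scripted roughly as follows. This is only a sketch: the control scripts for the two daemons are passed in as arguments because their paths are not given in this thread, and I assume each accepts "stop", "start" and "status", with "status" exiting non-zero when the daemon is down. The function name and the settle delay are my own choices.

```shell
# Repeat "stop PR, start PR, start WM" until the WM stays up or the
# maximum number of cycles is reached.
# Assumptions: pr_ctl and wm_ctl are the daemons' control scripts and
# understand stop/start/status; "status" fails when the daemon is down.
restart_until_wm_stays_up() {
    pr_ctl="$1"
    wm_ctl="$2"
    max_tries="${3:-10}"
    settle="${4:-30}"    # seconds to wait before checking on the WM
    try=1
    while [ "$try" -le "$max_tries" ]; do
        "$pr_ctl" stop
        "$pr_ctl" start
        "$wm_ctl" start
        sleep "$settle"
        if "$wm_ctl" status >/dev/null 2>&1; then
            echo "WM still running after $try cycle(s)"
            return 0
        fi
        try=$((try + 1))
    done
    echo "WM still crashing after $max_tries cycle(s)" >&2
    return 1
}
```

Since the WM advances by one cancellation per restart, max_tries should be at least the number of duplicate cancellations you expect.
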