Hi,
Maarten Litmaath wrote:
> Yannick Patois wrote:
>> # rpm -qa | grep -i condor
>> ncm-condorconfig-1.0.2-1
>> vdt_globus_jobmanager_condor-VDT1.2.2rh9_LCG-3
>> condor-lcg-1.1.0-1
>> condor-lcgrb-1.0.0-3
>> condor-6.7.10-1
>>
>> So I believe condor 6.7.10
> No. Ensure you have this in /opt:
> -------------------------------------------------------------------------------
>
> lrwxrwxrwx 1 root root 13 Feb 12 2007 condor -> condor-20.0.7
> -------------------------------------------------------------------------------
Fine, I do have it.
>> Something I did that seems to have "solved" the problem (for now, let's
>> hope), and that I got from elsewhere:
>>
>> - Stopping the proxy-renewal daemon
>> - cd /opt/edg/var/spool/edg-wl-renewd
>> rm -f `ls | grep -E '*\.[0-9]+'`
> Beware that such an "rm" may screw up many jobs!
It did :(
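For anyone who ends up cleaning that spool directory anyway, here is a minimal sketch of how the matching files could be listed before anything is removed. The function name `list_numbered_files` is hypothetical, and the pattern (a dot followed by digits anywhere in the file name) is only my reading of what the `grep` above was meant to select. Listing the files does not make deleting them safe for running jobs; it only shows what would go.

```shell
# Sketch only: preview candidates instead of piping straight into rm.
# list_numbered_files is a hypothetical helper, not part of any EDG/LCG tool.
list_numbered_files() {
    # $1: directory to inspect, e.g. the renewd spool directory
    # Prints the names of regular files containing ".<digits>", one per line.
    for f in "$1"/*; do
        [ -f "$f" ] || continue          # skip directories and an unmatched glob
        name=${f##*/}                    # strip the directory part
        if printf '%s\n' "$name" | grep -Eq '\.[0-9]+'; then
            printf '%s\n' "$name"
        fi
    done
}

# Review the output by hand before deleting anything:
# list_numbered_files /opt/edg/var/spool/edg-wl-renewd
```

Only once the list has been checked against the jobs still in flight would a removal step make sense.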
> At CERN we have not needed to do that for a long time.
That might have been useless; what solved it may well have been the
few random stops/starts I did, as you suggest below.
>> I also went through all daemons to see if some were stopped (some were)
>> and I restarted them. But unfortunately I didn't keep track of exactly
>> what I did...
> It seems the WM had crashed due to a double cancellation of the same job:
> as a side effect the proxy-renewal daemon can get into an infinite loop.
> In that case you need to keep stopping the PR and restarting both the PR
> and the WM until the WM has proceeded beyond the multiple cancellations:
> at each restart it advances by one.
I'll know that now. Thanks for your help.
Yannick
--
Yannick Patois
IPHC - IN2P3 / CNRS - 23 rue du Loess 67037 Strasbourg
Tel: 03 88 10 61 83