>
> >Hi,
> >
> > I've not seen this for a while: all jobs submitted to the RB
> > remain in the "Waiting" state forever.
> >
> > It had apparently gone away with recent versions of the resource
> > broker code.
> >
> > I've restarted every service there, making sure they really are dead
> > first, but have had no joy.
>
> Jobs are waiting when the Workload Manager has not got to them yet,
> meaning they still sit in /var/edgwl/workload_manager/input.fl.
>
> In the past this could happen when there was a deadlock on the file,
> which is also used by the Network Server; check with:
>
> cat /proc/locks
I learnt a new command for this last week.
# /usr/sbin/lslk
SRC            PID   DEV  INUM     SZ  TY  M  ST  WH  END  LEN  NAME
(unknown)      3305  3,2  2093535      w   0  0   0   0    0    / (rootfs)
condor_master  10075 3,2  2077096  0   w   0  0   0   0    0    /opt/condor/var/condor/log/InstanceLock
atd            20463 3,2  2093537  6   w   0  0   0   0    0    /var/run/atd.pid
though its man page amused me.
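
For the /proc/locks route, the output can be narrowed to the one file of interest by matching on its inode number. The following is only a rough sketch: the function name locks_on is made up for illustration, it assumes GNU stat, and it matches on inode alone, so a file with the same inode number on another device would also show up.

```shell
# Print the lock-table entries that refer to a given file.
# Usage: locks_on FILE [LOCKS_TABLE]   (LOCKS_TABLE defaults to /proc/locks)
# locks_on is a hypothetical helper name, not part of any LCG tool.
locks_on() {
    file=$1
    table=${2:-/proc/locks}
    # /proc/locks identifies each locked file as MAJOR:MINOR:INODE,
    # so look up the inode number of the file we care about.
    inode=$(stat -c %i "$file") || return 1
    # Scan every field, since blocked-waiter lines ("->") shift the columns.
    awk -v ino="$inode" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ ("^[0-9a-f]+:[0-9a-f]+:" ino "$")) { print; next }
    }' "$table"
}
```

Something like `locks_on /var/edgwl/workload_manager/input.fl` would then list only the locks held on the input file, with the holder's PID in the column before the device and inode.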
>
> Recently we have seen that the matchmaking can become very slow,
> due to the BDII having a slapd cache size that is too small
> (fixed in LCG-2_3_1): can you check that setting?
I fixed that one up. It is running fine.
>
> Does /var/edgwl/workload_manager/log/events.log show activity?
My suspicion is that I fixed it, though I lost some jobs and any
evidence of the problem.
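
For next time, a quick way to answer the events.log question is to test whether the file has been written to recently. This is a minimal sketch, assuming GNU find; the name recently_written and the 10-minute default are my own choices, not anything from the workload manager.

```shell
# Succeed if FILE was modified within the last MINUTES minutes.
# Usage: recently_written FILE [MINUTES]   (MINUTES defaults to 10, an arbitrary choice)
# recently_written is a hypothetical helper name.
recently_written() {
    file=$1
    minutes=${2:-10}
    # find prints the path only if its mtime is newer than the cutoff,
    # so a non-empty result means the file was touched recently.
    [ -n "$(find "$file" -maxdepth 0 -mmin "-$minutes" 2>/dev/null)" ]
}
```

For example, `recently_written /var/edgwl/workload_manager/log/events.log 5 && echo active` prints "active" only if the log changed in the last five minutes.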
Thanks,
Steve
--
Steve Traylen
http://www.gridpp.ac.uk/