Hi Mike,
> rbwmsmon claims that our WMSs have around the same number of jobs in input.fl:
> svr022 = 53
> svr023 = 33
>
> yet the size of the input.fl files on disk is vastly different:
>
> compare
>
> svr022:/var/glite/workload_manager# ls -sh
> total 394M
> 394M input.fl 164K input.fl.log 4.0K ismdump.fl
>
> with
>
> svr023:/var/glite/workload_manager# ls -sh
> total 21M
> 21M input.fl 396K input.fl.log 4.0K ismdump.fl
>
> does input.fl get regularly flushed by some process? It only ever
> appears to increase in size on svr022.
The input.fl file has a fundamental problem: it only gets cleared when
_all_ its jobs have been processed by the WM, whereas often there are
problematic jobs that need multiple matchmaking attempts or get stuck
because of some bug in the WM.
That is why the latest WMS release in production no longer uses
input.fl at all: it uses a job directory instead, in which each job
is a separate file.
The small numbers you see reported currently are found as follows:
grep -c ' g$' /var/glite/workload_manager/input.fl
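
If it helps, here is a rough sketch that prints that count next to the
file's size on disk, so the mismatch you noticed is visible in one line.
The function name fl_summary is made up for illustration; the only thing
taken from the real setup is the grep pattern and the default path above.

```shell
#!/bin/sh
# fl_summary: hypothetical helper, not part of any gLite tooling.
# Prints the number of lines matching ' g$' (the same count as the
# grep above) together with the file size in bytes.
fl_summary() {
    fl=${1:-/var/glite/workload_manager/input.fl}
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so ignore its exit status.
    reported=$(grep -c ' g$' "$fl" || true)
    # Strip the padding some wc implementations add.
    bytes=$(wc -c < "$fl" | tr -d ' ')
    echo "reported=$reported bytes=$bytes"
}
```

Running fl_summary with no argument on each WMS node should show a
similar reported count on both but a much larger byte count on svr022.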