Charles Loomis wrote:
> Hello,
>
> The logmonitor and workload_manager processes on the grid09.lal.in2p3.fr
> RB are down and won't restart. The fatal error from the logmonitor is:
>
> 08 Dec, 09:15:46 -F- MonitorLoop::run(): Cannot run the CondorMonitor,
> fatal error.
> 08 Dec, 09:15:46 -F- MonitorLoop::run(): Error while running logmonitor:
> "Position not currently available "/var/edgwl/workload_manager/input.fl"
> (_file_sequence_t::insertData(...)[21])".
>
> and from the workload manager is something similar:
>
> 08 Dec, 09:14:32 -E: [Error] run(DispatcherFromFileList.cpp:164):
> Dispatcher: Position not currently available
> "/var/edgwl/workload_manager/input.fl"
> (_file_sequence_t::getBegin()[187]). Exiting...
>
> In the directory there are lots of files with names like:
> input.fl.1101621803.4201.wrong which contains an error like:
>
> Last known (wrong) status was: Position not currently available (6)
Please tar up all of the .wrong files and send me the tarball.
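
If it helps, here is a minimal sketch of collecting those files, assuming Python is available on the RB. The function name collect_wrong_files and the output path are only illustrative; the glob pattern is taken from the file names you quoted.

```python
import glob
import os
import tarfile


def collect_wrong_files(directory, archive_path):
    """Pack every input.fl.*.wrong file found in `directory` into a gzipped tarball.

    Returns the sorted list of file names that were archived.
    """
    pattern = os.path.join(directory, "input.fl.*.wrong")
    paths = sorted(glob.glob(pattern))
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # Store only the file name so the archive unpacks flat.
            tar.add(path, arcname=os.path.basename(path))
    return [os.path.basename(p) for p in paths]
```

For example, collect_wrong_files("/var/edgwl/workload_manager", "/tmp/input_fl_wrong.tar.gz") would write the archive to /tmp (the destination is an arbitrary choice).
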
> I've not seen these errors before and couldn't find anything similar in
> the wiki.
>
> Does anyone have an idea what the problem is and a fix?
Remove /var/edgwl/workload_manager/input.fl and restart the services.
Note that you will lose all the jobs described in input.fl.
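
A cautious variant, sketched here as an assumption rather than a required step: move input.fl into a separate directory instead of deleting it, so the file can still be inspected afterwards. The helper name set_aside and the backup directory are hypothetical, and restarting the services is left to whatever procedure you normally use.

```python
import os
import shutil
import time


def set_aside(path, backup_dir):
    """Move `path` into `backup_dir` under a timestamped name and return the new path.

    This only gets the file out of the way; the jobs it describes are still lost
    to the workload manager, exactly as if the file had been removed.
    """
    stamp = time.strftime("%Y%m%d%H%M%S")
    target = os.path.join(backup_dir, os.path.basename(path) + "." + stamp)
    shutil.move(path, target)
    return target
```

For example, set_aside("/var/edgwl/workload_manager/input.fl", "/root/input_fl_backup") would do this, assuming that backup directory already exists.
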