Hello!
If you're using lcgcondor or condor as the LRMS, please analyze the
queue on the schedd machine (usually the CE), as follows:
condor_q
displays the whole queue. Choose one of the listed jobs and note its ID, then
condor_q -analyze <job ID>
will show you more details, or
condor_q -better-analyze <job ID>
will show you the exact reason for what's going on.
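
The steps above could be wrapped in a small script along these lines. This is only a sketch: the function name analyze_job and the way the job ID is passed as the first argument are my own assumptions, and it expects to be run on the schedd machine where condor_q is installed.

```shell
#!/bin/sh
# Sketch: inspect a stuck job on the schedd host (usually the CE).
# JOB_ID is a placeholder; take a real value from the ID column of condor_q.
JOB_ID="${1:-}"

# analyze_job is a hypothetical helper name, not part of Condor.
analyze_job() {
    # $1: job ID exactly as listed by condor_q
    condor_q -analyze "$1"          # summary of why the job is idle
    condor_q -better-analyze "$1"   # more detailed matchmaking analysis
}

if ! command -v condor_q >/dev/null 2>&1; then
    # Not on a Condor host: report it and do nothing else.
    echo "condor_q not found; run this on the schedd machine" >&2
elif [ -z "$JOB_ID" ]; then
    # No job ID given: just list the whole queue.
    condor_q
else
    analyze_job "$JOB_ID"
fi
```

Run without arguments, it lists the queue; run with a job ID from that listing, it prints both analyses for that job.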
If you have any further questions about condor, I will be happy to help.
Max.
Olivier van der Aa wrote:
> Dear All,
>
> We have a case here at Imperial where jobs get stuck for a long time in
> the waiting state.
> Our rb is gfe01.hep.ph.ic.ac.uk. If I look in the events table of the rb
> for a given job I get this:
>
> select prog,arrived from events where jobid="mjzBSsixc2QHmKyGZvjKaw"
> +-----------------+---------------------+
> | UserInterface | 2007-02-14 10:07:38 |
> | UserInterface | 2007-02-14 10:07:41 |
> | NetworkServer | 2007-02-14 10:07:46 |
> | UserInterface | 2007-02-14 10:07:51 |
> | NetworkServer | 2007-02-14 10:07:51 |
> | WorkloadManager | 2007-02-14 11:45:54 |
> | WorkloadManager | 2007-02-14 11:47:18 |
> | WorkloadManager | 2007-02-14 11:47:19 |
> | WorkloadManager | 2007-02-14 11:47:20 |
> | JobController | 2007-02-14 11:47:22 |
> | JobController | 2007-02-14 11:47:24 |
> | JobController | 2007-02-14 11:47:26 |
> | LogMonitor | 2007-02-14 12:45:57 |
> | LogMonitor | 2007-02-14 12:46:00 |
> | LogMonitor | 2007-02-14 13:00:31 |
> | LogMonitor | 2007-02-14 13:00:32 |
> | LogMonitor | 2007-02-14 13:00:33 |
> | LogMonitor | 2007-02-14 13:00:35 |
> +-----------------+---------------------+
>
> Clearly the NetworkServer accepted my request at 10h07 and the workload
> manager only received the request at 11h45!
> What could be the cause of such a long delay?
>
> I observe that there are quite a lot of files in
> /var/edgwl/workload_manager like
> input.fl.1171462935.27646.wrong containing stack traces...
>
> Does it mean that the workload manager is crashing?
>
>
> Cheers, Olivier.
> --- O. van der Aa - Imperial College London -
> - LT2 Technical Coordinator -
> - tel: +442075947810, -
> - SIP: [log in to unmask] -
> - fax: +442078238830 -
> - http://surl.se/agtu -