Winnie,
Strangely, we've been having a similar issue at Liverpool lately. You can
"restart" the job by hand (with runjob, I think it's called), but it may
then just fail again. We've got around 48 jobs from dzero in the same
state; they cycle through Q->R->W->Q ... When I look at the jobs with
qstat -f, I find that their X509_USER_PROXY variable points to a proxy
file that does not exist. That's why they fail. Whether this is related
to your problem, I can't say.
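
In case it helps with checking which proxy a job has, here is a rough
sketch of the qstat -f check described above. It is not a tested tool.
It assumes that qstat -f wraps long attribute values onto continuation
lines that start with a tab, and that the script runs on the host where
the proxy path is supposed to exist. The function names (unwrap,
proxy_path, check_job) are made up for illustration.

```python
import os
import re
import subprocess


def unwrap(qstat_text):
    """Join wrapped lines back together.

    Assumption: continuation lines in `qstat -f` output start with a tab.
    """
    return qstat_text.replace("\n\t", "")


def proxy_path(qstat_text):
    """Return the X509_USER_PROXY value found in `qstat -f` output, or None."""
    match = re.search(r"X509_USER_PROXY=([^,\s]+)", unwrap(qstat_text))
    return match.group(1) if match else None


def check_job(job_id):
    """Run `qstat -f <job_id>` and report whether the proxy file exists here."""
    result = subprocess.run(
        ["qstat", "-f", job_id],
        capture_output=True,
        text=True,
        check=True,
    )
    path = proxy_path(result.stdout)
    if path is None:
        return "no X509_USER_PROXY in the job environment"
    state = "present" if os.path.exists(path) else "MISSING"
    return f"{path}: {state}"
```

Calling check_job("102403.lcgce03.phy.bris.ac.uk") would then print the
proxy path for that job and say whether the file is there. It only checks
that the file exists, not whether the proxy inside it has expired.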
Steve
Winnie Lacesso wrote:
>> We see this problem sometimes with jobs whose proxy has expired before
>> they start running, so they start but can't do anything so get re-queued
>> in torque into the W state.
>>
>
> I don't know how to check what proxy one of these odd-state jobs has
> (advice welcome), but yes they queued a few days ago, for some reason the
> WN have been full of very long-running jobs so they had to wait.
>
> Job: 102403.lcgce03.phy.bris.ac.uk
> 06/14/2010 01:43:21 S enqueuing into medium, state 1 hop 1
> 06/14/2010 01:43:21 S Job Queued at request of [log in to unmask], owner =
> [log in to unmask], job name = STDIN, queue = medium
> 06/16/2010 01:25:53 S Job Modified at request of [log in to unmask]
> 06/16/2010 01:25:53 S Job Run at request of [log in to unmask]
> 06/16/2010 01:25:53 S MOM rejected modify request, error: 15001
> 06/16/2010 04:26:40 S Job Run at request of [log in to unmask]
> 06/16/2010 04:26:40 S MOM rejected modify request, error: 15001
> 06/16/2010 04:56:46 S Job Run at request of [log in to unmask]
> 06/16/2010 04:56:46 S MOM rejected modify request, error: 15001
>
> diagnose -j is saying things like
> WARNING: job '102403' has failed to start 11 times
>
>
>> Are the exec_host for the W jobs all the same?
>>
>
> tracejob doesn't show any exec_host for the W jobs at all - is there some
> other way to check? The pbs_server logs just log it as :Q:
>
> Very Grateful for Advice!
>
--
Steve Jones                             [log in to unmask]
System Administrator                    office: 220
High Energy Physics Division            tel (int): 42334
Oliver Lodge Laboratory                 tel (ext): +44 (0)151 794 2334
University of Liverpool                 http://www.liv.ac.uk/physics/hep/