Dear Ilja,
When I run globus-job-run pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs -q dteam
/bin/pwd, I get the prompt back without any output.
-- Best Regards --
Adeel
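Since the command returns with no output at all, the exit status and stderr are the next things worth capturing. A minimal sketch in Python, assuming only that the command is on the PATH of the UI machine; the helper name run_and_report is hypothetical and not part of any Globus tool:

```python
import subprocess


def run_and_report(argv):
    """Run a command and return (exit_code, stdout, stderr) as text.

    Hypothetical helper for illustration; it makes no assumptions about
    what the command prints.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # The command line quoted in this thread.
    cmd = [
        "globus-job-run",
        "pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs",
        "-q", "dteam",
        "/bin/pwd",
    ]
    try:
        code, out, err = run_and_report(cmd)
    except FileNotFoundError:
        print("globus-job-run is not on the PATH")
    else:
        print("exit code:", code)
        print("stdout:", repr(out))
        print("stderr:", repr(err))
```

A non-zero exit code with empty stdout would at least separate a failed submission from a job that ran but whose output was never returned.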
-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
On Behalf Of Ilja Livenson
Sent: Monday, August 27, 2007 4:20 PM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Job Submission Failure
Hi,
I actually meant globus-job-run, not edg-job-submit.
Perhaps you could run the following and post the output?
globus-job-run pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs -q dteam /bin/pwd
cheers,
Ilja
Adeel-ur-Rehman wrote:
> Dear Ilja,
>
> The output of edg-job-submit I am getting is as follows:
>
> LOGGING INFORMATION:
>
> Printing info for the Job : https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>
> ---
> Event: RegJob
> - source = UserInterface
> - timestamp = Mon Aug 27 10:09:48 2007
> ---
> Event: Transfer
> - destination = NetworkServer
> - result = START
> - source = UserInterface
> - timestamp = Mon Aug 27 10:09:50 2007
> ---
> Event: Transfer
> - destination = NetworkServer
> - result = OK
> - source = UserInterface
> - timestamp = Mon Aug 27 10:09:54 2007
> ---
> Event: Accepted
> - source = NetworkServer
> - timestamp = Mon Aug 27 10:09:52 2007
> ---
> Event: EnQueued
> - result = OK
> - source = NetworkServer
> - timestamp = Mon Aug 27 10:09:54 2007
> ---
> Event: DeQueued
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:09:55 2007
> ---
> Event: Match
> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:10:00 2007
> ---
> Event: EnQueued
> - result = START
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:10:01 2007
> ---
> Event: EnQueued
> - result = OK
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:10:02 2007
> ---
> Event: DeQueued
> - source = JobController
> - timestamp = Mon Aug 27 10:10:04 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = START
> - source = JobController
> - timestamp = Mon Aug 27 10:10:05 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = OK
> - source = JobController
> - timestamp = Mon Aug 27 10:10:06 2007
> ---
> Event: Accepted
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:10:14 2007
> ---
> Event: Transfer
> - destination = LRMS
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:10:26 2007
> ---
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:13:57 2007
> ---
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:14:09 2007
> ---
> Event: Resubmission
> - result = WILLRESUB
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:14:10 2007
> ---
> Event: EnQueued
> - result = START
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:14:12 2007
> ---
> Event: EnQueued
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:14:13 2007
> ---
> Event: DeQueued
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:14:14 2007
> ---
> Event: Match
> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:14:19 2007
> ---
> Event: EnQueued
> - result = START
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:14:21 2007
> ---
> Event: EnQueued
> - result = OK
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:14:22 2007
> ---
> Event: DeQueued
> - source = JobController
> - timestamp = Mon Aug 27 10:14:23 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = START
> - source = JobController
> - timestamp = Mon Aug 27 10:14:25 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = OK
> - source = JobController
> - timestamp = Mon Aug 27 10:14:26 2007
> ---
> Event: Accepted
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:14:36 2007
> ---
> Event: Transfer
> - destination = LRMS
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:14:48 2007
> ---
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:17:23 2007
> ---
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:17:35 2007
> ---
> Event: Resubmission
> - result = WILLRESUB
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:17:36 2007
> ---
> Event: EnQueued
> - result = START
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:17:38 2007
> ---
> Event: EnQueued
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:17:39 2007
> ---
> Event: DeQueued
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:17:40 2007
> ---
> Event: Match
> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:17:46 2007
> ---
> Event: EnQueued
> - result = START
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:17:47 2007
> ---
> Event: EnQueued
> - result = OK
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:17:48 2007
> ---
> Event: DeQueued
> - source = JobController
> - timestamp = Mon Aug 27 10:17:49 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = START
> - source = JobController
> - timestamp = Mon Aug 27 10:17:50 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = OK
> - source = JobController
> - timestamp = Mon Aug 27 10:17:51 2007
> ---
>
> Event: Accepted
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:18:02 2007
> ---
> Event: Transfer
> - destination = LRMS
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:18:14 2007
> ---
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:21:44 2007
> ---
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:21:56 2007
> ---
> Event: Resubmission
> - result = WILLRESUB
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:21:57 2007
> ---
> Event: EnQueued
> - result = START
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:21:59 2007
> ---
> Event: EnQueued
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:22:00 2007
> ---
> Event: DeQueued
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:22:01 2007
> ---
> Event: Match
> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:22:07 2007
> ---
> Event: EnQueued
> - result = START
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:22:08 2007
> ---
> Event: EnQueued
> - result = OK
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:22:09 2007
> ---
> Event: DeQueued
> - source = JobController
> - timestamp = Mon Aug 27 10:22:11 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = START
> - source = JobController
> - timestamp = Mon Aug 27 10:22:12 2007
> ---
> Event: Transfer
> - destination = LogMonitor
> - result = OK
> - source = JobController
> - timestamp = Mon Aug 27 10:22:13 2007
> ---
> Event: Accepted
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:22:23 2007
> ---
> Event: Transfer
> - destination = LRMS
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:22:35 2007
> ---
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:25:10 2007
> ---
>
> Event: Done
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:25:22 2007
> ---
> Event: Resubmission
> - result = WILLRESUB
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:25:23 2007
> ---
> Event: EnQueued
> - result = START
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:25:25 2007
> ---
> Event: EnQueued
> - result = OK
> - source = LogMonitor
> - timestamp = Mon Aug 27 10:25:26 2007
> ---
> Event: DeQueued
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:25:27 2007
> ---
> Event: Abort
> - source = WorkloadManager
> - timestamp = Mon Aug 27 10:25:28 2007
>
> **********************************************************************
>
> [pcncp21] ~ > edg-job-status https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>
>
> *************************************************************
> BOOKKEEPING INFORMATION:
>
> Status info for the Job : https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
> Current Status: Aborted
> Status Reason: Job RetryCount (3) hit
> Destination: pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> reached on: Mon Aug 27 10:25:28 2007
> *************************************************************
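The logging output quoted above repeats the same cycle (Match, Transfer to LRMS, Done, Resubmission) until the job is aborted. A small Python sketch that tallies the events from such output can make that pattern easier to see. It assumes the "Event: <name>" line layout shown in the quoted log, and the function name count_events is hypothetical:

```python
from collections import Counter


def count_events(log_text):
    """Tally 'Event: <name>' lines, ignoring leading '>' quote markers."""
    counts = Counter()
    for line in log_text.splitlines():
        # Strip email quote markers and surrounding whitespace.
        stripped = line.lstrip("> \t").strip()
        if stripped.startswith("Event:"):
            counts[stripped[len("Event:"):].strip()] += 1
    return counts


if __name__ == "__main__":
    # Shortened sample in the same layout as the quoted log.
    sample = """\
> Event: Match
> - source = WorkloadManager
> ---
> Event: Done
> - source = LogMonitor
> ---
> Event: Resubmission
> - result = WILLRESUB
> ---
> Event: Match
> - source = WorkloadManager
> ---
> Event: Abort
> - source = WorkloadManager
"""
    for name, n in count_events(sample).items():
        print(name, n)
```

Comparing the Resubmission count with the RetryCount in the status reason is a quick check that every attempt went to the same destination and failed the same way.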
>
>
> -- Best Regards --
> Adeel
>
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Ilja Livenson
> Sent: Monday, August 27, 2007 10:50 AM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Well, in the case of LCG-CE I think it's best to try running a job with
> globus-job-run. Perhaps you could post the output of running it?
>
> atb,
> Ilja
>
> Adeel-ur-Rehman wrote:
>
>> Dear Ilja,
>>
>> Sorry for the confusion. By globus-job-run, I meant edg-job-submit!
>> Yes, I'm talking about the LCG-CE.
>>
>> -- Best Regards --
>> Adeel
>>
>> -----Original Message-----
>> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
>> On Behalf Of Ilja Livenson
>> Sent: Saturday, August 25, 2007 8:29 PM
>> To: [log in to unmask]
>> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>>
>> Hi,
>>
>> are you sure you are talking about globus-job-run? It doesn't resubmit
>> jobs, afaik, hence doesn't fail with the HitCount error.
>>
>> Ilja
>>
>> PS. You are talking about LCG CE, not gLite, right?
>>
>> Adeel-ur-Rehman wrote:
>>
>>
>>> Dear All,
>>>
>>> At our site, since I upgraded to the latest update of gLite 3.1, no jobs
>>> are executing; instead, I am getting job submission failures. The error
>>> details state "Got a job held event, reason: Unspecified gridmanager
>>> error". I can qsub test jobs, but globus-job-run aborts the job after
>>> retrying HitCount 3 times.
>>>
>>> And there are no offending ssh key problems between our CE and WNs.
>>>
>>> Any ideas??
>>>
>>>
>>>
>>> -- Best Regards --
>>> Adeel-ur-Rehman
>>>
>>>
>>>