Hm, some things to check:
- whether the Globus port range (GLOBUS_TCP_PORT_RANGE, usually 20000-25000)
is open in the firewall.
- in the home directory of the user you get mapped to (dteamsgm?), check
for the gram-error log file.
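
A rough sketch of both checks, to be run on the CE as the mapped user. It rests on assumptions: that the port range is exported as GLOBUS_TCP_PORT_RANGE in that shell (it may only be set in the gatekeeper's environment), and that the GRAM log files match "gram*" (the exact file name varies between installations). The helper names port_range_ok and list_gram_logs are made up for illustration. It does not inspect the firewall itself; it only tells you which range to look for in your rules.

```shell
#!/bin/sh
# Sketch only. Assumptions: the port range lives in GLOBUS_TCP_PORT_RANGE
# ("low,high", "low high" or "low-high"), and GRAM logs match "gram*".

# Succeed if $1 looks like a sane port range: two numbers, low <= high <= 65535.
port_range_ok() {
    # Turn the first "," or "-" into a space, then split into positional args.
    set -- $(printf '%s\n' "$1" | sed 's/[,-]/ /')
    [ $# -eq 2 ] || return 1
    case "$1$2" in *[!0-9]*) return 1 ;; esac
    [ "$1" -le "$2" ] && [ "$2" -le 65535 ]
}

# List gram* files in directory $1, newest first, or say that there are none.
list_gram_logs() {
    ls -t "$1"/gram* 2>/dev/null || echo "no gram* files in $1"
}

range="${GLOBUS_TCP_PORT_RANGE:-}"
if port_range_ok "$range"; then
    echo "Port range $range: check that the firewall lets it through"
else
    echo "GLOBUS_TCP_PORT_RANGE unset or malformed: '$range'"
fi

# Default to the current user's home directory if no directory is given.
list_gram_logs "${1:-${HOME:-.}}"
```

If a log file shows up, its last lines (for example via tail) are usually the place to start reading.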
good luck,
Ilja
Adeel-ur-Rehman wrote:
> Dear Ilja,
>
> When I run globus-job-run pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs -q dteam
> /bin/pwd, I get the prompt back without any output.
>
>
> -- Best Regards --
> Adeel
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Ilja Livenson
> Sent: Monday, August 27, 2007 4:20 PM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Hi,
>
> I actually meant globus-job-run, not edg-job-submit.
>
> Perhaps you could run the following and post the output?
>
> globus-job-run pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs -q dteam /bin/pwd
>
> cheers,
> Ilja
>
>
>
> Adeel-ur-Rehman wrote:
>
>> Dear Ilja,
>>
>> The output of edg-job-submit I am getting is as follows:
>>
>> LOGGING INFORMATION:
>>
>> Printing info for the Job :
>> https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>>
>> ---
>> Event: RegJob
>> - source = UserInterface
>> - timestamp = Mon Aug 27 10:09:48 2007
>> ---
>> Event: Transfer
>> - destination = NetworkServer
>> - result = START
>> - source = UserInterface
>> - timestamp = Mon Aug 27 10:09:50 2007
>> ---
>> Event: Transfer
>> - destination = NetworkServer
>> - result = OK
>> - source = UserInterface
>> - timestamp = Mon Aug 27 10:09:54 2007
>> ---
>> Event: Accepted
>> - source = NetworkServer
>> - timestamp = Mon Aug 27 10:09:52 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = NetworkServer
>> - timestamp = Mon Aug 27 10:09:54 2007
>> ---
>> Event: DeQueued
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:09:55 2007
>> ---
>> Event: Match
>> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:10:00 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:10:01 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:10:02 2007
>> ---
>> Event: DeQueued
>> - source = JobController
>> - timestamp = Mon Aug 27 10:10:04 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = START
>> - source = JobController
>> - timestamp = Mon Aug 27 10:10:05 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = OK
>> - source = JobController
>> - timestamp = Mon Aug 27 10:10:06 2007
>> ---
>> Event: Accepted
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:10:14 2007
>> ---
>> Event: Transfer
>> - destination = LRMS
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:10:26 2007
>> ---
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:13:57 2007
>> ---
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:14:09 2007
>> ---
>> Event: Resubmission
>> - result = WILLRESUB
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:14:10 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:14:12 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:14:13 2007
>> ---
>> Event: DeQueued
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:14:14 2007
>> ---
>> Event: Match
>> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:14:19 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:14:21 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:14:22 2007
>> ---
>> Event: DeQueued
>> - source = JobController
>> - timestamp = Mon Aug 27 10:14:23 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = START
>> - source = JobController
>> - timestamp = Mon Aug 27 10:14:25 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = OK
>> - source = JobController
>> - timestamp = Mon Aug 27 10:14:26 2007
>> ---
>> Event: Accepted
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:14:36 2007
>> ---
>> Event: Transfer
>> - destination = LRMS
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:14:48 2007
>> ---
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:17:23 2007
>> ---
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:17:35 2007
>> ---
>> Event: Resubmission
>> - result = WILLRESUB
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:17:36 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:17:38 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:17:39 2007
>> ---
>> Event: DeQueued
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:17:40 2007
>> ---
>> Event: Match
>> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:17:46 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:17:47 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:17:48 2007
>> ---
>> Event: DeQueued
>> - source = JobController
>> - timestamp = Mon Aug 27 10:17:49 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = START
>> - source = JobController
>> - timestamp = Mon Aug 27 10:17:50 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = OK
>> - source = JobController
>> - timestamp = Mon Aug 27 10:17:51 2007
>> ---
>>
>> Event: Accepted
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:18:02 2007
>> ---
>> Event: Transfer
>> - destination = LRMS
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:18:14 2007
>> ---
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:21:44 2007
>> ---
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:21:56 2007
>> ---
>> Event: Resubmission
>> - result = WILLRESUB
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:21:57 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:21:59 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:22:00 2007
>> ---
>> Event: DeQueued
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:22:01 2007
>> ---
>> Event: Match
>> - dest_id = pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:22:07 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:22:08 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:22:09 2007
>> ---
>> Event: DeQueued
>> - source = JobController
>> - timestamp = Mon Aug 27 10:22:11 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = START
>> - source = JobController
>> - timestamp = Mon Aug 27 10:22:12 2007
>> ---
>> Event: Transfer
>> - destination = LogMonitor
>> - result = OK
>> - source = JobController
>> - timestamp = Mon Aug 27 10:22:13 2007
>> ---
>> Event: Accepted
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:22:23 2007
>> ---
>> Event: Transfer
>> - destination = LRMS
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:22:35 2007
>> ---
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:25:10 2007
>> ---
>>
>> Event: Done
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:25:22 2007
>> ---
>> Event: Resubmission
>> - result = WILLRESUB
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:25:23 2007
>> ---
>> Event: EnQueued
>> - result = START
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:25:25 2007
>> ---
>> Event: EnQueued
>> - result = OK
>> - source = LogMonitor
>> - timestamp = Mon Aug 27 10:25:26 2007
>> ---
>> Event: DeQueued
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:25:27 2007
>> ---
>> Event: Abort
>> - source = WorkloadManager
>> - timestamp = Mon Aug 27 10:25:28 2007
>>
>> **********************************************************************
>>
>> [pcncp21] ~ > edg-job-status https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>>
>>
>> *************************************************************
>> BOOKKEEPING INFORMATION:
>>
>> Status info for the Job :
>> https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>> Current Status: Aborted
>> Status Reason: Job RetryCount (3) hit
>> Destination: pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
>> reached on: Mon Aug 27 10:25:28 2007
>> *************************************************************
>>
>>
>> -- Best Regards --
>> Adeel
>>
>>
>> -----Original Message-----
>> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
>> On Behalf Of Ilja Livenson
>> Sent: Monday, August 27, 2007 10:50 AM
>> To: [log in to unmask]
>> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>>
>> Well, in the case of an LCG-CE I think it's best to try running the job
>> with globus-job-run. Perhaps you could post the output of running it?
>>
>> atb,
>> Ilja
>>
>> Adeel-ur-Rehman wrote:
>>
>>
>>> Dear Ilja,
>>>
>>> Sorry for the confusion. By globus-job-run, I meant edg-job-submit!
>>>
>>> Yes, I'm talking about the LCG-CE.
>>>
>>> -- Best Regards --
>>> Adeel
>>>
>>> -----Original Message-----
>>> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
>>> On Behalf Of Ilja Livenson
>>> Sent: Saturday, August 25, 2007 8:29 PM
>>> To: [log in to unmask]
>>> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>>>
>>> Hi,
>>>
>>> Are you sure you are talking about globus-job-run? It doesn't resubmit
>>> jobs, afaik, and hence doesn't fail with the HitCount error.
>>>
>>> Ilja
>>>
>>> PS. You are talking about LCG CE, not gLite, right?
>>>
>>> Adeel-ur-Rehman wrote:
>>>
>>>
>>>
>>>> Dear All,
>>>>
>>>> At our site, since I upgraded to the latest update of gLite 3.1, no
>>>> jobs are executing; instead I am getting job submission failures. The
>>>> error details state "Got a job held event, reason: Unspecified
>>>> gridmanager error". I can qsub test jobs, but globus-job-run aborts the
>>>> job after the retry count (3) is hit.
>>>>
>>>> And there are no offending ssh key problems between our CE and WNs.
>>>>
>>>> Any ideas??
>>>>
>>>>
>>>>
>>>> -- Best Regards --
>>>> Adeel-ur-Rehman
>>>>
>>>>
>>>>
>>>>