Hi,

I actually meant globus-job-run, not edg-job-submit.

Could you run the following and post the output?

globus-job-run pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs -q dteam /bin/pwd
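
For reference, the logging info quoted below contains four Match events against pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam, each followed by a Transfer to the LRMS, two Done events and a Resubmission (WILLRESUB), before the final Abort. A minimal Python sketch for tallying such output is given here. It is not part of any LCG/gLite tool: the function name `summarise` is hypothetical, and the patterns assume the `Event:` / `- key = value` layout seen in the quoted output, including the `dest_id` value wrapping onto the next line.

```python
import re
from collections import Counter

# Assumed layout: optional mail-quote markers, then "Event: <Name>".
EVENT_RE = re.compile(r"^[>\s]*Event:\s*(\w+)")
# Assumed layout: optional mail-quote markers, then "- key = value".
FIELD_RE = re.compile(r"^[>\s]*-\s*(\w+)\s*=\s*(.*)$")


def summarise(text):
    """Return (event counts, list of dest_id values from Match events)."""
    counts = Counter()
    destinations = []
    current = None          # name of the event block being read
    pending_dest = False    # True when dest_id wrapped onto the next line
    for line in text.splitlines():
        event = EVENT_RE.match(line)
        if event:
            current = event.group(1)
            counts[current] += 1
            pending_dest = False
            continue
        field = FIELD_RE.match(line)
        if pending_dest:
            pending_dest = False
            if not field:
                # Continuation line: strip quote markers and whitespace.
                value = line.lstrip("> \t").strip()
                if value:
                    destinations.append(value)
                continue
        if field and current == "Match" and field.group(1) == "dest_id":
            value = field.group(2).strip()
            if value:
                destinations.append(value)
            else:
                pending_dest = True
    return counts, destinations


# Short excerpt in the same layout as the quoted logging info.
sample = """\
>  Event: Match
> - dest_id                 =
> pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source                  =    WorkloadManager
>         ---
>  Event: Resubmission
> - result                  =    WILLRESUB
>         ---
>  Event: Abort
> - source                  =    WorkloadManager
"""

counts, destinations = summarise(sample)
print(dict(counts))
print(destinations)
```

Running it over the full logging info should make it easy to see how many submission attempts were made and whether they all went to the same CE queue.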

cheers,
Ilja



Adeel-ur-Rehman wrote:
> Dear Ilja,
>
> The output of edg-job-submit I am getting is as follows:
>
> LOGGING INFORMATION:
>
> Printing info for the Job :
> https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>
>         ---
>  Event: RegJob
> - source                  =    UserInterface
> - timestamp               =    Mon Aug 27 10:09:48 2007
>         ---
>  Event: Transfer
> - destination             =    NetworkServer
> - result                  =    START
> - source                  =    UserInterface
> - timestamp               =    Mon Aug 27 10:09:50 2007
>         ---
>  Event: Transfer
> - destination             =    NetworkServer
> - result                  =    OK
> - source                  =    UserInterface
> - timestamp               =    Mon Aug 27 10:09:54 2007
>         ---
>  Event: Accepted
> - source                  =    NetworkServer
> - timestamp               =    Mon Aug 27 10:09:52 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    NetworkServer
> - timestamp               =    Mon Aug 27 10:09:54 2007
>         ---
>  Event: DeQueued
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:09:55 2007
>         ---
>  Event: Match
> - dest_id                 =
> pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:10:00 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:10:01 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:10:02 2007
>         ---
>  Event: DeQueued
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:10:04 2007
>         ---
>  Event: Transfer
> - destination             =    LogMonitor
> - result                  =    START
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:10:05 2007
>         ---
>  Event: Transfer
> - destination             =    LogMonitor
> - result                  =    OK
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:10:06 2007
>         ---
>  Event: Accepted
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:10:14 2007
>         ---
>  Event: Transfer
> - destination             =    LRMS
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:10:26 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:13:57 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:14:09 2007
>         ---
>  Event: Resubmission
> - result                  =    WILLRESUB
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:14:10 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:14:12 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:14:13 2007
>         ---
>  Event: DeQueued
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:14:14 2007
>         ---
>  Event: Match
> - dest_id                 =
> pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:14:19 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:14:21 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:14:22 2007
>         ---
>  Event: DeQueued
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:14:23 2007
>         ---
> Event: Transfer
> - destination             =    LogMonitor
> - result                  =    START
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:14:25 2007
>         ---
>  Event: Transfer
> - destination             =    LogMonitor
> - result                  =    OK
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:14:26 2007
>         ---
>  Event: Accepted
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:14:36 2007
>         ---
>  Event: Transfer
> - destination             =    LRMS
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:14:48 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:17:23 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:17:35 2007
>         ---
>  Event: Resubmission
> - result                  =    WILLRESUB
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:17:36 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:17:38 2007
>         ---
> Event: EnQueued
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:17:39 2007
>         ---
>  Event: DeQueued
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:17:40 2007
>         ---
>  Event: Match
> - dest_id                 =
> pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:17:46 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:17:47 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:17:48 2007
>         ---
>  Event: DeQueued
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:17:49 2007
>         ---
>  Event: Transfer
> - destination             =    LogMonitor
> - result                  =    START
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:17:50 2007
>         ---
>  Event: Transfer
> - destination             =    LogMonitor
> - result                  =    OK
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:17:51 2007
>         ---
>  Event: Accepted
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:18:02 2007
>         ---
>  Event: Transfer
> - destination             =    LRMS
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:18:14 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:21:44 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:21:56 2007
>         ---
>  Event: Resubmission
> - result                  =    WILLRESUB
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:21:57 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:21:59 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:22:00 2007
>         ---
>  Event: DeQueued
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:22:01 2007
>         ---
>  Event: Match
> - dest_id                 =
> pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:22:07 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:22:08 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:22:09 2007
>         ---
>  Event: DeQueued
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:22:11 2007
>         ---
>  Event: Transfer
> - destination             =    LogMonitor
> - result                  =    START
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:22:12 2007
>         ---
>  Event: Transfer
> - destination             =    LogMonitor
> - result                  =    OK
> - source                  =    JobController
> - timestamp               =    Mon Aug 27 10:22:13 2007
>         ---
>  Event: Accepted
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:22:23 2007
>         ---
>  Event: Transfer
> - destination             =    LRMS
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:22:35 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:25:10 2007
>         ---
>  Event: Done
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:25:22 2007
>         ---
>  Event: Resubmission
> - result                  =    WILLRESUB
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:25:23 2007
>         ---
>  Event: EnQueued
> - result                  =    START
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:25:25 2007
>         ---
>  Event: EnQueued
> - result                  =    OK
> - source                  =    LogMonitor
> - timestamp               =    Mon Aug 27 10:25:26 2007
>         ---
>  Event: DeQueued
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:25:27 2007
>         ---
>  Event: Abort
> - source                  =    WorkloadManager
> - timestamp               =    Mon Aug 27 10:25:28 2007
>
> **********************************************************************
>
> [pcncp21] ~ > edg-job-status
> https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>
>
> *************************************************************
> BOOKKEEPING INFORMATION:
>
> Status info for the Job :
> https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
> Current Status:     Aborted
> Status Reason:      Job RetryCount (3) hit
> Destination:        pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> reached on:         Mon Aug 27 10:25:28 2007
> *************************************************************
>
>
> -- Best Regards --
> Adeel
>
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Ilja Livenson
> Sent: Monday, August 27, 2007 10:50 AM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Well, in the case of LCG-CE I think it's best to try running the job with
> globus-job-run. Perhaps you could post the output of running it?
>
> atb,
> Ilja
>
> Adeel-ur-Rehman wrote:
>   
>> Dear Ilja, 
>>
>> Sorry for the confusion. By globus-job-run, I meant edg-job-submit!
>> Yes, I'm talking about LCG-CE.
>>
>> -- Best Regards --
>> Adeel
>>
>> -----Original Message-----
>> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
>> On Behalf Of Ilja Livenson
>> Sent: Saturday, August 25, 2007 8:29 PM
>> To: [log in to unmask]
>> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>>
>> Hi,
>>
>> are you sure you are talking about globus-job-run? It doesn't resubmit 
>> jobs, afaik, hence doesn't fail with the HitCount error.
>>
>> Ilja
>>
>> PS. You are talking about LCG CE, not gLite, right?
>>
>> Adeel-ur-Rehman wrote:
>>> Dear All,
>>>
>>> At our site, since I upgraded it to the latest update of gLite 3.1, no
>>> jobs are executing; rather, I am getting job submission failures. Reading
>>> the details of the error, it states "Got a job held event, reason:
>>> Unspecified gridmanager error". I can qsub test jobs, but globus-job-run
>>> Aborts the job after Retrying HitCount 3 times.
>>>
>>> And there are no offending ssh key problems between our CE and WNs.
>>>
>>> Any ideas??
>>>
>>> -- Best Regards --
>>> Adeel-ur-Rehman
>>>
>>>     
>>>