Hi Ahmed,
Yes, for the time being, an expired proxy puts the site into JS
status. I agree that this can be a problem when a site is running a
lot of long production jobs. We could probably switch to using
MyProxy for SFT jobs, or detect this particular type of failure. I
will look into it if the situation recurs.
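
Detecting this type of failure could look roughly like the sketch
below. It assumes the job logging info is available as plain text in
the same format as the listing quoted further down, and the function
names and patterns are hypothetical (the patterns are simply taken from
the two "reason" strings in that listing). It is not how the SFT
framework currently works.

```python
import re

# Hypothetical patterns, based only on the "reason" strings seen in the
# quoted listing; other middleware versions may word them differently.
PROXY_EXPIRED_PATTERNS = (
    re.compile(r"user proxy expired", re.IGNORECASE),
    re.compile(r"proxy is expired", re.IGNORECASE),
)


def split_events(logging_info):
    """Split a logging-info listing into one text chunk per event."""
    events = []
    current = []
    for raw_line in logging_info.splitlines():
        # Drop mail quoting ("> ") and surrounding whitespace, if any.
        line = re.sub(r"^\s*(>\s?)+", "", raw_line).strip()
        if line.startswith("Event:"):
            if current:
                events.append("\n".join(current))
            current = [line]
        elif current and line != "---":
            current.append(line)
    if current:
        events.append("\n".join(current))
    return events


def failed_due_to_proxy_expiry(logging_info):
    """Return True if a Done or Abort event reports an expired proxy."""
    for event in split_events(logging_info):
        name = event.splitlines()[0].split(":", 1)[1].strip()
        if name not in ("Done", "Abort"):
            continue
        # Collapse whitespace so that wrapped "reason" lines still match.
        flat = " ".join(event.split())
        if any(p.search(flat) for p in PROXY_EXPIRED_PATTERNS):
            return True
    return False
```

A True result could then be used to keep the previous status of the
site instead of marking it JS, though where that decision would live
is left open here.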
Piotr
On Jul 21, 2005, at 11:49 AM, Ahmed Beriache wrote:
>
>
> Hi Piotr,
>
> We had a JS at CGG-LCG2 this morning while some production jobs were
> still running. When reading the logging info of the failed monitoring
> job, I found in the last two sections of the listing below that the
> job proxy had expired.
> Is it normal to put sites in JS status because of an expired job
> proxy? Is there a way to keep the old status of the site when
> this kind of error happens?
>
> Cheers,
>
> Ahmed
>
>
>
>
>
> Event: Transfer
> - dest_host = ce1.egee.fr.cgg.com:2119/jobmanager-pbs
> - dest_instance = /var/edgwl/logmonitor/CondorG.log/CondorG.1121785008.log
> - dest_jobid = unavailable
> - destination = LRMS
> - host = lxn1177.cern.ch
> - level = SYSTEM
> - priority = asynchronous
> - reason = Job successfully submitted to Globus
> - result = OK
> - seqcode = UI=000003:NS=0000000003:WM=000004:BH=0000000000:JSS=000003:LM=000003:LRMS=000000:APP=000000
> - source = LogMonitor
> - src_instance = unique
> - timestamp = Wed Jul 20 07:45:51 2005
> - user = /C=CH/O=CERN/OU=GRID/CN=Piotr Nyczyk
> 9654
> - job = (queue=dteam)(jobtype=single)(environment=
> (EDG_WL_JOBID 'https://lxn1177.cern.ch:9000/cWBpRMg39mSMnNuFuG3NBQ'))
> ---
> Event: Running
> - host = lxn1177.cern.ch
> - level = SYSTEM
> - node = ce1.egee.fr.cgg.com
> - priority = asynchronous
> - seqcode = UI=000003:NS=0000000003:WM=000004:BH=0000000000:JSS=000003:LM=000005:LRMS=000000:APP=000000
> - source = LogMonitor
> - src_instance = unique
> - timestamp = Wed Jul 20 07:52:02 2005
> - user = /C=CH/O=CERN/OU=GRID/CN=Piotr Nyczyk
> 9654
> ---
> Event: Done
> - exit_code = 1
> - host = lxn1177.cern.ch
> - level = SYSTEM
> - priority = asynchronous
> - reason = Got a job held event, reason: Globus error 131: the user proxy expired (job is still running)
> - seqcode = UI=000003:NS=0000000003:WM=000004:BH=0000000000:JSS=000003:LM=000007:LRMS=000000:APP=000000
> - source = LogMonitor
> - src_instance = unique
> - status_code = FAILED
> - timestamp = Thu Jul 21 07:32:44 2005
> - user = /C=CH/O=CERN/OU=GRID/CN=Piotr Nyczyk
> 9654
> ---
> Event: Done
> - exit_code = 1
> - host = lxn1177.cern.ch
> - level = SYSTEM
> - priority = asynchronous
> - reason = Job got an error while in the CondorG queue.
> - seqcode = UI=000003:NS=0000000003:WM=000004:BH=0000000000:JSS=000003:LM=000009:LRMS=000000:APP=000000
> - source = LogMonitor
> - src_instance = unique
> - status_code = FAILED
> - timestamp = Thu Jul 21 07:32:56 2005
> - user = /C=CH/O=CERN/OU=GRID/CN=Piotr Nyczyk
> 9654
> ---
> Event: Abort
> - host = lxn1177.cern.ch
> - level = SYSTEM
> - priority = asynchronous
> - reason = Job proxy is expired.
> - seqcode = UI=000003:NS=0000000003:WM=000004:BH=0000000000:JSS=000003:LM=000010:LRMS=000000:APP=000000
> - source = LogMonitor
> - src_instance = unique
> - timestamp = Thu Jul 21 07:32:57 2005
> - user = /C=CH/O=CERN/OU=GRID/CN=Piotr Nyczyk
> 9654
>
>