Dear Maarten,
I'm pretty sure the job has finished. I checked on the WN, and the job
has been cleared there as well.
Furthermore, the job is no longer in the batch system's job queue.
Two sites are affected by this problem: one running SGE and the other
running TORQUE.
Here is the output from both glite-wms-job-status and glite-wms-job-logging-info:
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://lb.biruni.upm.my:9000/_5vhGkOpkw6b3uIB8mOOZw
Current Status: Running
Status Reason: unavailable
Destination: ce.utmgrid.utm.my:8443/cream-pbs-hpc
Submitted: Fri Oct 10 00:39:54 2014 UTC
=========================================================================
$ glite-wms-job-logging-info -v 2 -i jid
===================== glite-wms-job-logging-info Success =====================
LOGGING INFORMATION:
Printing info for the Job : https://lb.biruni.upm.my:9000/_5vhGkOpkw6b3uIB8mOOZw
---
Event: RegJob
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Host = wms.biruni.upm.my
- Jobtype = SIMPLE
- Ns = https://wms.biruni.upm.my:7443/glite_wms_wmproxy_server
- Nsubjobs = 0
- Source = NetworkServer
- Src instance = https://wms.biruni.upm.my:7443/glite_wms_wmproxy_server
- Timestamp = Fri Oct 10 00:39:54 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711
---
Event: Accepted
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- From = UserInterface
- From host = NetworkServer
- From instance = ui.biruni.upm.my
- Host = wms.biruni.upm.my
- Source = NetworkServer
- Src instance = https://wms.biruni.upm.my:7443/glite_wms_wmproxy_server
- Timestamp = Fri Oct 10 00:39:54 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711
---
Event: EnQueued
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Host = wms.biruni.upm.my
- Queue = /var/workload_manager/jobdir
- Result = START
- Source = NetworkServer
- Src instance = https://wms.biruni.upm.my:7443/glite_wms_wmproxy_server
- Timestamp = Fri Oct 10 00:39:54 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711
---
Event: EnQueued
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Host = wms.biruni.upm.my
- Queue = /var/workload_manager/jobdir
- Result = OK
- Source = NetworkServer
- Src instance = https://wms.biruni.upm.my:7443/glite_wms_wmproxy_server
- Timestamp = Fri Oct 10 00:39:54 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711
---
Event: DeQueued
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Host = wms.biruni.upm.my
- Queue = /var/workload_manager/jobdir
- Source = WorkloadManager
- Src instance = 30467
- Timestamp = Fri Oct 10 00:39:55 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy
---
Event: Match
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Dest id = ce.utmgrid.utm.my:8443/cream-pbs-hpc
- Host = wms.biruni.upm.my
- Source = WorkloadManager
- Src instance = 30467
- Timestamp = Fri Oct 10 00:39:55 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy
---
Event: UserTag
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Host = wms.biruni.upm.my
- Name = CEInfoHostName
- Source = WorkloadManager
- Src instance = 30467
- Timestamp = Fri Oct 10 00:39:55 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy
- Value = ce.utmgrid.utm.my
---
Event: EnQueued
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Host = wms.biruni.upm.my
- Queue = /var/ice/jobdir
- Result = START
- Source = WorkloadManager
- Src instance = 30467
- Timestamp = Fri Oct 10 00:39:55 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy
---
Event: EnQueued
- Arrived = Fri Oct 10 00:40:09 2014 UTC
- Host = wms.biruni.upm.my
- Queue = /var/ice/jobdir
- Result = OK
- Source = WorkloadManager
- Src instance = 30467
- Timestamp = Fri Oct 10 00:39:55 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy
---
Event: DeQueued
- Arrived = Fri Oct 10 00:40:10 2014 UTC
- Host = wms.biruni.upm.my
- Local jobid = https://lb.biruni.upm.my:9000/_5vhGkOpkw6b3uIB8mOOZw
- Queue = /var/ice/jobdir
- Source = JobController
- Timestamp = Fri Oct 10 00:39:56 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Transfer
- Arrived = Fri Oct 10 00:40:10 2014 UTC
- Dest host = https://ce.utmgrid.utm.my:8443/ce-cream/services/CREAM2
- Dest instance = unavailable
- Dest jobid = unavailable
- Destination = LRMS
- Host = wms.biruni.upm.my
- Reason = unavailable
- Result = START
- Source = LogMonitor
- Timestamp = Fri Oct 10 00:39:56 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Running
- Arrived = Fri Oct 10 00:40:35 2014 UTC
- Host = wn001.utmgrid.utm.my
- Node = wn001.utmgrid.utm.my
- Source = LRMS
- Timestamp = Fri Oct 10 00:40:10 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711
---
Event: ReallyRunning
- Arrived = Fri Oct 10 00:40:35 2014 UTC
- Host = wn001.utmgrid.utm.my
- Source = LRMS
- Timestamp = Fri Oct 10 00:40:12 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711
---
Event: Done
- Arrived = Fri Oct 10 00:40:38 2014 UTC
- Exit code = 516838320
- Host = wn001.utmgrid.utm.my
- Reason = job completed
- Source = LRMS
- Status code = OK
- Timestamp = Fri Oct 10 00:40:14 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711
---
Event: Transfer
- Arrived = Fri Oct 10 00:40:23 2014 UTC
- Dest host = https://ce.utmgrid.utm.my:8443/ce-cream/services/CREAM2
- Dest instance = unavailable
- Dest jobid = https://ce.utmgrid.utm.my:8443/CREAM296090834
- Destination = LRMS
- Host = wms.biruni.upm.my
- Reason = unavailable
- Result = OK
- Source = LogMonitor
- Timestamp = Fri Oct 10 00:40:09 2014 UTC
- User = /C=TW/O=AP/OU=GRID/CN=Muhammad Farhan Sjaugi 154711/CN=proxy/CN=proxy
=========================================================================
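The events above are not printed in timestamp order, which makes it hard to see which one the logging service recorded last. As a reading aid, here is a minimal Python sketch that parses output in the format shown above and sorts the events by their Timestamp field. The helper names parse_events and by_timestamp are hypothetical (they are not part of the gLite tools), and the sketch assumes an English/C locale for the date parsing and that a line not starting with "- " is the wrapped continuation of the previous value.

```python
from datetime import datetime


def parse_events(text):
    """Parse glite-wms-job-logging-info style output into a list of dicts.

    Assumption: each event starts with 'Event: <name>' and each field is
    written as '- Key = value'; any other non-separator line continues
    the previous value (e.g. a URL wrapped onto the next line).
    """
    events, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Event:"):
            current = {"Event": line.split(":", 1)[1].strip()}
            events.append(current)
            last_key = None
        elif current is None or not line or line == "---" or line.startswith("==="):
            continue  # header, separator, or blank line
        elif line.startswith("- ") and "=" in line:
            key, value = line[2:].split("=", 1)  # split on the first '=' only
            last_key = key.strip()
            current[last_key] = value.strip()
        elif last_key is not None:
            # Continuation of a wrapped value
            current[last_key] = (current[last_key] + " " + line).strip()
    return events


def by_timestamp(events):
    """Return the events sorted by their 'Timestamp' field (oldest first)."""
    fmt = "%a %b %d %H:%M:%S %Y UTC"
    return sorted(events, key=lambda e: datetime.strptime(e["Timestamp"], fmt))


# Two events copied (abbreviated) from the output above
sample = """\
---
Event: Done
- Arrived = Fri Oct 10 00:40:38 2014 UTC
- Reason = job completed
- Source = LRMS
- Timestamp = Fri Oct 10 00:40:14 2014 UTC
---
Event: Transfer
- Dest host =
https://ce.utmgrid.utm.my:8443/ce-cream/services/CREAM2
- Result = OK
- Source = LogMonitor
- Timestamp = Fri Oct 10 00:40:09 2014 UTC
"""

for event in by_timestamp(parse_events(sample)):
    print(event["Timestamp"], event["Event"], event["Source"])
```

Sorted this way, the Done event from the LRMS (timestamp 00:40:14) comes after the Transfer OK event (timestamp 00:40:09), even though the status command still reports Running.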
I did another test by submitting the same job through another random WMS
server in the dteam VO, and it worked well.
Regards
On Fri, Oct 10, 2014 at 3:45 AM, <[log in to unmask]> wrote:
> Hi Muhammad,
>
>> Actually the job is a simple job, executing the /bin/hostname command
>> at the worker node to check the
>
> Are you sure the WMS job actually finished? Check on the WN?
> What does the batch system claim is the state of that job?
>
>> CE whether function or not. I did CREAM direct job submission to the
>> CE, it worked perfectly.
>>
>> So now I guess the problem is at WMS...
>
> Not necessarily.
>
> Can you run glite-wms-job-status and glite-wms-logging-info again
> to see what the WMS now thinks is the state of the job?
>
--
Muhammad Farhan Sjaugi, S.Kom. M.Sc
Technical Coordinator
Academic Grid Malaysia
c/o UNITEN
email: [log in to unmask]
Perdana University Centre for Bioinformatics
email: [log in to unmask]