Hi there,
while trying to debug the following message from the UI's utilities:
Threshold for ICE Input JobDir jobs: 1500 => Detected value for ICE
Input JobDir jobs /var/glite/ice/jobdir : 1501
I have discovered that old jobs are indeed piling up in that directory.
For example, the logging info of the very first job is attached. As you
can see, the job finished almost a month ago (it was cancelled and
cleared on 29 Sep 2009). Why wasn't it removed from the jobdir? What
steps can one take to fix the service?
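
To make the pile-up easier to quantify, here is a minimal sketch that
lists stale entries under the jobdir. It assumes only that each pending
request is a regular file somewhere below /var/glite/ice/jobdir; the
function name old_entries and the 7-day cutoff are illustrative choices
and not part of gLite.

```python
import os
import time


def old_entries(jobdir, max_age_days, now=None):
    """Return (path, age_in_days) for regular files under jobdir whose
    modification time is older than max_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    # Walk recursively so that no particular subdirectory layout is assumed.
    for root, _dirs, files in os.walk(jobdir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # entry disappeared while scanning
            if mtime < cutoff:
                found.append((path, (now - mtime) / 86400))
    found.sort(key=lambda item: item[1], reverse=True)
    return found


if __name__ == "__main__":
    # Path taken from the warning message above; the cutoff is arbitrary.
    entries = old_entries("/var/glite/ice/jobdir", max_age_days=7)
    print(len(entries), "entries older than 7 days")
    for path, age in entries[:10]:
        print(f"{age:6.1f} d  {path}")
```

The script only reads file metadata and deletes nothing, so it should be
safe to run while the service is up.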
WKR,
Jan Kundrát
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ
---
Event: RegJob
- Arrived = Tue Sep 29 11:23:05 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Ns = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Nsubjobs = 0
- Priority = asynchronous
- Seqcode = UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = NetworkServer
- Src instance = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Timestamp = Tue Sep 29 11:23:05 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659
- Jdl = SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/JDLToStart
---
Event: RegJob
- Arrived = Tue Sep 29 11:23:06 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Ns = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Nsubjobs = 0
- Priority = asynchronous
- Seqcode = UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = NetworkServer
- Src instance = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Timestamp = Tue Sep 29 11:23:05 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659
- Jdl = SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/JDLToStart
---
Event: Accepted
- Arrived = Tue Sep 29 11:23:06 2009 CEST
- From = NetworkServer
- From host = lxb7962.cern.ch
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Seqcode = UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = NetworkServer
- Src instance = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Timestamp = Tue Sep 29 11:23:06 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659
---
Event: EnQueued
- Arrived = Tue Sep 29 11:23:06 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Queue = /var/glite/workload_manager/jobdir
- Result = START
- Seqcode = UI=000000:NS=0000000003:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = NetworkServer
- Src instance = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Timestamp = Tue Sep 29 11:23:06 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659
- Job = /var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/JDLToStart
---
Event: EnQueued
- Arrived = Tue Sep 29 11:23:06 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Queue = /var/glite/workload_manager/jobdir
- Result = OK
- Seqcode = UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = NetworkServer
- Src instance = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Timestamp = Tue Sep 29 11:23:06 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659
- Job =
[
requirements = ( other.GlueCEStateStatus == "Production" ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
RetryCount = 3;
edg_jobid = "https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ";
Arguments = "1254216180";
OutputSandboxPath = "/var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/output";
MyProxyServer = "myproxy-fts.cern.ch";
AllowZippedISB = true;
JobType = "normal";
SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
Executable = "/bin/echo";
OutputSandboxDestURI = { "gsiftp://wms2.egee.cesnet.cz:2811/var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/output/test.out" };
CertificateSubject = "/DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659";
X509UserProxy = "/var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/user.proxy";
StdOutput = "test.out";
VOMS_FQAN = "/ops/Role=NULL/Capability=NULL";
OutputSandbox = { "test.out" };
LB_sequence_code = "UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000";
InputSandboxPath = "/var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/input";
VirtualOrganisation = "ops";
rank = -other.GlueCEStateEstimatedResponseTime;
Type = "job";
ShallowRetryCount = 10;
WMPInputSandboxBaseURI = "gsiftp://wms2.egee.cesnet.cz:2811/var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ";
DefaultRank = -other.GlueCEStateEstimatedResponseTime
]
---
Event: DeQueued
- Arrived = Tue Sep 29 11:23:07 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Queue = /var/glite/workload_manager/jobdir
- Seqcode = UI=000000:NS=0000000004:WM=000001:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = WorkloadManager
- Src instance = 8431
- Timestamp = Tue Sep 29 11:23:07 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Pending
- Arrived = Tue Sep 29 11:23:07 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = BrokerHelper: no compatible resources
- Seqcode = UI=000000:NS=0000000004:WM=000002:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = WorkloadManager
- Src instance = 8431
- Timestamp = Tue Sep 29 11:23:07 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Match
- Arrived = Tue Sep 29 11:33:08 2009 CEST
- Dest id = lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs-dteam
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Seqcode = UI=000000:NS=0000000004:WM=000003:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = WorkloadManager
- Src instance = 8431
- Timestamp = Tue Sep 29 11:33:08 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: EnQueued
- Arrived = Tue Sep 29 11:33:08 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Queue = /var/glite/jobcontrol/jobdir
- Result = START
- Seqcode = UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = WorkloadManager
- Src instance = 8431
- Timestamp = Tue Sep 29 11:33:08 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: EnQueued
- Arrived = Tue Sep 29 11:33:08 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Queue = /var/glite/jobcontrol/jobdir
- Result = OK
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = WorkloadManager
- Src instance = 8431
- Timestamp = Tue Sep 29 11:33:08 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
- Job =
[
Arguments =
[
JobAd =
[
stream_error = false;
edg_jobid = "https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ";
GlobusScheduler = "lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs";
ce_id = "lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs-dteam";
Transfer_Executable = true;
Output = "/var/glite/jobcontrol/condorio/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/StandardOutput";
Copy_to_Spool = false;
Executable = "/var/glite/jobcontrol/submit/KT/JobWrapper.https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ.sh";
X509UserProxy = "/var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/user.proxy";
Error_ = "/var/glite/jobcontrol/condorio/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/StandardError";
LB_sequence_code = "UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000";
Notification = "never";
stream_output = false;
GlobusRSL = "(queue=dteam)(jobtype=single)(environment=(EDG_WL_JOBID 'https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ'))";
Type = "job";
Universe = "grid";
UserSubjectName = "/DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659";
Log = "/var/glite/logmonitor/CondorG.log/CondorG.log";
grid_type = "globus"
]
];
Command = "Submit";
Source = 2;
Protocol = "1.0.0"
]
---
Event: DeQueued
- Arrived = Tue Sep 29 11:33:10 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Local jobid = unavailable
- Priority = asynchronous
- Queue = /var/glite/jobcontrol/jobdir
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000001:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = JobController
- Src instance = unique
- Timestamp = Tue Sep 29 11:33:10 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Transfer
- Arrived = Tue Sep 29 11:33:10 2009 CEST
- Dest host = localhost
- Dest instance = /var/glite/logmonitor/CondorG.log/CondorG.1253934556.log
- Dest jobid = unavailable
- Destination = LogMonitor
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = unavailable
- Result = START
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000002:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = JobController
- Src instance = unique
- Timestamp = Tue Sep 29 11:33:10 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
- Job = unavailable
---
Event: Transfer
- Arrived = Tue Sep 29 11:33:10 2009 CEST
- Dest host = localhost
- Dest instance = /var/glite/logmonitor/CondorG.log/CondorG.1253934556.log
- Dest jobid = 347518
- Destination = LogMonitor
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = unavailable
- Result = OK
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = JobController
- Src instance = unique
- Timestamp = Tue Sep 29 11:33:10 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
- Job = stream_error = False
+edg_jobid = "https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ"
Arguments = 'UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000'
GlobusScheduler = lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs
Transfer_Executable = True
+ce_id = "lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs-dteam"
Output = /var/glite/jobcontrol/condorio/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/StandardOutput
Submit_Event_Notes = (https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ) (UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000) (0)
Copy_to_Spool = False
Executable = /var/glite/jobcontrol/submit/KT/JobWrapper.https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ.sh
X509UserProxy = /var/glite/SandboxDir/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/user.proxy
error = /var/glite/jobcontrol/condorio/KT/https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ/StandardError
+LB_sequence_code = "UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000"
Notification = never
stream_output = False
GlobusRSL = (queue=dteam)(jobtype=single)(environment=(EDG_WL_JOBID 'https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ'))
+Type = "job"
Universe = grid
+UserSubjectName = "/DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659"
Log = /var/glite/logmonitor/CondorG.log/CondorG.1253934556.log
grid_type = globus
+CondorSubmitFile = "/var/glite/jobcontrol/submit/KT/Condor.https_3a_2f_2flb2.egee.cesnet.cz_3a9000_2fKTPF3-9ahbNaPGYjlKtwsQ.submit"
Queue 1
---
Event: Accepted
- Arrived = Tue Sep 29 11:33:11 2009 CEST
- From = JobController
- From host = localhost
- From instance = unavailable
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Local jobid = 347518
- Priority = asynchronous
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000001:LRMS=000000:APP=000000:LBS=000000
- Source = LogMonitor
- Src instance = unique
- Timestamp = Tue Sep 29 11:33:11 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Transfer
- Arrived = Tue Sep 29 11:33:39 2009 CEST
- Dest host = lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs
- Dest instance = /var/glite/logmonitor/CondorG.log/CondorG.1253934556.log
- Dest jobid = unavailable
- Destination = LRMS
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = Job successfully submitted to Globus
- Result = OK
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000003:LRMS=000000:APP=000000:LBS=000000
- Source = LogMonitor
- Src instance = unique
- Timestamp = Tue Sep 29 11:33:38 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
- Job = (queue=dteam)(jobtype=single)(environment=(EDG_WL_JOBID 'https://lb2.egee.cesnet.cz:9000/KTPF3-9ahbNaPGYjlKtwsQ'))
---
Event: Cancel
- Arrived = Tue Sep 29 11:43:01 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = Cancelled by user
- Seqcode = UI=000000:NS=0000000005:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = NetworkServer
- Src instance = https://wms2.egee.cesnet.cz:7443/glite_wms_wmproxy_server
- Status code = REQ
- Timestamp = Tue Sep 29 11:43:01 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659
---
Event: Cancel
- Arrived = Tue Sep 29 11:43:02 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Seqcode = UI=000000:NS=0000000005:WM=000001:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = WorkloadManager
- Src instance = 8971
- Status code = REQ
- Timestamp = Tue Sep 29 11:43:02 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Cancel
- Arrived = Tue Sep 29 11:43:02 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Seqcode = UI=000000:NS=0000000005:WM=000002:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = WorkloadManager
- Src instance = 8971
- Status code = DONE
- Timestamp = Tue Sep 29 11:43:02 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Cancel
- Arrived = Tue Sep 29 11:43:04 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = Cancel requested by WorkloadManager
- Seqcode = UI=000000:NS=0000000005:WM=000003:BH=0000000000:JSS=000001:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = JobController
- Src instance = unique
- Status code = REQ
- Timestamp = Tue Sep 29 11:43:04 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Cancel
- Arrived = Tue Sep 29 11:43:16 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = Aborted by user.
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000005:LRMS=000000:APP=000000:LBS=000000
- Source = LogMonitor
- Src instance = unique
- Status code = DONE
- Timestamp = Tue Sep 29 11:43:16 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Done
- Arrived = Tue Sep 29 11:43:16 2009 CEST
- Exit code = 0
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = Aborted by user
- Seqcode = UI=000000:NS=0000000004:WM=000005:BH=0000000000:JSS=000003:LM=000006:LRMS=000000:APP=000000:LBS=000000
- Source = LogMonitor
- Src instance = unique
- Status code = CANCELLED
- Timestamp = Tue Sep 29 11:43:16 2009 CEST
- User = /DC=org/DC=doegrids/OU=People/CN=Steve Traylen 946659/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
---
Event: Clear
- Arrived = Tue Sep 29 11:43:16 2009 CEST
- Host = wms2.egee.cesnet.cz
- Level = SYSTEM
- Priority = asynchronous
- Reason = USER
- Seqcode = UI=000009:NS=0000096670:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- Source = NetworkServer
- Src instance = 7539
- Timestamp = Tue Sep 29 11:43:16 2009 CEST
- User = /DC=cz/DC=cesnet-ca/O=CESNET/CN=wms2.egee.cesnet.cz/CN=1237829650
**********************************************************************