Hi Maarten,
The problem is still there. There are no suspicious records in
/var/log/messages on either the MyProxy or the WMS+LB host, but jobs still die
(the status and logging info for one such job are given below). Any ideas what
we might try next?
Thanks, Antun
[antun@ce sh5-aegis-ce]$ myproxy-info -d -s myproxy.phy.bg.ac.yu
username: /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Antun Balaz
owner: /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Antun Balaz
timeleft: 433:34:07 (18.1 days)
[antun@ce sh5-aegis-ce]$ glite-wms-job-status -v 3 https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ
Current Status: Aborted
Status Reason: Job proxy is expired.
Destination: ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Thu Jun 26 21:46:55 2008 CEST
---
- cancelling = 0
- ce_node = gt2 ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs
- children_num = 0
- condorId = 672247
- cpuTime = 0
- destination = ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
- done_code = 1
- expectUpdate = 0
- jobtype = 0
- lastUpdateTime = Fri Jun 27 09:48:13 2008 CEST
- location = none
- network_server = https://147.91.84.25:7443/glite_wms_wmproxy_server
- owner = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
- Payload_Running = 0
- resubmitted = 0
- subjob_failed = 0
---
- children_hist = 0
Undefined=0
Submitted=0
Waiting=0
Ready=0
Scheduled=0
Running=0
Done=0
Cleared=0
Aborted=0
Cancelled=0
Unknown=0
Purged=0
- condor_jdl =
stream_error = False
+edg_jobid = "https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ"
Arguments =
'UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000'
GlobusScheduler = ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs
+ce_id = "ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-aegis"
Transfer_Executable = True
Output =
/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardOutput
Submit_Event_Notes =
(https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ)
(UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000)
(0)
Copy_to_Spool = False
Executable =
/var/glite/jobcontrol/submit/aK/JobWrapper.https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ.sh
X509UserProxy =
/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/user.proxy
error =
/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardError
+LB_sequence_code =
"UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000"
Notification = never
stream_output = False
GlobusRSL = (queue=aegis)(jobtype=single)(environment=(EDG_WL_JOBID
'https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ'))
+Type = "job"
Universe = grid
+UserSubjectName = "/C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz"
Log = /var/glite/logmonitor/CondorG.log/CondorG.1214496803.log
grid_type = globus
+CondorSubmitFile =
"/var/glite/jobcontrol/submit/aK/Condor.https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ.submit"
Queue 1
- jdl =
[
requirements = ( (
RegExp("ce-atlas.phy.bg.ac.yu*",other.GlueCEUniqueID) ) && (
other.GlueCEStateStatus == "Production" ) ) && ( other.GlueCEStateStatus ==
"Production" );
RetryCount = 3;
edg_jobid = "https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ";
Arguments = "";
OutputSandboxPath =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/output";
MyProxyServer = "myproxy.phy.bg.ac.yu";
AllowZippedISB = true;
JobType = "normal";
InputSandboxDestFileName = { "rasa-1mregr-1.sh" };
Executable = "rasa-1mregr-1.sh";
OutputSandboxDestURI = {
"gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/output/stdout-1.txt","gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/output/stderr-1.txt"
};
CertificateSubject = "/C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz";
X509UserProxy =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/user.proxy";
StdOutput = "stdout-1.txt";
VOMS_FQAN = "/aegis";
OutputSandbox = { "stdout-1.txt","stderr-1.txt" };
LB_sequence_code =
"UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000";
InputSandboxPath =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/input";
VirtualOrganisation = "aegis";
rank = -other.GlueCEStateEstimatedResponseTime;
Type = "job";
ShallowRetryCount = 10;
StdError = "stderr-1.txt";
WMPInputSandboxBaseURI =
"gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ";
DefaultRank = -other.GlueCEStateEstimatedResponseTime;
ZippedISB = { "ISBfiles_sYSTZqSNgs653eT3AQ0NeQ_0.tar.gz" };
InputSandbox = {
"gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/input/rasa-1mregr-1.sh"
}
]
- matched_jdl =
[
Arguments =
[
JobAd =
[
stream_error = false;
edg_jobid = "https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ";
GlobusScheduler = "ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs";
ce_id = "ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-aegis";
Transfer_Executable = true;
Output =
"/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardOutput";
Copy_to_Spool = false;
Executable =
"/var/glite/jobcontrol/submit/aK/JobWrapper.https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ.sh";
X509UserProxy =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/user.proxy";
Error_ =
"/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardError";
LB_sequence_code =
"UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000";
Notification = "never";
stream_output = false;
GlobusRSL =
"(queue=aegis)(jobtype=single)(environment=(EDG_WL_JOBID
'https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ'))";
Type = "job";
Universe = "grid";
UserSubjectName = "/C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz";
Log = "/var/glite/logmonitor/CondorG.log/CondorG.log";
grid_type = "globus"
]
];
Command = "Submit";
Source = 2;
Protocol = "1.0.0"
]
- rsl =
(queue=aegis)(jobtype=single)(environment=(EDG_WL_JOBID
'https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ'))
- stateEnterTimes =
Submitted : Thu Jun 26 21:46:55 2008 CEST
Waiting : Thu Jun 26 21:46:59 2008 CEST
Ready : Thu Jun 26 21:47:22 2008 CEST
Scheduled : Thu Jun 26 21:47:50 2008 CEST
Running : Thu Jun 26 21:48:29 2008 CEST
Done : Fri Jun 27 09:40:33 2008 CEST
Cleared : ---
Aborted : Fri Jun 27 09:48:13 2008 CEST
Cancelled : ---
Unknown : ---
*************************************************************
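As a quick check on the stateEnterTimes above, the following minimal Python sketch computes how long the job lived before it was marked Done. The timestamps are copied from the output. The reading that a gap of just under 12 hours points to a delegated proxy with a 12-hour lifetime that was never renewed is an assumption, not something the output states.

```python
from datetime import datetime

# Timestamps copied from the stateEnterTimes block above. All of them are
# CEST, so the zone label is stripped before parsing.
FMT = "%a %b %d %H:%M:%S %Y"


def parse(stamp: str) -> datetime:
    """Parse one 'Thu Jun 26 21:46:55 2008 CEST' style timestamp."""
    return datetime.strptime(stamp.replace(" CEST", ""), FMT)


submitted = parse("Thu Jun 26 21:46:55 2008 CEST")
running = parse("Thu Jun 26 21:48:29 2008 CEST")
done = parse("Fri Jun 27 09:40:33 2008 CEST")
aborted = parse("Fri Jun 27 09:48:13 2008 CEST")

print("submitted -> done:   ", done - submitted)  # 11:53:38
print("running   -> done:   ", done - running)    # 11:52:04
print("done      -> aborted:", aborted - done)    # 0:07:40
```

The job fails about 11 h 54 min after submission, while myproxy-info reports more than 18 days left on the stored credential, so the expiry seems to concern the proxy delegated with the job rather than the one stored on the MyProxy server.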
[antun@ce sh5-aegis-ce]$ glite-wms-job-logging-info -v 3 https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ
---
Event: RegJob
- arrived = Thu Jun 26 21:46:55 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- ns = https://147.91.84.25:7443/glite_wms_wmproxy_server
- nsubjobs = 0
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = NetworkServer
- src_instance = https://147.91.84.25:7443/glite_wms_wmproxy_server
- timestamp = Thu Jun 26 21:46:55 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
- jdl =
SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/JDLToStart
---
Event: RegJob
- arrived = Thu Jun 26 21:47:07 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- ns = https://147.91.84.25:7443/glite_wms_wmproxy_server
- nsubjobs = 0
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = NetworkServer
- src_instance = https://147.91.84.25:7443/glite_wms_wmproxy_server
- timestamp = Thu Jun 26 21:46:56 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=host/wms.phy.bg.ac.yu
- jdl =
SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/JDLToStart
---
Event: Accepted
- arrived = Thu Jun 26 21:47:10 2008 CEST
- from = NetworkServer
- from_host = ce.phy.bg.ac.yu
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = NetworkServer
- src_instance = https://147.91.84.25:7443/glite_wms_wmproxy_server
- timestamp = Thu Jun 26 21:46:59 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
---
Event: EnQueued
- arrived = Thu Jun 26 21:47:12 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- queue = /var/glite/workload_manager/input.fl
- result = START
- seqcode =
UI=000000:NS=0000000003:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = NetworkServer
- src_instance = https://147.91.84.25:7443/glite_wms_wmproxy_server
- timestamp = Thu Jun 26 21:47:01 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
- job =
/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/JDLToStart
---
Event: EnQueued
- arrived = Thu Jun 26 21:47:18 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- queue = /var/glite/workload_manager/input.fl
- result = OK
- seqcode =
UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = NetworkServer
- src_instance = https://147.91.84.25:7443/glite_wms_wmproxy_server
- timestamp = Thu Jun 26 21:47:03 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
- job =
[
requirements = ( (
RegExp("ce-atlas.phy.bg.ac.yu*",other.GlueCEUniqueID) ) && (
other.GlueCEStateStatus == "Production" ) ) && ( other.GlueCEStateStatus ==
"Production" );
RetryCount = 3;
edg_jobid = "https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ";
Arguments = "";
OutputSandboxPath =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/output";
MyProxyServer = "myproxy.phy.bg.ac.yu";
AllowZippedISB = true;
JobType = "normal";
InputSandboxDestFileName = { "rasa-1mregr-1.sh" };
Executable = "rasa-1mregr-1.sh";
OutputSandboxDestURI = {
"gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/output/stdout-1.txt","gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/output/stderr-1.txt"
};
CertificateSubject = "/C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz";
X509UserProxy =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/user.proxy";
StdOutput = "stdout-1.txt";
VOMS_FQAN = "/aegis";
OutputSandbox = { "stdout-1.txt","stderr-1.txt" };
LB_sequence_code =
"UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000";
InputSandboxPath =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/input";
VirtualOrganisation = "aegis";
rank = -other.GlueCEStateEstimatedResponseTime;
Type = "job";
ShallowRetryCount = 10;
StdError = "stderr-1.txt";
WMPInputSandboxBaseURI =
"gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ";
DefaultRank = -other.GlueCEStateEstimatedResponseTime;
ZippedISB = { "ISBfiles_sYSTZqSNgs653eT3AQ0NeQ_0.tar.gz" };
InputSandbox = {
"gsiftp://wms.phy.bg.ac.yu:2811/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/input/rasa-1mregr-1.sh"
}
]
---
Event: DeQueued
- arrived = Thu Jun 26 21:47:13 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- queue = /var/glite/workload_manager/input.fl
- seqcode =
UI=000000:NS=0000000004:WM=000001:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = WorkloadManager
- src_instance = 3018
- timestamp = Thu Jun 26 21:47:03 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: Match
- arrived = Thu Jun 26 21:47:37 2008 CEST
- dest_id = ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000004:WM=000002:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = WorkloadManager
- src_instance = 3018
- timestamp = Thu Jun 26 21:47:21 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: EnQueued
- arrived = Thu Jun 26 21:47:47 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- queue = /var/glite/jobcontrol/queue.fl
- result = START
- seqcode =
UI=000000:NS=0000000004:WM=000003:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = WorkloadManager
- src_instance = 3018
- timestamp = Thu Jun 26 21:47:21 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: EnQueued
- arrived = Thu Jun 26 21:47:53 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- queue = /var/glite/jobcontrol/queue.fl
- result = OK
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = WorkloadManager
- src_instance = 3018
- timestamp = Thu Jun 26 21:47:22 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
- job =
[
Arguments =
[
JobAd =
[
stream_error = false;
edg_jobid = "https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ";
GlobusScheduler = "ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs";
ce_id = "ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-aegis";
Transfer_Executable = true;
Output =
"/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardOutput";
Copy_to_Spool = false;
Executable =
"/var/glite/jobcontrol/submit/aK/JobWrapper.https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ.sh";
X509UserProxy =
"/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/user.proxy";
Error_ =
"/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardError";
LB_sequence_code =
"UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000";
Notification = "never";
stream_output = false;
GlobusRSL =
"(queue=aegis)(jobtype=single)(environment=(EDG_WL_JOBID
'https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ'))";
Type = "job";
Universe = "grid";
UserSubjectName = "/C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz";
Log = "/var/glite/logmonitor/CondorG.log/CondorG.log";
grid_type = "globus"
]
];
Command = "Submit";
Source = 2;
Protocol = "1.0.0"
]
---
Event: DeQueued
- arrived = Thu Jun 26 21:47:55 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- local_jobid = unavailable
- priority = asynchronous
- queue = /var/glite/jobcontrol/queue.fl
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000001:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = JobController
- src_instance = unique
- timestamp = Thu Jun 26 21:47:22 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: Transfer
- arrived = Thu Jun 26 21:47:59 2008 CEST
- dest_host = localhost
- dest_instance =
/var/glite/logmonitor/CondorG.log/CondorG.1214496803.log
- dest_jobid = unavailable
- destination = LogMonitor
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- reason = unavailable
- result = START
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = JobController
- src_instance = unique
- timestamp = Thu Jun 26 21:47:22 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
- job = unavailable
---
Event: Transfer
- arrived = Thu Jun 26 21:48:06 2008 CEST
- dest_host = localhost
- dest_instance =
/var/glite/logmonitor/CondorG.log/CondorG.1214496803.log
- dest_jobid = 672247
- destination = LogMonitor
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- reason = unavailable
- result = OK
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000
- source = JobController
- src_instance = unique
- timestamp = Thu Jun 26 21:47:23 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
- job = stream_error = False
+edg_jobid = "https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ"
Arguments =
'UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000'
GlobusScheduler = ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs
+ce_id = "ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-aegis"
Transfer_Executable = True
Output =
/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardOutput
Submit_Event_Notes = (https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ)
(UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000)
(0)
Copy_to_Spool = False
Executable =
/var/glite/jobcontrol/submit/aK/JobWrapper.https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ.sh
X509UserProxy =
/var/glite/SandboxDir/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/user.proxy
error =
/var/glite/jobcontrol/condorio/aK/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ/StandardError
+LB_sequence_code =
"UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000000:APP=000000:LBS=000000"
Notification = never
stream_output = False
GlobusRSL = (queue=aegis)(jobtype=single)(environment=(EDG_WL_JOBID
'https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ'))
+Type = "job"
Universe = grid
+UserSubjectName = "/C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Antun Balaz"
Log = /var/glite/logmonitor/CondorG.log/CondorG.1214496803.log
grid_type = globus
+CondorSubmitFile =
"/var/glite/jobcontrol/submit/aK/Condor.https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2faKfpxU2O0b9fy4sM8jfejQ.submit"
Queue 1
---
Event: Running
- arrived = Thu Jun 26 21:48:29 2008 CEST
- host = wn24.phy.bg.ac.yu
- level = SYSTEM
- node = wn24.phy.bg.ac.yu
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000001:APP=000000:LBS=000000
- source = LRMS
- timestamp = Thu Jun 26 21:48:29 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
---
Event: ReallyRunning
- arrived = Thu Jun 26 21:48:35 2008 CEST
- host = wn24.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000003:APP=000000:LBS=000000
- source = LRMS
- timestamp = Thu Jun 26 21:48:32 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
---
Event: Done
- arrived = Fri Jun 27 09:40:52 2008 CEST
- exit_code = 0
- host = wn24.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- reason = Job has been terminated by the batch system
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000000:LRMS=000005:APP=000000:LBS=000000
- source = LRMS
- status_code = FAILED
- timestamp = Fri Jun 27 09:40:51 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz
---
Event: Accepted
- arrived = Thu Jun 26 21:48:36 2008 CEST
- from = JobController
- from_host = localhost
- from_instance = unavailable
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- local_jobid = 672247
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000001:LRMS=000000:APP=000000:LBS=000000
- source = LogMonitor
- src_instance = unique
- timestamp = Thu Jun 26 21:47:28 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: Transfer
- arrived = Thu Jun 26 21:49:05 2008 CEST
- dest_host = ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs
- dest_instance =
/var/glite/logmonitor/CondorG.log/CondorG.1214496803.log
- dest_jobid = unavailable
- destination = LRMS
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- reason = Job successfully submitted to Globus
- result = OK
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000003:LRMS=000000:APP=000000:LBS=000000
- source = LogMonitor
- src_instance = unique
- timestamp = Thu Jun 26 21:47:50 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
- job = (queue=aegis)(jobtype=single)(environment=(EDG_WL_JOBID
'https://wms.phy.bg.ac.yu:9000/aKfpxU2O0b9fy4sM8jfejQ'))
---
Event: Running
- arrived = Thu Jun 26 21:53:19 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- node = gt2 ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs
- priority = asynchronous
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000005:LRMS=000000:APP=000000:LBS=000000
- source = LogMonitor
- src_instance = unique
- timestamp = Thu Jun 26 21:50:15 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: Done
- arrived = Fri Jun 27 09:40:36 2008 CEST
- exit_code = 1
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- reason = Got a job held event, reason: Globus error 131:
the user proxy expired (job is still running)
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000007:LRMS=000000:APP=000000:LBS=000000
- source = LogMonitor
- src_instance = unique
- status_code = FAILED
- timestamp = Fri Jun 27 09:40:33 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: Done
- arrived = Fri Jun 27 09:48:13 2008 CEST
- exit_code = 1
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- reason = Job got an error while in the CondorG queue.
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000009:LRMS=000000:APP=000000:LBS=000000
- source = LogMonitor
- src_instance = unique
- status_code = FAILED
- timestamp = Fri Jun 27 09:48:13 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
---
Event: Abort
- arrived = Fri Jun 27 09:48:14 2008 CEST
- host = wms.phy.bg.ac.yu
- level = SYSTEM
- priority = asynchronous
- reason = Job proxy is expired.
- seqcode =
UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:LM=000010:LRMS=000000:APP=000000:LBS=000000
- source = LogMonitor
- src_instance = unique
- timestamp = Fri Jun 27 09:48:13 2008 CEST
- user = /C=RS/O=AEGIS/OU=Institute of Physics
Belgrade/CN=Antun Balaz/CN=proxy/CN=proxy
**********************************************************************
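The seqcode fields in the events above are easier to compare once split into their parts. The following is a minimal Python sketch; the function name parse_seqcode is hypothetical, and treating each NAME=digits pair as a per-component event counter is an assumption based only on how the values grow in this log.

```python
def parse_seqcode(seqcode: str) -> dict[str, int]:
    """Split a sequence code such as 'UI=000000:NS=0000000004:...' into
    a {component: counter} mapping (interpretation is an assumption)."""
    counters = {}
    for field in seqcode.split(":"):
        name, _, value = field.partition("=")
        counters[name] = int(value)
    return counters


# Sequence codes copied from the first failed "Done" event (proxy expired)
# and from the final "Abort" event in the logging info above.
held = parse_seqcode(
    "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:"
    "LM=000007:LRMS=000000:APP=000000:LBS=000000"
)
abort = parse_seqcode(
    "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000003:"
    "LM=000010:LRMS=000000:APP=000000:LBS=000000"
)

# Show which components advanced between the two events.
changed = {k: (held[k], abort[k]) for k in held if held[k] != abort[k]}
print(changed)  # {'LM': (7, 10)}
```

Between the "user proxy expired" event and the final Abort only the LM value changes, which matches the source = LogMonitor field on those events.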
-----
Antun Balaz
Research Assistant
Web: http://scl.phy.bg.ac.yu/
Phone: +381 11 3713152
Fax: +381 11 3162190
Scientific Computing Laboratory
Institute of Physics Belgrade
Pregrevica 118, 11080 Belgrade, Serbia
-----
---------- Original Message -----------
From: Maarten Litmaath <[log in to unmask]>
To: [log in to unmask]
Sent: Wed, 25 Jun 2008 09:14:53 +0200
Subject: Re: [LCG-ROLLOUT] MyProxy issue
> Hi Antun,
>
> > > I have done a preliminary certification of that rpm: I installed a
> > > MyProxy server with the current rpm, uploaded a proxy, upgraded and
> > > restarted the server, and verified that both 3.0 and 3.1 WMS nodes
> > > still can get delegations of the previously stored proxy.
> > >
> > > The formal certification still may take a few days at least.
> > > You may want to try upgrading your server already, to see if it
> > > cures the proxy renewal bug:
> > >
> > > http://litmaath.home.cern.ch/litmaath/myproxy-new/
>
> Overnight I have run 40k myproxy-get-delegation requests against both
> the current and the new server each. For many minutes there were 20
> concurrent requests against each server. Not a single request
> failed, so the problem could not be reproduced this way. The
> resulting proxies were OK for RB and WMS jobs, and for lcg-cr
> against all SE flavors. So, it appears the new rpm is not making
> things worse, while correcting the Globus flavor used by MyProxy, so
> I intend to have myproxy.cern.ch upgraded ASAP.
------- End of Original Message -------