Hi,
This setup used to work reliably. The top-level BDII daemon got stuck, and probably because of that glite-wms-job-submit started failing with "no compatible resources": the WMS does not identify the CE, for a reason that is not clear to me. The top-level BDII has since been restarted, but WMS job submission still fails.
Note that when I submit directly from the UI server to the CE using glite-ce-job-submit, the job runs fine.
To clarify the failure scenario:
-------------------------------------
When I issue the command: glite-wms-job-list-match -a ex6.jdl
I get the following response:
Connecting to the service https://wms-ce.haifa.il.ibm.com:7443/glite_wms_wmproxy_server
==================== glite-wms-job-list-match failure ====================
No Computing Element matching your job requirements has been found!
==========================================================================
When I run the command:
glite-wms-job-submit -a ex6.jdl
the job is submitted, and checking it afterwards with glite-wms-job-status gives the following status:
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms-ce.haifa.il.ibm.com:9000/9EGViPuxzJEs4SzUk1bZ3A
Current Status: Waiting
Status Reason: BrokerHelper: no compatible resources
Submitted: Sun Oct 20 01:22:46 2013 IDT
==========================================================================
Looking at the WMS log files, I see the following.
In /var/log/wms/workload_manager_events.log:
20 Oct, 01:25:27 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:105): MM for job: https://wms-ce.haifa.il.ibm.com:9000/NlDrpi_x0gn3v4U98mFKYg (0/331 [0] )
20 Oct, 01:25:27 -I: [Info] postpone(submit_request.cpp:268): postponing https://wms-ce.haifa.il.ibm.com:9000/NlDrpi_x0gn3v4U98mFKYg (BrokerHelper: no compatible resources)
In /var/log/wms/httpd-wmproxy-errors.log:
[Sun Oct 20 01:22:46 2013] [error] Certificate Verification: Error (26): unsupported certificate purpose
Threshold for Load Average(1 min): 22 => Detected value for Load Average(1 min): 0.16
Threshold for Load Average(5 min): 20 => Detected value for Load Average(5 min): 0.22
Threshold for Load Average(15 min): 18 => Detected value for Load Average(15 min): 0.18
Threshold for Memory Usage: 99 => Detected value for Memory Usage: 28.50%
Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
Threshold for Free FD: 1000 => Detected value for Free FD: 202974
Threshold for FTP Connection: 300 => Detected value for FTP Connection: 1
Threshold for Disk Usage: 95% => Detected value for Partition / : 11%
Threshold for WMS Input FileList size: 204800 => Detected value for WMS Input FileList size /var/workload_manager/jobdir : 20
Threshold for WMS Input FileList jobs: 500 => Detected value for WMS Input FileList jobs /var/workload_manager/jobdir : 0
Threshold for JC Input FileList size: 204800 => Detected value for JC Input FileList size /var/jobcontrol/jobdir/ : 16
Threshold for JC Input FileList jobs: 500 => Detected value for JC Input FileList jobs /var/jobcontrol/jobdir/ : 0
Threshold for ICE Input FileList size: 204800 => Detected value for ICE Input FileList size /var/ice/jobdir : 16
Threshold for ICE Input FileList jobs: 500 => Detected value for ICE Input FileList jobs /var/ice/jobdir : 0
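As a sanity check on the load-monitor lines above, here is a minimal Python sketch that parses "Threshold for ... => Detected value for ..." lines and flags any metric outside its limit. The function name check_thresholds and the rule that metrics starting with "Free" are lower bounds (everything else an upper bound) are my own assumptions, not something taken from the WMS documentation.

```python
import re

# Matches lines such as:
#   Threshold for Disk Usage: 95% => Detected value for Partition / : 11%
LINE_RE = re.compile(
    r"^Threshold for (?P<name>.+?):\s*(?P<limit>[\d.]+)%?\s*=>\s*"
    r"Detected value for .*:\s*(?P<value>[\d.]+)%?$"
)


def check_thresholds(lines):
    """Return (name, limit, value, ok) for every threshold line found.

    Assumption: a metric whose name starts with "Free" (e.g. "Free FD")
    is a minimum that must be met; all other metrics are maximums.
    Lines that do not look like threshold lines are skipped.
    """
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        name = match.group("name")
        limit = float(match.group("limit"))
        value = float(match.group("value"))
        if name.startswith("Free"):
            ok = value >= limit
        else:
            ok = value <= limit
        results.append((name, limit, value, ok))
    return results


if __name__ == "__main__":
    sample = [
        "Threshold for Load Average(1 min): 22 => Detected value for Load Average(1 min): 0.16",
        "Threshold for Memory Usage: 99 => Detected value for Memory Usage: 28.50%",
        "Threshold for Free FD: 1000 => Detected value for Free FD: 202974",
        "Threshold for Disk Usage: 95% => Detected value for Partition / : 11%",
    ]
    for name, limit, value, ok in check_thresholds(sample):
        print(f"{name}: limit={limit} value={value} {'OK' if ok else 'EXCEEDED'}")
```

With the values logged above every metric is within its limit, so the WMS load limiter does not look like the cause of the failure.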
In my opinion there is no certificate issue: the WMS and CE certificates are both valid.
I see nothing interesting in the other WMS logs.
Any ideas?