> As a workaround, can you try adding that host DN as a trusted retriever?
>
> trusted_retrievers "/C=RS/O=AEGIS/....."

Hi Maarten,

After adding this to the configuration of the MyProxy server, we no longer see any errors in /var/log/messages, either on the MyProxy server or on the WMS (related to glite-proxy-renewd). However, jobs are still being aborted because their proxy expires. Here is one example of such a job:

[alex@ce sh5-see-ce]$ glite-wms-job-status https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
Current Status:     Aborted
Status Reason:      Job proxy is expired.
Destination:        ce-atlas.phy.bg.ac.yu:2119/jobmanager-pbs-see
Submitted:          Sat Jun 21 14:05:16 2008 CEST
*************************************************************

From its full status, I see that:

- stateEnterTimes =
      Submitted : Sat Jun 21 14:05:16 2008 CEST
      Waiting   : Sat Jun 21 14:05:29 2008 CEST
      Ready     : Sat Jun 21 14:06:18 2008 CEST
      Scheduled : Sat Jun 21 14:51:08 2008 CEST
      Running   : ---
      Done      : Sun Jun 22 16:09:06 2008 CEST
      Cleared   : ---
      Aborted   : Sun Jun 22 18:04:09 2008 CEST
      Cancelled : ---
      Unknown   : ---

I have checked carefully: around Sun Jun 22 16:09:06 2008 CEST and Sun Jun 22 18:04:09 2008 CEST there is nothing in /var/log/messages on either the MyProxy server or the WMS.
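To make the timeline easier to compare with the logs, the gaps between state transitions can be computed directly from the stateEnterTimes above. A minimal Python sketch: the timestamps are copied verbatim from the status output, the " CEST" suffix is dropped because every entry is in the same zone, states shown as "---" are omitted, and the parsing assumes the default C/English locale for the day and month names.

```python
from datetime import datetime, timedelta

# Timestamp format printed by glite-wms-job-status, without the " CEST" suffix.
FMT = "%a %b %d %H:%M:%S %Y"

# stateEnterTimes copied from the status output (states with "---" omitted).
STATES = [
    ("Submitted", "Sat Jun 21 14:05:16 2008"),
    ("Waiting",   "Sat Jun 21 14:05:29 2008"),
    ("Ready",     "Sat Jun 21 14:06:18 2008"),
    ("Scheduled", "Sat Jun 21 14:51:08 2008"),
    ("Done",      "Sun Jun 22 16:09:06 2008"),
    ("Aborted",   "Sun Jun 22 18:04:09 2008"),
]


def gaps(entries):
    """Return (from_state, to_state, elapsed) for each pair of consecutive states."""
    parsed = [(name, datetime.strptime(ts, FMT)) for name, ts in entries]
    return [(a, b, tb - ta) for (a, ta), (b, tb) in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    for src, dst, elapsed in gaps(STATES):
        print(f"{src:>9} -> {dst:<9} {elapsed}")
    # Overall time from submission to abort is the sum of all gaps.
    print("total:", sum((e for _, _, e in gaps(STATES)), timedelta()))
```

For this job, the Scheduled -> Done gap comes out as 1 day, 1:17:58, Done -> Aborted as 1:55:03, and the total from submission to abort as 1 day, 3:58:53.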
This is what I found in /var/log/glite that is related to this job:

[root@wms glite]# grep LPpwobsRBu3EYhzZQ_LKuw *.log
jobcontoller_events.log:21 Jun, 14:50:43 -V- JobControllerReal::submit(...): Submitting job "https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw"
logmonitor_events.log:21 Jun, 14:50:47 -I- EventSubmit::finalProcess(...): Job id = https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
logmonitor_events.log:21 Jun, 14:50:47 -I- SubmitReader::internalRead(): Reading condor submit file of job https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
logmonitor_events.log:21 Jun, 14:51:08 -I- EventGlobusSubmit::process_event(): Job id = https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
logmonitor_events.log:21 Jun, 14:51:08 -I- SubmitReader::internalRead(): Reading condor submit file of job https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
logmonitor_events.log:22 Jun, 16:09:06 -I- EventJobHeld::process_event(): Job id = https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
logmonitor_events.log:22 Jun, 18:04:08 -I- EventAborted::process_event(): Job id = https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
logmonitor_events.log:22 Jun, 18:04:09 -I- JobResubmitter::resubmit(...): Job id = https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
workload_manager_events.log:21 Jun, 14:05:33 -I: [Info] operator()(/home/glbuild/GLITE_3_1_0_continous/org.glite.wms.manager/src/server/dispatcher.cpp:470): new jobsubmit for https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw
workload_manager_events.log:21 Jun, 14:06:16 -I: [Info] checkRequirement(/home/glbuild/GLITE_3_1_0_continous/org.glite.wms.matchmaking/src/matchmakerISMImpl.cpp:79): MM for job: https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw (1/6194 [0, 2.48] )
workload_manager_events.log:21 Jun, 14:06:19 -I: [Info] do_transitions_for_submit(/home/glbuild/GLITE_3_1_0_continous/org.glite.wms.manager/src/server/dispatcher.cpp:283): https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw delivered

So, for some
reason the job was held at 16:09:06 and aborted at 18:04:08.

From jobcontoller_events.log I can see that the Condor ID of this job is 652005:

21 Jun, 14:50:42 -I- ControllerLoop::run(): Got new submit request...
21 Jun, 14:50:42 -I- SubmitAd::createFromAd(...): Creating job directory path.
21 Jun, 14:50:43 -M- JobControllerReal::submit(...): Classad file created...
21 Jun, 14:50:43 -V- JobControllerReal::submit(...): Submitting job "https://wms.phy.bg.ac.yu:9000/LPpwobsRBu3EYhzZQ_LKuw"
21 Jun, 14:50:43 -M- JobControllerReal::submit(...): Submit file created...
21 Jun, 14:50:44 -V- JobControllerReal::submit(...): Job submitted to Condor cluster: 652005

But the job is no longer in the Condor queue. Contrary to what the logs suggest, the files associated with the job have not been removed:

[root@wms glite-renewd]# ll /var/glite/SandboxDir/LP/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2fLPpwobsRBu3EYhzZQ_5fLKuw/
total 24
drwxrwx--- 2 aegis013 glite 4096 Jun 21 14:06 input
-rw-r--r-- 1 glite    glite  799 Jun 21 14:05 JDLOriginal
-rw-r--r-- 1 glite    glite 2055 Jun 21 14:05 JDLStarted
drwxrwx--- 2 aegis013 glite 4096 Jun 21 14:05 output
drwxrwx--- 2 aegis013 glite 4096 Jun 21 14:05 peek
-rw-r--r-- 1 glite    glite    0 Jun 21 14:06 token.txt
lrwxrwxrwx 1 glite    glite   67 Jun 21 14:05 user.proxy -> /var/glite/spool/glite-renewd/447177e6e930ecc86b52a6a9100ce494.2519

However, the link target /var/glite/spool/glite-renewd/447177e6e930ecc86b52a6a9100ce494.2519 no longer exists.

Any help in finding what went wrong would be appreciated. I suspect a problem with glite-proxy-renewd, but I cannot find any traces that suggest what is wrong...

Thanks,
Antun
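Since the dangling user.proxy link looks like the key symptom, it may help to check whether other jobs are in the same state. A minimal Python sketch, assuming the SandboxDir layout shown above (one directory per job, each containing a user.proxy symlink into /var/glite/spool/glite-renewd/); the function name is hypothetical, and the script only reads the filesystem.

```python
import os
import sys


def dangling_proxy_links(sandbox_root):
    """Yield (link_path, target) for every user.proxy symlink whose target is missing."""
    for dirpath, _dirnames, filenames in os.walk(sandbox_root):
        # os.walk reports symlinks that do not resolve to a directory
        # (including broken ones) under filenames.
        if "user.proxy" not in filenames:
            continue
        link = os.path.join(dirpath, "user.proxy")
        # islink() is true for the link itself; exists() follows it and is
        # false when the target has been deleted.
        if os.path.islink(link) and not os.path.exists(link):
            yield link, os.readlink(link)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/glite/SandboxDir"
    for link, target in dangling_proxy_links(root):
        print(f"{link} -> {target} (missing)")
```

With GNU find, `find /var/glite/SandboxDir -name user.proxy -xtype l` should list the same broken links without any script.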