https://gus.fzk.de/ws/ticket_info.php?ticket=68730

On 03/17/2011 04:32 PM, Gonçalo Borges wrote:
> Hi Dimitris...
>
> Thanks for the feedback, but in the end I was able to solve the
> problem, which seems to be different from the one you have.
>
> I realized that there was a long queue of requests to be processed by
> the workload-management system
> (/var/glite/workload_manager/jobdir/old/). The WM was trying to
> reprocess some old entries in the queue and failing. Immediately after
> the failure, I saw logs like:
>
> 17 Mar, 14:53:02 -W: [Warning] get_catalog_url(dli_utils.cpp:89): No endpoints found
> 17 Mar, 14:53:02 -W: [Warning] resolve_filemapping_info(dli_utils.cpp:364): cannot get DataCatalogType or endpoint
> 17 Mar, 14:53:02 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:222): MM for job: https://wms01.ncg.ingrid.pt:9000/Es5PVTeUke9kHx4hG6qvDg (0/0 [0] )
> 17 Mar, 14:53:02 -I: [Info] postpone(submit_request.cpp:212): postponing https://wms01.ncg.ingrid.pt:9000/Es5PVTeUke9kHx4hG6qvDg (BrokerHelper: no compatible resources)
>
> and, running with LogLevel 6:
>
> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
>
> I suspected some jobs with badly defined JDL, and indeed, looking at
> the JDL of one of the job ids referred to in the logs, I saw things like:
>
> DataRequirements = {
> [
> DataCatalogType = "DLI";
> InputData = { "guid:3172069e-d20b-483e-afba-f7acc689ac85" }
> ] };
>
> It seems the user was submitting jobs requesting a given file, but
> not specifying the LFC where the file was registered. Because of that,
> the WMS didn't know how to process the request, and failed.
>
> Basically, I had to delete the references to those kinds of jobs in
> /var/glite/workload_manager/jobdir/old/, and after that the daemons
> are working perfectly.
>
> I have contacted the user and asked him to correct the JDL, but the
> service should also be protected against this kind of misuse.
>
> Cheers
> Goncalo
>
> On 03/17/2011 02:03 PM, Dimitris Zilaskos wrote:
>> Hi,
>>
>> Not a solution, but I had a similar experience, recorded at
>> https://gus.fzk.de/ws/ticket_info.php?ticket=66943.
>>
>> I have just installed the latest glite update and am hammering the
>> service again to see if I can reproduce the problem...
>>
>> Cheers,
>>
>> On 17/3/2011 3:50 PM, Gonçalo Borges wrote:
>>> Hi All...
>>>
>>> My WMS is not able to get glite-wms-wm running. I'm running this
>>> service with loglevel 6, and the final messages produced are:
>>>
>>> (...)
>>> 17 Mar, 13:33:16 -D: [Debug] populate_ism(ism-ii-purchaser.cpp:129): w-dpm01.grid.sinica.edu.tw added to ISM
>>> 17 Mar, 13:33:16 -D: [Debug] populate_ism(ism-ii-purchaser.cpp:129): wipp-se.weizmann.ac.il added to ISM
>>> 17 Mar, 13:33:16 -D: [Debug] populate_ism(ism-ii-purchaser.cpp:129): wormhole.westgrid.ca added to ISM
>>> 17 Mar, 13:33:16 -D: [Debug] switch_active_side(ism.cpp:36): switched active side to ISM 0
>>> 17 Mar, 13:33:16 -I: [Info] main(main.cpp:421): spawning 5 worker threads...
>>> 17 Mar, 13:33:16 -D: [Debug] operator()(submit_request.cpp:224): considering (re)submit of https://wms01.ncg.ingrid.pt:9000/EGWpbJHJy72KIukdmHsyRA
>>> 17 Mar, 13:33:16 -D: [Debug] operator()(submit_request.cpp:224): considering (re)submit of https://wms01.ncg.ingrid.pt:9000/-S9-S0z62nRHkcfzU4gggw
>>> 17 Mar, 13:33:16 -D: [Debug] operator()(submit_request.cpp:224): considering (re)submit of https://wms01.ncg.ingrid.pt:9000/i_yTAUVrzoOXm3BBQYSJ1g
>>> 17 Mar, 13:33:16 -D: [Debug] operator()(submit_request.cpp:224): considering (re)submit of https://wms01.ncg.ingrid.pt:9000/-dXZ4gRb4mVXAbCM4_-GYQ
>>> 17 Mar, 13:33:16 -D: [Debug] operator()(submit_request.cpp:224): considering (re)submit of https://wms01.ncg.ingrid.pt:9000/-hV1oym9F3g-6ZoZSlU7ig
>>> 17 Mar, 13:33:16 -I: [Info] main(main.cpp:429): scheduling dispatcher...
>>> 17 Mar, 13:33:16 -I: [Info] main(main.cpp:438): scheduling ISM purchaser(s)...
>>> 17 Mar, 13:33:16 -I: [Info] main(main.cpp:473): scheduling ISM updater...
>>> 17 Mar, 13:33:16 -D: [Debug] operator()(ism.cpp:142): ISM updater start
>>> 17 Mar, 13:33:16 -D: [Debug] operator()(ism.cpp:145): ISM updater end
>>> 17 Mar, 13:33:16 -I: [Info] main(main.cpp:498): WM startup completed...
>>> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
>>> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
>>> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
>>> 17 Mar, 13:33:17 -D: [Debug] get_catalog_url(dli_utils.cpp:62): trying to get data-location-interface information through SD...
>>>
>>> I've looked at
>>> http://goc.grid.sinica.edu.tw/gocwiki/Jobs_sent_to_my_RB_stay_in_Waiting_state_forever
>>> and restarted the daemon several times, without any success.
>>>
>>> Can someone shed some light on the topic?
>>>
>>> Cheers
>>> Goncalo
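
For anyone who wants to spot this kind of JDL before it reaches the WMS, here is a minimal sketch in Python. It only inspects JDL text: it flags DataRequirements entries that set DataCatalogType but give no catalog endpoint. The function names are hypothetical, and I am assuming the endpoint attribute is called DataCatalog, as I recall it from the gLite JDL attributes documentation; please check that against your release. How requests are actually stored under jobdir/old/ is not shown in this thread, so the sketch does not try to read that directory.

```python
import re


def _balanced(text, start, open_ch, close_ch):
    """Return the index just past the delimiter that closes text[start].

    Simplification: delimiters inside quoted strings are not treated
    specially, which is enough for JDL like the one quoted above.
    """
    depth = 0
    for i in range(start, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced %r in JDL text" % open_ch)


def entries_missing_catalog(jdl_text):
    """Return DataRequirements entries that set DataCatalogType but
    name no DataCatalog endpoint (attribute name is an assumption)."""
    m = re.search(r"\bDataRequirements\s*=\s*\{", jdl_text, re.IGNORECASE)
    if m is None:
        return []
    start = m.end() - 1  # position of the opening "{"
    end = _balanced(jdl_text, start, "{", "}")
    body = jdl_text[start + 1:end - 1]

    bad = []
    pos = 0
    while True:
        i = body.find("[", pos)
        if i == -1:
            break
        j = _balanced(body, i, "[", "]")
        entry = body[i + 1:j - 1]
        # Collect the attribute names assigned inside this entry.
        names = {n.lower() for n in re.findall(r"\b(\w+)\s*=", entry)}
        if "datacatalogtype" in names and "datacatalog" not in names:
            bad.append(entry.strip())
        pos = j
    return bad
```

Run against the JDL fragment quoted above, this should report one offending entry; a job that also gives an endpoint for the catalog (again assuming the DataCatalog attribute) should come back clean. It is a text-level check only and says nothing about whether the endpoint is reachable.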