Hello Christoph,
> > http://goc.grid.sinica.edu.tw/gocwiki/Jobs_sent_to_some_CE_stay_in_Ready_state_forever
> >
> >
> I did it just now. Since the problem is independent of the CE the
> job is submitted to, it seems to be a WMS issue. But the page gives no
> hints about what it actually could be. Firewall issues are ruled out as
> well, since the effect is the same for the local CEs that are not
> firewalled. As said, the WMS worked for a day or so after the upgrade
> and is stuck now.
Indeed, jobs can also stay in Ready when Condor-G has a problem.
If a restart does not help, the logfiles should provide some clue.
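A quick first check is to ask Condor-G itself what state the jobs are in
(just a sketch, assuming the Condor binaries are on the PATH of the WMS
node; condor_q and its -hold option are standard Condor commands, nothing
specific to this setup):

condor_q          # list the jobs Condor-G is currently managing
condor_q -hold    # held jobs, including the hold reason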
> Nothing obvious to me, but I am not really used to reading that stuff.
> There are quite a few messages saying "Cannot cancel job from queue" in
> /var/glite/logmonitor/CondorG.log/*. Maybe too many cancel requests are
> stuck and are now blocking the whole system.
I meant these logs in /var/local/condor/log:
GridmanagerLog.glite
SchedLog
Any suspicious stuff in there? What is the WMS host name?
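For a first pass over those logs, something like the following usually
surfaces the interesting lines (paths as above; the grep patterns are only
a suggestion, adjust as needed):

grep -iE 'error|fail' /var/local/condor/log/GridmanagerLog.glite | tail -20
grep -iE 'error|fail' /var/local/condor/log/SchedLog | tail -20

You can also count the stuck cancel requests you mentioned:

grep -c 'Cannot cancel job from queue' /var/glite/logmonitor/CondorG.log/*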
> How do I change the logging of the Condor stuff? I discovered some
> settings at the end of part 2 of /opt/condor-7.4.1/etc/condor_config
> that seem to address Condor logging, but I am uncertain how to change
> the parameters.
Add these to the end:
GRIDMANAGER_DEBUG = D_FULLDEBUG
MAX_GRIDMANAGER_LOG = 1000000000
Then:
/opt/glite/etc/init.d/glite-wms-jc restart CondorG
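Afterwards, watching the Gridmanager log should confirm that the
D_FULLDEBUG setting took effect, e.g.:

tail -f /var/local/condor/log/GridmanagerLog.glite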