Hi,
After a recent PBS restart (a couple of days ago) I noticed that the
infosystem plugin responsible for collecting information from PBS has
become unstable. Instead of reporting the actual number of waiting jobs
from PBS, it initially reports the static value 444444 from
/opt/glite/etc/gip/ldif/static-file-CE.ldif:
[root@ce ~]# /opt/glite/libexec/glite-info-wrapper | grep GlueCEStateWaitingJobs | uniq
GlueCEStateWaitingJobs: 444444
[root@ce ~]#
Each failed execution is accompanied by two lines in /var/log/messages:
Jul 14 11:57:55 ce lcg-info-dynamic-scheduler: VO max jobs backend command returned nonzero exit status
Jul 14 11:57:55 ce lcg-info-dynamic-scheduler: Exiting without output, GIP will use static values
But when I run the command a couple of times, or execute a qmgr command
manually, it starts to behave properly:
[root@ce ~]# /opt/glite/libexec/glite-info-wrapper | grep GlueCEStateWaitingJobs | uniq
GlueCEStateWaitingJobs: 0
[root@ce ~]#
Then it stays OK for some time. "Some time" means anywhere from a couple
of minutes up to one hour, and I don't know what else influences it.
The problem seems to be connected with the dying qmgr issue I reported
on this list a couple of days ago, because the timing patterns of the
two are quite similar.
Do you have any clues as to what may be causing this? Looking at gstat,
several other sites also show similar symptoms.
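For anyone who wants to check whether their own CE is affected, the
grep above can be wrapped in a small test. This is only a minimal
sketch: it assumes the static placeholder is 444444, as in the
static-file-CE.ldif quoted above, and the function name
uses_static_waiting is made up for illustration.

```shell
#!/bin/sh
# Placeholder value from static-file-CE.ldif (assumption: same on your site).
STATIC_WAITING=444444

# Reads GIP LDIF output on stdin.
# Returns 0 if any GlueCEStateWaitingJobs line carries the static
# placeholder, 1 if none does.
uses_static_waiting() {
    grep -q "^GlueCEStateWaitingJobs: ${STATIC_WAITING}\$"
}

# Example use on the CE (wrapper path as in this message):
#   /opt/glite/libexec/glite-info-wrapper | uses_static_waiting \
#       && echo "GIP is publishing static values"
```

Run from cron with a timestamp, this would at least show how often and
for how long the plugin falls back to the static values.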
Best regards,
Adam