On Thu, 1 Feb 2007, pierre girard wrote:
> Hi Maarten,
>
> Thanks for your explanations. After a quiet period, the problem seems to
> be back, so I can now provide a "ps" result:
> > [root@cclcgceli02 ~]$ ps -elf | grep globus-job-manager | grep cms050 | wc -l
> > 295
> > [root@cclcgceli02 ~]$ ps -elf | grep globus-job-manager | grep cms050 | more
> > 0 S cms050 32096 1 0 75 0 - 1355 schedu 11:19 ? 00:00:01 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type fork -rdn jobmanager-fork -machine-type unknown -publish-jobs
> > 0 S cms050 8145 1 0 76 0 - 1354 schedu 11:37 ? 00:00:01 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type fork -rdn jobmanager-fork -machine-type unknown -publish-jobs
> > 0 S cms050 9670 1 0 75 0 - 1291 schedu 11:38 ? 00:00:00 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type bqs -rdn jobmanager-bqs -machine-type unknown -publish-jobs
> > 0 S cms050 9677 1 0 75 0 - 1292 schedu 11:38 ? 00:00:00 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type bqs -rdn jobmanager-bqs -machine-type unknown -publish-jobs
> > 0 S cms050 9755 1 0 75 0 - 1252 schedu 11:38 ? 00:00:00 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type bqs -rdn jobmanager-bqs -machine-type unknown -publish-jobs
> > 0 S cms050 9761 1 0 75 0 - 1252 schedu 11:38 ? 00:00:00 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type bqs -rdn jobmanager-bqs -machine-type unknown -publish-jobs
> > 0 S cms050 9907 1 0 75 0 - 1251 schedu 11:38 ? 00:00:00 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type bqs -rdn jobmanager-bqs -machine-type unknown -publish-jobs
> > 0 S cms050 9912 1 0 75 0 - 1252 schedu 11:38 ? 00:00:00 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type bqs -rdn jobmanager-bqs -machine-type unknown -publish-jobs
> There are currently 3 different certificates mapped to cms050 (the
> account to which we map the CMS production role).
>
> On the cluster, for cms050, there are:
> - 527 running jobs
> - 1208 queued jobs
Please capture the output of an "lsof" command. From "netstat -a" I get
the impression those processes all came from RB laranja.iihe.ac.be.
No further clues for now. Was anything changed on the CE recently?
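
It may also help to see how the 295 processes split between the fork and
bqs job managers. Below is a minimal sketch of one way to tally that from
"ps -elf" text; it assumes one process per line (not re-wrapped by a mail
client), and the function name count_by_type and the SAMPLE data are
hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Matches the "-type <value>" option, but not "-machine-type <value>".
TYPE_OPTION = re.compile(r"(?:^|\s)-type\s+(\S+)")


def count_by_type(ps_text):
    """Count globus-job-manager processes per '-type' value.

    ps_text is assumed to be raw 'ps -elf' output, one process per line.
    Lines without a '-type' option (e.g. the grep itself) are ignored.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        if "globus-job-manager" not in line:
            continue
        match = TYPE_OPTION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical, shortened sample in the same shape as the listing above.
SAMPLE = """\
0 S cms050 32096 1 0 75 0 - 1355 schedu 11:19 ? 00:00:01 globus-job-manager -conf x.conf -type fork -rdn jobmanager-fork -machine-type unknown
0 S cms050 9670 1 0 75 0 - 1291 schedu 11:38 ? 00:00:00 globus-job-manager -conf x.conf -type bqs -rdn jobmanager-bqs -machine-type unknown
0 S cms050 9677 1 0 75 0 - 1292 schedu 11:38 ? 00:00:00 globus-job-manager -conf x.conf -type bqs -rdn jobmanager-bqs -machine-type unknown
"""

if __name__ == "__main__":
    for job_type, n in count_by_type(SAMPLE).most_common():
        print(job_type, n)
```

On the CE, the text passed to count_by_type would be the real "ps -elf"
output instead of SAMPLE. A large fork count next to a small bqs count
(or the reverse) might narrow down which kind of submission is piling up.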