On Thu, 19 Jul 2007, Stathakopoulos George wrote:
> We run /opt/lcg/sbin/cleanup-grid-accounts.sh on only one node of the
> cluster (the CE's /opt/lcg/etc/cleanup-grid-accounts.conf has all accounts).
> As far as we can tell, GPFS is working fine.
OK.
> We see in gram_job_mgr_<pid>.log for every globus-job-manager process these
> entries:
>
> 7/16 08:32:04 JMI: poll: seeking:
> https://ce01.kallisto.hellasgrid.gr:20004/26569/1184563918/
> 7/16 08:32:04 JMI: poll_fast: ******** Failed to find
> https://ce01.kallisto.hellasgrid.gr/26569/1184563918/
> 7/16 08:32:04 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
> scripts)
> 7/16 08:32:04 JMI: cmd = poll
>
> every 10 seconds. Globus-job-manager processes are running for more than an
> hour each.
A lot of those messages appear even if everything is working fine.
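
To judge whether the volume is unusual, one could count the failed fast
polls per job contact. A minimal sketch in Python, assuming that in the
real log each entry sits on a single line (the excerpt above looks wrapped
by the mail client) and that the timestamp has the M/D HH:MM:SS shape shown
there. The name count_poll_failures is made up for illustration:

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt once the mail wrapping is undone:
#   7/16 08:32:04 JMI: poll_fast: ******** Failed to find https://host/26569/1184563918/
FAILED_POLL = re.compile(
    r"^\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} JMI: poll_fast: \*+ Failed to find (\S+)"
)

def count_poll_failures(lines):
    """Return a Counter mapping job contact URL -> number of failed fast polls."""
    counts = Counter()
    for line in lines:
        match = FAILED_POLL.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open gram_job_mgr_<pid>.log file object and printing
counts.most_common() would show which job contacts are polled most often.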
You reported the problem on Fri. the 13th (sic): what changed in your cluster
that day or one day before?
Check the APT auto-update logs and similar things. Did anything change on
the GPFS server? A problem there could badly affect all the clients.
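
One way to look for such changes is to list the files that were modified
around that date. A rough sketch in Python, using only the standard
library; the function name files_modified_between is hypothetical, and the
directories to scan (log and configuration directories on the CE, the WNs
and the GPFS server) are whatever applies at your site:

```python
import os
from datetime import datetime

def files_modified_between(root, start, end):
    """Yield (mtime, path) for non-directory entries under root
    whose modification time falls in [start, end) (local time)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = datetime.fromtimestamp(os.lstat(path).st_mtime)
            except OSError:
                continue  # entry vanished or cannot be examined; skip it
            if start <= mtime < end:
                yield mtime, path
```

For example, sorted(files_modified_between(some_dir, datetime(2007, 7, 12),
datetime(2007, 7, 14))) would list what was touched on the 12th and 13th
under some_dir, oldest first.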