On Sat, 20 Aug 2005, Jiri Kosina wrote:
> On Sat, 20 Aug 2005, [log in to unmask] wrote:
>
> >> A little correction to this: I have replaced qmgr and pbsnodes binaries
> >> with my wrappers to catch the commandline and log the output that is
> >> passed to the lcg-info-* script by these PBS commands. They seemed perfect
> >> even at the times when incorrect values were published.
> > You are looking at cached information.
> > Let the wrappers also log the time taken by the real commands.
> > The algorithm of the generic information provider is as follows:
> > if the cached information is older than 20 seconds, the dynamic plug-in
> > is run and its output will be used if it comes within 5 seconds,
> > otherwise the cached values will be used, unless the file is more than
> > 10 minutes old, in which case the static defaults are used.
> > The dynamic plug-in may continue in the background to refresh the cached
> > information, for up to 10 minutes, after which it will be killed.
>
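
For reference, the selection rule quoted above can be sketched as a small
shell function. This only illustrates the description and is not the actual
code of the generic information provider: the name choose_source and its two
arguments are made up here, and the first branch (a cache no older than 20
seconds is served without running the plug-in) is inferred from the
description rather than stated in it.

```shell
#!/bin/sh
# Illustrative sketch only; choose_source is a hypothetical helper,
# not part of the generic information provider.
#   $1 = age of the cached file in seconds
#   $2 = seconds the dynamic plug-in needed to produce its output
choose_source() {
    cache_age=$1
    plugin_secs=$2
    if [ "$cache_age" -le 20 ]; then
        echo cache      # assumed: a fresh cache is used as-is
    elif [ "$plugin_secs" -le 5 ]; then
        echo dynamic    # plug-in answered in time, use its output
    elif [ "$cache_age" -le 600 ]; then
        echo cache      # plug-in too slow, fall back to the cached values
    else
        echo static     # cache older than 10 minutes, use static defaults
    fi
}
```

For example, "choose_source 45 9" prints "cache": when the plug-in is slow,
the cached file is what gets published, whatever it contains.
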
> Hello Maarten,
>
> thanks for your reply. I have meanwhile dug into the Perl information
> provider scripts and found the information you provided. I have then
> tracked the problem down to the fact that from time to time the file
> /opt/lcg/var/gip/tmp/lcg-info-dynamic-ce.ldif.<number> gets truncated to
> zero length, and the corrupted cached information is therefore
> provided.
>
> I can easily fix this by running the lcg-info-wrapper as the edginfo
> user, which regenerates the file with cached information, and correct data
> are provided again.
>
> I have, however, not yet found out why the file gets truncated. It seems
> to get truncated every minute or so:
>
> while true; do ll /opt/lcg/var/gip/tmp/; sleep 5; done
> total 8
> -rw-r--r-- 1 edginfo edginfo 1754 Aug 20 00:19 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:19 lcg-info-dynamic-software.ldif.7652
> total 8
> -rw-r--r-- 1 edginfo edginfo 1754 Aug 20 00:19 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:19 lcg-info-dynamic-software.ldif.7652
> total 8
> -rw-r--r-- 1 edginfo edginfo 1754 Aug 20 00:19 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:19 lcg-info-dynamic-software.ldif.7652
> total 4
> -rw-r--r-- 1 edginfo edginfo    0 Aug 20 00:20 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:20 lcg-info-dynamic-software.ldif.7652
>
> We can see that it got truncated exactly on the minute boundary. Now
> I regenerate it:
>
> ./lcg-info-wrapper >/dev/null
>
> And I start the "watching" loop again. The file gets truncated on the
> next minute boundary again:
>
> -rw-r--r-- 1 edginfo edginfo 1754 Aug 20 00:21 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:21 lcg-info-dynamic-software.ldif.7652
> total 8
> -rw-r--r-- 1 edginfo edginfo 1754 Aug 20 00:21 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:21 lcg-info-dynamic-software.ldif.7652
> total 8
> -rw-r--r-- 1 edginfo edginfo 1754 Aug 20 00:21 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:21 lcg-info-dynamic-software.ldif.7652
> total 4
> -rw-r--r-- 1 edginfo edginfo    0 Aug 20 00:22 lcg-info-dynamic-ce.ldif.3383
> -rw-r--r-- 1 edginfo edginfo  843 Aug 20 00:22 lcg-info-dynamic-software.ldif.7652
>
> I have not yet tracked down the reason for this truncation. If you
> have any idea, I would very much appreciate it.

The info provider may be run by two independent entities: by slapd,
running as "edginfo", and by rgma-gin, running as "rgma".
I suspect the rgma-gin process has an incomplete environment, causing
qstat/pbsnodes/... to fail, which leads to an empty output file.
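
One way to check this would be to have your wrappers also record who called
them, with which PATH, how long the real command took and how much it
printed. A minimal sketch, assuming the wrappers are shell scripts; log_run,
LOGFILE and the path of the renamed real binary are placeholders, not
anything shipped with LCG or PBS:

```shell
#!/bin/sh
# Hypothetical diagnostic helper; LOGFILE and log_run are illustrative names.
LOGFILE=${LOGFILE:-/tmp/pbs-wrapper.log}

# Run the given command, pass its stdout through, and append one log line
# with the caller, elapsed time, exit status, output length and PATH.
log_run() {
    start=$(date +%s)          # %s is widely supported, though not strictly POSIX
    out=$("$@")                # stdout only; stderr goes straight through
    status=$?
    end=$(date +%s)
    printf '%s user=%s cmd=%s secs=%s status=%s chars=%s path=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(id -un)" "$*" \
        "$((end - start))" "$status" "${#out}" "$PATH" >> "$LOGFILE"
    if [ -n "$out" ]; then
        printf '%s\n' "$out"   # trailing newlines are reduced to a single one
    fi
    return "$status"
}

# Example use inside a pbsnodes wrapper (the .real path is hypothetical):
#   log_run /usr/bin/pbsnodes.real "$@"
```

A log line with user=rgma, a non-zero status and chars=0 just before the
file is truncated would support the suspicion above.
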