> Just wondering what the collective wisdom on the random failure of rep
> management tests currently is? All of our Grid-Ireland sites use
> lcg-bdii.cern.ch as their BDII for dteam jobs and yet we regularly see the
> dreaded invalid argument (*) error for various rep mgmt tests. We are
> still running 2.4.0 (upgrading by Christmas, honest) -- is it a recurrence
> of the info sys poisoning reported previously? Although looking at 2.6.0
> sites in UKI they seem to fail intermittently as well.
>
> It is worth sorting this out as it seems that "functional sites" are
> currently failing the "site functional" tests because of infrastructure
> problems elsewhere. Much as we all love processing GGUS tickets, it would
> be nice if we could reduce them somewhat ...
>
> Stephen
>
> (*) Please, please tell me that one day this will actually report "Problem
> with information system" or something useful like that rather than
> "Invalid argument".

Just to follow up on this: looking through the BDII code, my feeling is that the problem we are
running into is a race condition when the LDAP databases are rebuilt.
For example, we have three slapd instances running, each continuously removing and regenerating
its database after a specified time period. It is possible that at some particular instant,
let's say with port forwarding going to port 2171, the call in the
Perl code in (lcg-)bdii-update:
system("cd $bdii_dir/var/$bdii_port_write && rm -f dn2id.* id2entry.* nextid.*");
is causing LDAP queries to return no information, because the command above has just removed
all valid data. The lack of correct details in the infosys would then cause lcg-* commands to
fail.
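To make the suspected window concrete, here is a small shell sketch. It is an illustration under assumptions, not the BDII code: the directory names, the staging directory and the `.dbb` file names are hypothetical stand-ins, and no slapd is involved. It only shows that a reader looking at the directory between the `rm` and the rebuild finds nothing, whereas building the files elsewhere and renaming them into place never leaves the directory empty. Whether slapd would tolerate its files being replaced underneath it is a separate question I have not checked.

```shell
#!/bin/sh
# Illustration only: paths and file names are hypothetical stand-ins,
# not the real BDII layout, and no slapd is involved.
set -eu

work=$(mktemp -d)
live="$work/2171"       # stands in for $bdii_dir/var/$bdii_port_write
stage="$work/2171.new"  # hypothetical staging directory
mkdir "$live" "$stage"

# Number of database files a reader would find in a directory right now.
count_files() {
    ls "$1" | wc -l | tr -d ' '
}

# Create one empty placeholder file per database in the given directory.
make_db_files() {
    touch "$1/dn2id.dbb" "$1/id2entry.dbb" "$1/nextid.dbb"
}

# 1. In-place rebuild, as in the quoted system() call.
make_db_files "$live"
(cd "$live" && rm -f dn2id.* id2entry.* nextid.*)
during_inplace=$(count_files "$live")   # a query arriving now finds nothing
make_db_files "$live"                   # the rebuild eventually refills it

# 2. Staged rebuild: build elsewhere, then rename over the old files.
make_db_files "$stage"
during_staged=$(count_files "$live")    # old files still visible meanwhile
mv -f "$stage"/* "$live"/               # each file is replaced by a rename
after_swap=$(count_files "$live")

echo "in-place, mid-rebuild: $during_inplace files"
echo "staged, mid-rebuild:   $during_staged files"
echo "staged, after swap:    $after_swap files"

rm -rf "$work"
```

Note that the staged variant replaces the three files one at a time, so it avoids the empty directory but does not swap the whole set as a single unit.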
Does this sound like a plausible scenario?
regards,
John Walsh
---------------------------------------------------------------------------------------------------------
Dept of Computer Science, Trinity College, Dublin 2