Good day to all!

Here at GSI we have finished upgrading to LCG 2.3, and we finally have the UI on
Debian and the WNs on our LSF farm working (not 100%, but simple job submission
works with both edg and globus).

While working on the LCG-LSF (GSI) stuff (porting it to Debian), I found several
things which I think are simply bugs. But I am not sure about that, so I
decided to share this info with the LCG-ROLLOUT community to perhaps get
some comments.

Bug #1

"lcg-info-dynamic-lsf":
At the very end of the script, the developer seems to have introduced the
wrong attribute name, "GlueCEInfoFreeCPUs":

$output{GlueCEInfoFreeCPUs}=$FreeCPU;

I suspect it should be (in the GLUE schema, FreeCPUs belongs to the
GlueCEState object class, not to GlueCEInfo):

$output{GlueCEStateFreeCPUs}=$FreeCPU;
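For illustration, a minimal Perl sketch of the corrected tail of the provider. It assumes the script collects its results in %output and prints them as LDIF-style "attribute: value" lines; the printing loop and the value of $FreeCPU are hypothetical stand-ins, not copied from lcg-info-dynamic-lsf.

```perl
use strict;
use warnings;

# Illustrative value only; the real script computes this from LSF queries.
my $FreeCPU = 12;

my %output;
# GLUE 1.x defines FreeCPUs under the GlueCEState object class,
# so the key must be GlueCEStateFreeCPUs, not GlueCEInfoFreeCPUs.
$output{GlueCEStateFreeCPUs} = $FreeCPU;

# Hypothetical output loop: one "attribute: value" line per key.
for my $attr (sort keys %output) {
    print "$attr: $output{$attr}\n";
}
```

Run as is, it prints `GlueCEStateFreeCPUs: 12`; with the misspelled key, the information system would simply never see a free-CPU value.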



Bug #2

"lcglsf.pm":
At the very beginning of /opt/globus/lib/perl/Globus/GRAM/JobManager/lcglsf.pm
there is a big mistake. The script contained:

BEGIN
{
    $mpirun = 'no';
    $bsub   = '/LSF/lsf-5.1/bin/bsub';
    $bjobs  = '/LSF/lsf-5.1/bin/bjobs';
    $bkill  = '/LSF/lsf-5.1/bin/bkill';
    $bacct  = '/LSF/lsf-5.1/bin/bacct';
    $bmod  = '/LSF/lsf-5.1/bin/bmod';
}

This is what I changed, and what (I think) it should look like:

BEGIN
{
    # $lsf_profile: path to the LSF profile script (profile.lsf)
    $lsf_profile = '/LSF/lsf-5.1/conf/profile.lsf';
    $mpirun = 'no';
    $bsub   = ". $lsf_profile && /LSF/lsf-5.1/bin/bsub";
    $bjobs  = ". $lsf_profile && /LSF/lsf-5.1/bin/bjobs";
    $bkill  = ". $lsf_profile && /LSF/lsf-5.1/bin/bkill";
    $bacct  = ". $lsf_profile && /LSF/lsf-5.1/bin/bacct";
    $bmod   = ". $lsf_profile && /LSF/lsf-5.1/bin/bmod";
    $bhist  = ". $lsf_profile && /LSF/lsf-5.1/bin/bhist";
}
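
This bug was very difficult to track down, because there is no log information
that could point to the source of the problem. Without the profile, the LSF
commands apparently do not get the environment they need when the job manager
runs them. I am not sure this is valid for every system, but I think it
wouldn't hurt to have the commands defined as listed above.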
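The same fix can be written so that the profile prefix is defined once and failed LSF calls leave a trace. This is a minimal sketch, assuming a Bourne-compatible /bin/sh (needed for the `.` builtin); lsf_cmd and run_logged are hypothetical helpers, not part of lcglsf.pm, and the paths are the GSI ones quoted above.

```perl
use strict;
use warnings;

# Paths from the GSI installation above; other sites will differ.
my $lsf_profile = '/LSF/lsf-5.1/conf/profile.lsf';
my $lsf_bin     = '/LSF/lsf-5.1/bin';

# Hypothetical helper: prefix an LSF binary with ". profile.lsf &&" so the
# shell sources the LSF environment before running the command.
sub lsf_cmd {
    my ($name) = @_;
    return ". $lsf_profile && $lsf_bin/$name";
}

my %cmd;
$cmd{$_} = lsf_cmd($_) for qw(bsub bjobs bkill bacct bmod bhist);

# Hypothetical wrapper: run a command through the shell, capture stdout and
# stderr together, and log the exit status and output whenever it fails.
sub run_logged {
    my ($command) = @_;
    my $out    = `$command 2>&1`;
    my $status = $? == -1 ? -1 : $? >> 8;
    warn "LSF command failed (exit $status): $command\n$out" if $? != 0;
    return ($status, $out);
}
```

Because each command string contains shell metacharacters such as `&&`, Perl passes it to /bin/sh instead of executing it directly, which is what lets the `. profile.lsf` prefix take effect. Sending every LSF call through one logging wrapper like this would also have made the missing-environment problem visible in the log.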

So, here are the bugs. What should I do with them?
Should I bring them to the attention of the developers, and if so, how?

Also, I found a LOT of small bugs and bad regular-expression conditions in
the InfProv (information provider) and JM (job manager) scripts. They are not
flexible at all and depend heavily on the exact output of the commands the
scripts execute...
This is not good for big software like LCG, and together with VERY BAD
logging it means big problems, even when you are trying to install the system
in an environment only slightly different from the standard one (RH + PBS) :)
I DON'T need the log when everything is fine. I need it when I have a
problem, and here the LCG logs (some in the system messages, the gatekeeper
log, the GRAM log) can't help :( Or perhaps I couldn't find the proper log
file... Please tell me which one to use...

I think I may later raise the question of the LSF LCG job manager and the
core of the JM in a separate thread, because it is pure hell (mostly in
design and implementation): there are no comments in the code and NO GOOD
logging. When there is a problem, it is almost impossible to understand what
is happening and why from the log files alone, so you are forced to go and
debug the code yourself!!! The logging information is mostly useless and
senseless. :(
The LSF JM depends heavily on the exact output format of the LSF commands: if
there is one line the JM does not expect, you fail! The regular-expression
conditions in the JM are very dependent on LSF's output. Step left, step right
and you fail, with no information in the log about why.
I spent a whole week tracking down the problems. It would make life much
easier if there were at least comments in the code (using comments IS A RULE
of software development) and some good logging.
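To make the regular-expression complaint concrete, here is a minimal sketch of a less brittle way to read a job's status, assuming the default bjobs column layout (JOBID USER STAT QUEUE ...). parse_bjobs_status is a hypothetical helper, not the JM's actual code: it skips the header, accepts the first line whose first field is the job ID, and logs every line it does not understand instead of failing silently.

```perl
use strict;
use warnings;

# Hypothetical helper: return the STAT column for one job from `bjobs` output.
# Assumes the default layout "JOBID USER STAT QUEUE FROM_HOST EXEC_HOST ...".
sub parse_bjobs_status {
    my ($job_id, @lines) = @_;
    for my $line (@lines) {
        chomp(my $l = $line);
        next if $l =~ /^\s*$/;          # skip blank lines
        next if $l =~ /^\s*JOBID\b/;    # skip the column header
        my @f = split ' ', $l;          # whitespace-separated fields
        return $f[2] if @f >= 3 && $f[0] eq $job_id;
        # Anything else is logged rather than silently breaking the parse.
        warn "bjobs: ignoring unexpected line: $l\n";
    }
    warn "bjobs: no status found for job $job_id\n";
    return;
}
```

Given the lines of `bjobs <jobid>` output, it returns the STAT field (for example PEND or RUN), or undef with a logged reason when no matching line is found, so one unexpected line costs a warning rather than a failed job.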

I do understand what software development is like, so I DO respect the work
of the LCG developers, but some things should be done (IMHO): at least
comments and more usable logging.

I am not sure whether all of this is really a problem or whether it only seems
to be a problem to me. In any case, I would really love to see some comments
on this topic.

Good luck to you all, and good luck to the LCG DTeam in making our software
the best!

Cheers,

Anar