Dear David,
Thanks a lot for the clue. Yes, the missing variables were the
culprit. For some unknown reason the variables were missing from the
shell environment, so I manually added
". /etc/profile.d/grid-env.sh" to the /usr/bin/dtomcat5 script and
restarted the tomcat service. Now cream/blahpd works
properly.
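
For anyone hitting the same symptom, here is a minimal sketch of how the
environment check could be scripted. The function name check_blah_env and
its optional second argument (an alternative environ file, handy for
testing) are my own hypothetical choices, not part of BLAH or CREAM; the
three variable names are the ones seen on the working cluster below.

```shell
# check_blah_env PID [ENVIRON_FILE]
# Print each BLAH variable that is missing from a process environment.
# Returns 0 if all three are present, 1 otherwise.
# Hypothetical helper: the name and the optional ENVIRON_FILE are assumptions.
check_blah_env() {
    environ_file="${2:-/proc/$1/environ}"
    missing=0
    for var in BLAHPD_LOCATION BLAHPD_CONFIG_LOCATION BLPARSER_CONFIG_LOCATION; do
        # /proc/<pid>/environ is NUL-separated, so turn NULs into newlines first
        if ! tr '\0' '\n' < "$environ_file" | grep -q "^${var}="; then
            echo "missing: $var"
            missing=1
        fi
    done
    return "$missing"
}
```

It would be called as "check_blah_env <blahpd's pid>", as root or as the
user owning the process, since /proc/<pid>/environ is not world-readable.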
Regards
On Tue, Nov 10, 2015 at 9:20 PM, Muhammad Farhan SJAUGI
<[log in to unmask]> wrote:
> Dear David,
>
> Yes, you could be right. I found out that the variables for BLAH are missing:
>
> [root@khaldun tmp]# tr '\0' '\n' < /proc/17186/environ | grep BL
> [root@khaldun tmp]#
>
> If I compare with another working cluster:
>
> [root@haitham ~]# tr '\0' '\n' < /proc/31721/environ | grep BL
> BLPARSER_CONFIG_LOCATION=/etc/blparser.conf
> BLAHPD_CONFIG_LOCATION=/etc/blah.config
> BLAHPD_LOCATION=/usr
> [root@haitham ~]#
>
> So how can I add the missing variables to the affected cluster?
>
> Regards
>
> On Tue, Nov 10, 2015 at 9:06 PM, David Rebatto <[log in to unmask]> wrote:
>> Hi Muhammad,
>> I'm adding the blah mailing list in cc.
>>
>> On 10 Nov 2015, at 12:52, Muhammad Farhan SJAUGI
>> <[log in to unmask]> wrote:
>>
>> Hi,
>>
>> I tried to narrow down the problem. It seems that blahpd has some
>> difficulty executing the /usr/libexec/pbs_status.sh file if
>> blahpd is called by the tomcat webserver.
>>
>> However, if the blahpd command is called by either the tomcat or the root
>> user from the terminal, the command has no problem giving the correct
>> result. I suspected that there was a race condition where blahpd tried
>> to get the lrms info from cream (via BLPClient), but since cream is not
>> yet ready to accept the query, it simply rejects the query, hence
>> blahpd also gives an error message. But this shouldn't be the
>> case, since it will keep trying to query cream…
>>
>>
>> The query is the other way round, i.e. cream asks blahpd about its
>> configuration.
>> Blahpd, in turn, gets that information from its configuration file, so
>> there’s no external query involved.
>> It looks like the blahpd started by cream is not reading the proper
>> configuration file.
>> You can dump one of the blahpd processes’ environment, e.g. with this
>> command
>> $ tr '\0' '\n' < /proc/<blahpd's pid>/environ
>> and look for wrong paths in blahpd and glite related variables
>> (BLAHPD_CONFIG_LOCATION above all).
>> Look also for missing variables, as they can have quite obsolete defaults.
>>
>> Cheers,
>> David
>>
>>
>>
>>
>> Regards
>>
>> On Tue, Nov 10, 2015 at 6:59 PM, Muhammad Farhan SJAUGI
>> <[log in to unmask]> wrote:
>>
>> Dear Jeff,
>>
>> blahpd runs as user tomcat. User tomcat also has operator
>> privileges on the torque server:
>>
>> set server operators += [log in to unmask]
>>
>> However, when I tried to run the blahpd command as user
>> tomcat, it seems to return the correct result:
>>
>> [root@khaldun ~]# su - tomcat
>> -sh-4.3$ blahpd
>> $GahpVersion: 1.8.0 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
>> BLAH_GET_HOSTPORT 0
>> S
>> RESULTS
>> S 1
>> 0 0 pbs/khaldun.biruni.upm.my:56554
>>
>>
>> Regards
>>
>> On Tue, Nov 10, 2015 at 5:14 PM, Jeff Templon <[log in to unmask]> wrote:
>>
>> Hi
>>
>> as which user does blahpd run? I see you ran it as root by hand, but when
>> run normally, which user? and does this user have operator privileges on
>> the torque server?
>>
>> JT
>>
>> On 10 Nov 2015, at 01:25, Muhammad Farhan SJAUGI <[log in to unmask]>
>> wrote:
>>
>> Greetings,
>>
>> I found something interesting. Apparently CREAM didn't get the
>> correct result from BLAH:
>>
>> 10 Nov 2015 00:19:18,366 INFO
>> org.glite.ce.cream.jobmanagement.cmdexecutor.blah.BLParserClient -
>> initializeConnection: getting info about BLParser (pbs) from BLAH
>> (retry count=97/100)
>> 10 Nov 2015 00:20:18,368 DEBUG
>> org.glite.ce.cream.jobmanagement.cmdexecutor.blah.BLAHExecutor -
>> BLAH_GET_HOSTPORT 0
>> 10 Nov 2015 00:20:19,370 DEBUG
>> org.glite.ce.cream.jobmanagement.cmdexecutor.blah.BLAHExecutor -
>> getBlahOutput: S
>> 10 Nov 2015 00:20:20,371 DEBUG
>> org.glite.ce.cream.jobmanagement.cmdexecutor.blah.BLAHExecutor -
>> getBlahOutput: S 1
>> 10 Nov 2015 00:20:20,371 DEBUG
>> org.glite.ce.cream.jobmanagement.cmdexecutor.blah.BLAHExecutor -
>> getBlahOutput: 0 0 pbs/Error\ reading\ host:port
>>
>> However, when I tried to query BLAH manually, it seems to return the correct answer:
>>
>> [root@khaldun ~]# blahpd
>> $GahpVersion: 1.8.0 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
>> BLAH_GET_HOSTPORT 0
>> S
>> RESULTS
>> S 1
>> 0 0 pbs/khaldun.biruni.upm.my:56554
>>
>> Perhaps this is the main issue?
>>
>> Regards
>>
>> On Tue, Nov 10, 2015 at 7:37 AM, Muhammad Farhan SJAUGI
>> <[log in to unmask]> wrote:
>>
>> Dear Steve,
>>
>> Thank you for your feedback. I can confirm that the new blparser is
>> used instead of the old one.
>>
>> I'm wondering how cream communicates with blparser. Is it via a
>> socket, or does it merely call the programming API?
>>
>> Regards
>>
>> On Mon, Nov 9, 2015 at 10:13 PM, Stephen Jones <[log in to unmask]>
>> wrote:
>>
>> Hi Muhammad,
>>
>> Here's something to check.
>>
>> http://grid.pd.infn.it/cream/field.php?n=Main.CREAMAndBlparserConfiguration
>>
>> If the "blparser" service is used by the "Old Blah Parser", perhaps you are
>> accidentally starting the wrong parser?
>>
>> Note: I think the "BNotifier" and "BUpdaterPBS" processes belong to the "New
>> Blah Parser". Maybe...
>>
>> So check which parser you are using.
>>
>> Cheers,
>>
>> Steve
>>
>>
>>
>> On 11/08/2015 10:03 AM, Muhammad Farhan SJAUGI wrote:
>>
>>
>> Greetings,
>>
>> One of the clusters shows strange behavior: CREAM is unable to submit the
>> job to BLAH because the blparser service is not alive:
>>
>> 08 Nov 2015 09:54:06,375 WARN
>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor -
>> submission to BLAH failed [jobId=CREAM524062606; reason=The job cannot
>> be submitted because the blparser service is not alive; retry
>> count=3/3]
>>
>> I can confirm that the blparser service is up:
>>
>> [root@khaldun etc]# ps ax | grep BNotifier
>> 3155 ? Sl 0:00 /usr/libexec/BNotifier
>>
>> [root@khaldun etc]# ps ax | grep BUpdaterPBS
>> 3167 ? S 0:00 /usr/libexec/BUpdaterPBS
>>
>> I also found another message in the cream log, shown below (but I'm not
>> sure whether it is related):
>>
>> 08 Nov 2015 09:55:21,782 INFO
>> org.glite.ce.cream.jobmanagement.cmdexecutor.blah.BLParserClient -
>> initializeConnection: getting info about BLParser (pbs) from BLAH
>> (retry count=95/100)
>>
>> I have tried restarting the service and even re-running yaim, but neither
>> solved the problem.
>>
>> Is there anyone who can help me fix this problem?
>>
>> Regards
>>
>>
>>
>> --
>> Steve Jones [log in to unmask]
>> Grid System Administrator office: 220
>> High Energy Physics Division tel (int): 43396
>> Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
>> University of Liverpool http://www.liv.ac.uk/physics/hep/
>>
>>
>>
>>
>> --
>> Muhammad Farhan Sjaugi, S.Kom. M.Sc
>>
>> Technical Coordinator
>> Academic Grid Malaysia
>> c/o UNITEN
>> email: [log in to unmask]
>>
>> Lecturer/Programmer
>> Perdana University Centre for Bioinformatics
>> email: [log in to unmask]
>>
>>
>> --
>> David Rebatto
>> I.N.F.N. - Sezione di Milano
>> Via Celoria, 16 - 20133 Milano ITALY
>> tel: +39 02503.17623 e-mail: [log in to unmask]
>> URL: http://www.mi.infn.it/~rebatto
>>
>> "Computer science is not about computers any more than
>> astronomy is about telescopes." -- Edsger W. Dijkstra
>>
>
>
>
> --
> Muhammad Farhan Sjaugi, S.Kom. M.Sc
>
> Technical Coordinator
> Academic Grid Malaysia
> c/o UNITEN
> email: [log in to unmask]
>
> Lecturer/Programmer
> Perdana University Centre for Bioinformatics
> email: [log in to unmask]
--
Muhammad Farhan Sjaugi, S.Kom. M.Sc
Technical Coordinator
Academic Grid Malaysia
c/o UNITEN
email: [log in to unmask]
Lecturer/Programmer
Perdana University Centre for Bioinformatics
email: [log in to unmask]