Hi,
Here is some more output from the log file wmsproxy.log.
It looks like it registers the job with some ID, but then fails on the
LB server with a 'timeout'. Before that, it writes some output that repeats
about 10 times; see the log output:
---------------
14 May, 13:56:39 -I- PID: 16311 - "wmproxy::main": Resetting signals handler
14 May, 13:56:39 -I- PID: 16339 -
"wmpgsoapoperations::ns1__jobRegister": jobRegister operation called
14 May, 13:56:39 -I- PID: 16339 -
"wmpgsoapoperations::ns1__jobRegister": Setting signals handler
14 May, 13:56:39 -I- PID: 16339 - "wmpcommon::logRemoteHostInfo":
-------------------------------- Incoming Request
--------------------------------
14 May, 13:56:39 -I- PID: 16339 - "wmpcommon::logRemoteHostInfo": Remote
Host IP: 10.33.1.124:41348 - Remote Host Name: Not Available
14 May, 13:56:39 -I- PID: 16339 - "wmpcommon::logRemoteHostInfo": Remote
CLIENT S DN: /O=GermanGrid/OU=FZK/CN=Dimitri Nilsen/CN=proxy
14 May, 13:56:39 -I- PID: 16339 - "wmpcommon::logRemoteHostInfo": Remote
GRST CRED: VOMS 1210762874 1210806074 0 /dteam/Role=NULL/Capability=NULL
14 May, 13:56:39 -I- PID: 16339 - "wmpcommon::logRemoteHostInfo":
Service GRST PROXY LIMIT: 6
14 May, 13:56:39 -I- PID: 16339 - "wmpcommon::logRemoteHostInfo":
----------------------------------------------------------------------------------
14 May, 13:56:40 -S- PID: 16339 - "wmputils::doExecv": Child failure,
exit code: 7424
14 May, 13:56:40 -E- PID: 16339 - "wmpcommon::callLoadScriptFile":
Unable to execute load script file:
Illegal seek
14 May, 13:56:40 -E- PID: 16339 - "wmpcommon::callLoadScriptFile": Error
code: 29
14 May, 13:56:40 -I- PID: 16339 - "wmpcoreoperations::jobRegister":
Delegation ID: lqBqIFNPO4ttQgh3rkEipw
14 May, 13:56:40 -I- PID: 16339 - "wmpcoreoperations::jobRegister":
Authorizing user...
14 May, 13:56:40 -I- PID: 16339 - "WMPAuthorizer::checkGaclUserAuthZ":
Checking VOMS proxy...
14 May, 13:56:40 -I- PID: 16339 - "WMPAuthorizer::checkGaclUserAuthZ":
fqan=/dteam/Role=NULL/Capability=NULL
14 May, 13:56:40 -I- PID: 16339 - "wmpcommon::getType": Type: job
14 May, 13:56:40 -I- PID: 16339 - "wmpcoreoperations::regist JOB":
Register for job id:
https://pps-rb-fzk.gridka.de:9000/ia5Lhju0wwfRS4AQlcypfA
14 May, 13:56:42 -I- PID: 20924 - "wmproxy::main": ------- Starting
Server Instance -------
14 May, 13:56:42 -E- PID: 20924 - "wmpoperations::checkGlobusVersion":
GLOBUS_LOCATION variable not found, setting it to /opt/globus
14 May, 13:56:42 -E- PID: 20924 - "wmpoperations::checkGlobusVersion":
globus-version binary not found
14 May, 13:56:42 -E- PID: 20924 - "wmpoperations::checkGlobusVersion":
Assuming globus version is less than 3.0.2
14 May, 13:56:42 -I- PID: 20924 - "wmproxy::main": Running as a FastCGI
program
14 May, 13:56:42 -I- PID: 20924 - "wmproxy::main": Entering the FastCGI
accept loop...
14 May, 13:56:42 -I- PID: 20924 - "wmproxy::main":
----------------------------------------
14 May, 13:56:42 -I- PID: 20924 - "wmproxy::main": Resetting signals handler
14 May, 13:56:48 -I- PID: 20935 - "wmproxy::main": ------- Starting
Server Instance -------
14 May, 13:56:48 -E- PID: 20935 - "wmpoperations::checkGlobusVersion":
GLOBUS_LOCATION variable not found, setting it to /opt/globus
14 May, 13:56:48 -E- PID: 20935 - "wmpoperations::checkGlobusVersion":
globus-version binary not found
14 May, 13:56:48 -E- PID: 20935 - "wmpoperations::checkGlobusVersion":
Assuming globus version is less than 3.0.2
14 May, 13:56:48 -I- PID: 20935 - "wmproxy::main": Running as a FastCGI
program
14 May, 13:56:48 -I- PID: 20935 - "wmproxy::main": Entering the FastCGI
accept loop...
14 May, 13:56:48 -I- PID: 20935 - "wmproxy::main":
----------------------------------------
14 May, 13:56:48 -I- PID: 20935 - "wmproxy::main": Resetting signals handler
14 May, 13:56:54 -I- PID: 20945 - "wmproxy::main": ------- Starting
Server Instance -------
14 May, 13:56:54 -E- PID: 20945 - "wmpoperations::checkGlobusVersion":
GLOBUS_LOCATION variable not found, setting it to /opt/globus
14 May, 13:56:54 -E- PID: 20945 - "wmpoperations::checkGlobusVersion":
globus-version binary not found
14 May, 13:56:54 -E- PID: 20945 - "wmpoperations::checkGlobusVersion":
Assuming globus version is less than 3.0.2
14 May, 13:56:54 -I- PID: 20945 - "wmproxy::main": Running as a FastCGI
program
14 May, 13:56:54 -I- PID: 20945 - "wmproxy::main": Entering the FastCGI
accept loop...
14 May, 13:56:54 -I- PID: 20945 - "wmproxy::main":
----------------------------------------
... { This part repeats several times, about 10 times. }
14 May, 13:58:09 -I- PID: 21022 - "wmproxy::main": ------- Starting
Server Instance -------
14 May, 13:58:09 -E- PID: 21022 - "wmpoperations::checkGlobusVersion":
GLOBUS_LOCATION variable not found, setting it to /opt/globus
14 May, 13:58:09 -E- PID: 21022 - "wmpoperations::checkGlobusVersion":
globus-version binary not found
14 May, 13:58:09 -E- PID: 21022 - "wmpoperations::checkGlobusVersion":
Assuming globus version is less than 3.0.2
14 May, 13:58:09 -I- PID: 21022 - "wmproxy::main": Running as a FastCGI
program
14 May, 13:58:09 -I- PID: 21022 - "wmproxy::main": Entering the FastCGI
accept loop...
14 May, 13:58:09 -I- PID: 21022 - "wmproxy::main":
----------------------------------------
14 May, 13:58:09 -I- PID: 21022 - "wmproxy::main": Resetting signals handler
14 May, 13:58:41 -S- PID: 16339 - "WMPEventlogger::registerJob": LBProxy
is enabled
Register job failed
edg_wll_RegisterJobProxy
Exit code: 1417
LB[Proxy] Error: LB server (bkserver,lbproxy) store protocol error
(edg_wll_RegisterJobProxy(): unable to register with bkserver
LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR:
LB server (bkserver,lbproxy) store protocol error;;
edg_wll_DoLogEventDirect(): edg_wll_log_direct_connect error
Transport endpoint is not connected;; edg_wll_gss_connect();; GSS Error:
timeout expired;)
14 May, 14:00:46 -S- PID: 16339 - "WMPEventlogger::registerJob": LBProxy
is enabled
Register job failed
edg_wll_RegisterJobProxy
Exit code: 1417
LB[Proxy] Error: LB server (bkserver,lbproxy) store protocol error
(edg_wll_RegisterJobProxy(): unable to register with bkserver
LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR:
LB server (bkserver,lbproxy) store protocol error;;
edg_wll_DoLogEventDirect(): edg_wll_log_direct_connect error
Transport endpoint is not connected;; edg_wll_gss_connect();; GSS Error:
timeout expired;)
--------------
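
To count the restarts rather than estimate them, something like the following could be run against the log. This is only a minimal sketch: the function names are made up for illustration, the log path is whatever you pass in, and it assumes the lines in the real file are not hard-wrapped the way they are in this mail.

```shell
#!/bin/sh
# Sketch: summarize a WMProxy log file given as the first argument.
# The matched strings are copied from the log excerpt above; the
# function names are hypothetical helpers, not part of gLite.

# Count how many times a new server instance was started.
count_restarts() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c 'Starting Server Instance' "$1" || true
}

# Count how many job registrations failed.
count_register_failures() {
    grep -c 'Register job failed' "$1" || true
}
```

Usage would be along the lines of `count_restarts /path/to/wmproxy.log`, with the path adjusted to wherever the log lives on the node.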
Cheers
Dimitri
Nilsen Dimitri wrote:
> Hi,
> all ports are open. The script returns the following output:
> [root@pps-rb-fzk root]# ./check
> PID TTY TIME CMD
> 28051 ? 00:00:00 condor_collecto
> 28020 ? 00:00:00 condor_master
> 28060 ? 00:00:00 condor_negotiat
> 28052 ? 00:00:00 condor_schedd
> 29021 ? 00:00:00 glite-lb-bkserv
> 28567 ? 00:00:00 glite-lb-interl
> 28529 ? 00:00:00 glite-lb-logd
> 29091 ? 00:00:00 glite-lb-notif-
> 28423 ? 00:00:00 glite-lb-proxy
> 28238 ? 00:00:00 glite-proxy-ren
> 27980 ? 00:00:00 glite-wms-job_c
> 28053 ? 00:00:00 glite-wms-log_m
> 28141 ? 00:00:00 glite-wms-workl
> 16123 ? 00:00:00 glite_wms_wmpro
> 28253 ? 00:00:00 httpd
> 28294 ? 00:00:00 perl
>
> and it seems to be running on the right ports.
>
> [root@pps-rb-fzk root]# netstat -nlp | grep 900
> tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 29021/glite-lb-bkse
> tcp 0 0 0.0.0.0:9001 0.0.0.0:* LISTEN 29021/glite-lb-bkse
> tcp 0 0 0.0.0.0:9002 0.0.0.0:* LISTEN 28529/glite-lb-logd
> tcp 0 0 0.0.0.0:9003 0.0.0.0:* LISTEN 29021/glite-lb-bkse
>
> Cheers
> Dimitri
>
>
> Maarten Litmaath wrote:
>
>> On Fri, 9 May 2008, Nilsen Dimitri wrote:
>>
>>
>>
>>> I get an error on my WMS when running glite-wms-job-submit:
>>>
>>> On the server:
>>> ----------
>>> 09 May, 17:13:42 -I- PID: 10945 - "wmpcoreoperations::jobRegister":
>>> Authorizing user...
>>> 09 May, 17:13:42 -I- PID: 10945 -
>>> "WMPAuthorizer::checkGaclUserAuthZ": Checking VOMS proxy...
>>> 09 May, 17:13:42 -I- PID: 10945 -
>>> "WMPAuthorizer::checkGaclUserAuthZ":
>>> fqan=/dteam/Role=NULL/Capability=NULL
>>> 09 May, 17:13:42 -I- PID: 10945 - "wmpcommon::getType": Type: job
>>> 09 May, 17:13:42 -I- PID: 10945 - "wmpcoreoperations::regist JOB":
>>> Register for job id:
>>> https://pps-rb-fzk.gridka.de:9000/UH8DiRvyMzHgUKAGGHy1xA
>>> 09 May, 17:13:43 -S- PID: 10945 - "WMPEventlogger::registerJob":
>>> LBProxy is enabled
>>> Register job failed
>>> edg_wll_RegisterJobProxy
>>> Exit code: 1417
>>> LB[Proxy] Error: LB server (bkserver,lbproxy) store protocol error
>>> (edg_wll_RegisterJobProxy(): unable to register with bkserver
>>> LB server (bkserver,lbproxy) store protocol error;; Logging library
>>> ERROR:
>>> LB server (bkserver,lbproxy) store protocol error;;
>>> edg_wll_DoLogEventDirect(): edg_wll_log_direct_connect error
>>> Transport endpoint is not connected;; edg_wll_gss_connect();; System
>>> Error: Connection refused)
>>>
>>
>>
>> It suggests that glite-lb-bkserverd either is not running or one of its
>> ports (9000, 9001, 9003) is blocked by a firewall.
>>
>> The attached script reports if anything is missing.
>>
>>
>> ------------------------------------------------------------------------
>>
>> #!/bin/sh
>>
>> export LANG=C
>>
>> # List one line per distinct command owned by user glite, then let
>> # perl print the list and report any LB daemon that is missing.
>> ps -u glite | sort -k 4 | uniq -f 3 | perl -pe '
>>     BEGIN {
>>         @daemons = (
>>             "glite-lb-bkserv",
>>             "glite-lb-interl",
>>             "glite-lb-logd",
>>             "glite-lb-notif-",
>>         );
>>     }
>>
>>     for my $d (@daemons) {
>>         if (/ $d/) {
>>             $seen{$d}++;
>>             last;    # Perl leaves a loop with "last", not "break"
>>         }
>>     }
>>
>>     END {
>>         for my $d (@daemons) {
>>             print "\nABSENT: $d\n" unless $seen{$d};
>>         }
>>         print "\n";
>>     }
>> '
>>
>>
>>
>
>
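
The quoted reply names ports 9000, 9001 and 9003 for the bkserver. Instead of reading the netstat output by eye, a small filter along these lines could report any of those ports that has no listener. It is a rough sketch only: `missing_listeners` is a made-up name, and it assumes the usual Linux `netstat -nl` column layout (local address in column 4, state in column 6). A listener on the node says nothing about whether a firewall in between lets the connection through.

```shell
#!/bin/sh
# Sketch: read "netstat -nl"-style output on stdin and print every port
# given as an argument that has no TCP socket in LISTEN state.
# missing_listeners is a hypothetical helper name.
missing_listeners() {
    awk -v ports="$*" '
        # TCP sockets in LISTEN state: field 4 is the local address (host:port).
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, part, ":")
            listening[part[n]] = 1
        }
        END {
            m = split(ports, want, " ")
            for (i = 1; i <= m; i++)
                if (!(want[i] in listening))
                    print "NOT LISTENING: " want[i]
        }
    '
}
```

It would be called as `netstat -nl | missing_listeners 9000 9001 9003` and prints nothing when all three ports have a listener.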
--
Dimitri Nilsen, Dipl.-Ing(FH)
Forschungszentrum Karlsruhe
Institut f. Wissenschaftliches Rechnen (IWR)
Postfach 3640
76021 Karlsruhe
Tel.: +49 7247 82-8607
Fax.: +49 7247 82-4972