Hi

We recently saw a similar error at Glasgow on all of our CEs:

cat globus-tmp.node084.19732.4
/opt/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB server
(bkserver,lbproxy) store protocol error (edg_wll_LogEvent():
LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR:
LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent():
edg_wll_log_connect error
Transport endpoint is not connected;; edg_wll_gss_connect();; System Error:
Connection refused)

We were also seeing a 'cannot download .BrokerInfo from' error. Once we
tracked down the cause, we found that the job was running in our specified
EDG_WL_SCRATCH of /tmp without creating a per-job subdirectory beneath it,
so it ran directly in /tmp. This caused problems when several WMS jobs were
running on the same node: when a job tried to globus-url-copy, it got a
'Permission denied' error.

It looks like the latest WMS update removed the following from the job
wrapper:

#if [ ${__job_type} -eq 0 -o ${__job_type} -eq 3 ]; then # normal or interactive
  newdir="${__jobid_to_filename}"
  mkdir ${newdir}
  cd ${newdir}
#elif [ ${__job_type} -eq 1 -o ${__job_type} -eq 2 ]; then # MPI (LSF or PBS)
#fi

We had to modify the cp_1.sh file in the directory referenced by the
GLITE_LOCAL_CUSTOMIZATION_DIR variable so that it appends a unique directory
to EDG_WL_SCRATCH, avoiding collisions between different WMS jobs.
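A minimal sketch of that kind of cp_1.sh change, for sites hitting the same problem. It is not the file used at Glasgow: it assumes the job wrapper sources cp_1.sh before it changes into EDG_WL_SCRATCH, and the "wmsjob." directory prefix is a hypothetical name chosen for illustration. mktemp -d creates a fresh directory readable only by its owner, so two WMS jobs on the same worker node can no longer end up in the same working directory.

```shell
# Hypothetical cp_1.sh fragment (an illustration, not the Glasgow file).
# Assumption: the job wrapper sources this file before it cd's into
# EDG_WL_SCRATCH, so changing the variable here moves the job's workdir.
base="${EDG_WL_SCRATCH:-/tmp}"

# mktemp -d creates a new, owner-only (mode 0700) directory with a random
# suffix, so concurrent jobs on one node each get their own directory.
if jobdir=$(mktemp -d "${base}/wmsjob.XXXXXXXX"); then
  EDG_WL_SCRATCH="$jobdir"
  export EDG_WL_SCRATCH
else
  # Do not abort the wrapper from a sourced file; keep the original value.
  echo "cp_1.sh: could not create a job directory under $base" >&2
fi
```

Nothing here removes the per-job directory when the job finishes; that cleanup would have to happen in a later customization step or a periodic scratch sweep.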

Cheers,

Dug

2009/9/2 Maarten Litmaath <[log in to unmask]>

> Bonjour Emmanuel,
>
>> We recently had a problem with one of our CEs, clrlcgce01.in2p3.fr: ops
>> SAM tests are stalling on our WNs and we get this error:
>>
>> # more globus-tmp.clrwn221.18684.4
>> /opt/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB server
>> (bkserver,lbproxy) store protocol error (edg_wll_LogEvent():
>> LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR:
>> LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEvent():
>> edg_wll_log_connect error
>> Transport endpoint is not connected;; edg_wll_gss_connect();; System
>> Error: No route to host)
>>
>
> Does your CE allow your WNs to connect to port 9002 (locallogger)?
>
> Anyway, that error should not explain the SAM failures: log messages sent by
> the job wrapper via the CE are not critical as far as the WMS is concerned.
>
> The latest error was this:
>
> Cannot download testjob.tgz from gsiftp://wms208.cern.ch:2811/var/glite/SandboxDir/...
>
> It appears your WN cannot make outbound connections...
>
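Both of Maarten's questions (can the WN reach the CE's locallogger on port 9002, and can it make outbound connections to the WMS gsiftp server on port 2811) can be checked directly from a worker node. This is a rough sketch that assumes bash and coreutils' timeout are available on the WN. The host names are the ones quoted in this thread and should be replaced with your own CE and WMS.

```shell
#!/bin/sh
# check_port HOST PORT: succeed if a TCP connection to HOST:PORT can be
# opened within 5 seconds. Uses bash's /dev/tcp redirection (a bash
# feature, hence the explicit "bash -c") and coreutils' timeout.
check_port() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hosts taken from this thread; substitute your own.
ce_host="clrlcgce01.in2p3.fr"   # locallogger on the CE, port 9002
wms_host="wms208.cern.ch"       # gsiftp sandbox transfers, port 2811

for target in "$ce_host:9002" "$wms_host:2811"; do
  host="${target%:*}"
  port="${target##*:}"
  if check_port "$host" "$port"; then
    echo "OK:   $host:$port reachable"
  else
    echo "FAIL: $host:$port not reachable (firewall, routing or service down?)"
  fi
done
```

A "Connection refused" result (as in the Glasgow log) usually points at the service or a firewall on the target host, while "No route to host" (as in the clrlcgce01 log) points at routing or firewalling between the WN and that host; the script only reports pass/fail, so the exact error needs to be read from a manual attempt.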



-- 
ScotGrid, Room 481, Kelvin Building, University of Glasgow
tel: +44(0)141 330 6439