Carlos Borrego Iglesias wrote:

> Some more information... the service which is causing the problem is 
> lcg-mon-job-status. If I restart it, it works, but only for some time...

Yes, there is a bug in lcg-mon-job-status.  For now, just switch it off.
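Before switching it off, it may help to confirm that a given RB is actually stuck in the failure state. Below is a minimal sketch for a POSIX shell. The function name `producer_loop_active` and the 20-line window are hypothetical choices; only the log path and the message text come from the log quoted below. The function succeeds when the recent log shows repeated "Error creating primary producer" entries, which is the retry loop Carlos posted.

```shell
# Hypothetical helper: report whether lcg-mon-job-status appears stuck in the
# "Producer died" / "Error creating primary producer" retry loop.
# Assumptions: the default log path is the one shown in this thread;
# the 20-line window and the threshold of 2 are illustrative, not tuned.
producer_loop_active() {
    log="${1:-/opt/lcg/var/lcg-mon-job-status.log}"
    # Return 2 if the log cannot be read at all.
    [ -r "$log" ] || return 2
    # Count matching errors among the most recent entries.
    # "|| true" keeps a zero-match grep (exit 1) from aborting under "set -e".
    n=$(tail -n 20 "$log" | grep -c 'Error creating primary producer' || true)
    # Two or more failures in the window are treated as the retry loop.
    [ "${n:-0}" -ge 2 ]
}
```

Usage: `producer_loop_active && echo stuck`. If it reports the loop, the service can be stopped with `service lcg-mon-job-status stop` and kept off across reboots with `chkconfig lcg-mon-job-status off`, assuming the init script is installed under that name (not confirmed in this thread).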

> Here's the log for the service:
> 
> [root@rb01 root]# tail /opt/lcg/var/lcg-mon-job-status.log
> 2005-08-05 10:30:00,835: [ERROR] Error creating primary producer
> 2005-08-05 10:30:05,832: [ERROR] Producer died. Trying to start a new one...
> 2005-08-05 10:30:05,834: [ERROR] Failed to insert tuples! Retrying in 5 seconds...
> 2005-08-05 10:30:05,835: [ERROR] Error creating primary producer
> 2005-08-05 10:30:10,832: [ERROR] Producer died. Trying to start a new one...
> 2005-08-05 10:30:10,834: [ERROR] Failed to insert tuples! Retrying in 5 seconds...
> 2005-08-05 10:30:10,835: [ERROR] Error creating primary producer
> 2005-08-05 10:30:15,832: [ERROR] Producer died. Trying to start a new one...
> 2005-08-05 10:30:15,834: [ERROR] Failed to insert tuples! Retrying in 5 seconds...
> 2005-08-05 10:30:15,835: [ERROR] Error creating primary producer
> 
> 
> Any ideas?
> Thanks
> Carlos
> 
> ==========================================================================
> Carlos Borrego Iglesias                 PIC (Port d'Informació Científica)
> tel: +34 93 581 3308                    Campus UAB - Edifici D
> e-mail: [log in to unmask]           E-08193 Bellaterra
> ==========================================================================
> 
> On Fri, 5 Aug 2005, Carlos Borrego Iglesias wrote:
> 
>> Hi all,
>> After updating our RB to 2.6, all services seemed to work fine, but
>> after some time jobs can't be registered. If I submit a job, I get this
>> error:
>>
>> [[log in to unmask]]#edg-job-submit --resource ifaece01.pic.es:2119/jobmanager-lcgpbs-dteam --vo dteam testJob.jdl
>>
>> Selected Virtual Organisation name (from --vo option): dteam
>> Connecting to host rb01.pic.es, port 7772
>> Logging to host rb01.pic.es, port 9002
>> **** Error: API_NATIVE_ERROR ****
>> Error while calling the "edg_wll_RegisterJobSync" native api
>> Unable to Register the Job:
>> https://rb01.pic.es:9000/kRqop9BwDW5a0v-pzdiTXQ
>> to the LB logger at: rb01.pic.es:9002
>> Resource temporarily unavailable (Resource temporarily unavailable - edg_wll_log_proto_client: Error get answer, timeout expired;)
>>
>> If I reconfigure the RB, things seem to work again, but after some time
>> they fail.
>>
>> Has anyone seen this before?
>> Thanks!
>> Carlos
>>