Hi Goncalo,
Is this cron script described as a workaround, or is the problem documented
in any way? If not, we should request that the documentation be fixed accordingly.
We can do this with a GGUS ticket (we don't need an official EGI
requirement for this).
As a side issue, I know IGI has a geographically distributed pool of WMS
instances controlled by WMSMonitor. Why do you think SNMP is not a good
(interim) solution? Probably it's just a matter of fixing the respective
firewalls at the sites.
Let me know.
Thanks Tiziana
On 14/06/2011 12:55, Gonçalo Borges wrote:
> After implementing the cron job from Arnau and Maarten for the deletion of
> the CONDOR CRAP, the number of requests registered in
>
> /var/glite/jobcontrol/jobdir/
>
> decreased tremendously.
>
> Cheers
> Goncalo
>
> On 06/14/2011 11:18 AM, Gonçalo Borges wrote:
>> Hi Arnau...
>>
>>> Option 1: I had similar problems in our WMS. Under high load, it stopped
>>> seeing BDII resources and jobs were not able to start. If that's the
>>> case, you will find a descriptive message in
>>> workload_manager_events.log. To solve it, we installed
>>> google_perf_tools (you will find the recipe in the WMS known_issues).
>>>
>>
>> We are using google_perf_tools already.
>>
>> grep libtcmalloc.so /opt/glite/etc/glite_wms.conf
>> RuntimeMalloc = "/usr/lib/libtcmalloc.so";
>>
>>
>>> Option 2: Have you recently upgraded LB? If yes, ensure
>>> glite-lb-authz.conf has the correct values.
>>
>> Nope.
>>
>>> *You could also install WMSMonitor. It is a good tool for a quick check.
>>>
>>
>> Yes, I know. Unfortunately, since we are operating two sites over a
>> WAN, we thought of using that tool in WAN mode. According to Daniele,
>> WMSMonitor uses SNMP version 2, which is not the proper framework
>> for that. We are waiting for the EMI WMSMonitor release, which will use
>> ActiveMQ.
>>
>>>
>>>> ---*---
>>> Maarten sent me this script, which must be run from cron:
>>> # cat /usr/local/sbin/clean_condor_jobs.sh
>>> #!/bin/bash
>>>
>>>
>>> CONDOR_CRAP=`/opt/condor-c/bin/condor_q -hold | grep glite | awk '{print $1}'`
>>>
>>>
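>>> # Remove every held glite job found above
>>> for JOB_ID in $CONDOR_CRAP
>>> do
>>>     echo "Removing job: $JOB_ID"
>>>     /opt/condor-c/bin/condor_rm "$JOB_ID"
>>>     # sleep 2
>>>     # -forcex drops jobs already marked for removal from the local queue
>>>     /opt/condor-c/bin/condor_rm -forcex "$JOB_ID"
>>> done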
>>>
>>
>> condor CRAP is a good name :-) Any special frequency to run it?
>>
>> Cheers
>> Goncalo
>>
>>
>
>
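
For reference, a minimal sketch of the cleanup loop quoted above, restructured so it can be tried without touching the queue. The condor_q and condor_rm paths are taken from the quoted script; the function names, the DRY_RUN switch and the --run argument are hypothetical additions for illustration and are not part of what Maarten sent.

```shell
#!/bin/bash
# Sketch only. Paths come from the script quoted in the thread; the function
# names, DRY_RUN and the --run argument are assumptions made for this example.

CONDOR_BIN="${CONDOR_BIN:-/opt/condor-c/bin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Read condor_q-style output on stdin and print the first column (the job ID)
# of every line that mentions "glite", like the grep | awk pair in the original.
held_glite_job_ids() {
    awk '/glite/ {print $1}'
}

# Remove one job: plain condor_rm first, then condor_rm -forcex, as in the original.
remove_job() {
    local job_id="$1"
    echo "Removing job: $job_id"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $CONDOR_BIN/condor_rm $job_id"
        echo "would run: $CONDOR_BIN/condor_rm -forcex $job_id"
    else
        "$CONDOR_BIN/condor_rm" "$job_id"
        "$CONDOR_BIN/condor_rm" -forcex "$job_id"
    fi
}

# List held jobs and remove each glite one.
clean_held_jobs() {
    "$CONDOR_BIN/condor_q" -hold | held_glite_job_ids | while read -r job_id; do
        remove_job "$job_id"
    done
}

# Only touch the queue when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    clean_held_jobs
fi
```

Called with --run and the default DRY_RUN=1, it only prints what it would remove; setting DRY_RUN=0 performs the removals. The thread does not say how often the script should run from cron, so the schedule is left open here.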