After implementing the cron job from Arnau and Maarten for deleting the
held Condor jobs (the CONDOR CRAP), the number of requests registered in
/var/glite/jobcontrol/jobdir/
dropped dramatically.
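To follow this number over time, the files under that directory can be counted before and after the cron job runs. Below is a minimal sketch. The function name count_jobdir_requests is hypothetical. The sketch also assumes that each registered request is stored as one regular file somewhere below /var/glite/jobcontrol/jobdir/, which is an assumption about that directory's layout.

```shell
#!/bin/bash
# Hypothetical helper: count request files queued under a jobcontrol jobdir.
# Assumption: each registered request is one regular file below the directory.
count_jobdir_requests() {
    local dir="${1:-/var/glite/jobcontrol/jobdir/}"
    # -type f skips the subdirectories themselves; wc -l prints one line per file
    find "$dir" -type f | wc -l
}
```

Called with no argument it counts under the default path. Called with a directory argument, e.g. `count_jobdir_requests /some/other/jobdir`, it counts under that directory instead.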
Cheers
Goncalo
On 06/14/2011 11:18 AM, Gonçalo Borges wrote:
> Hi Arnau...
>
>> Option 1: I had similar problems in our WMS. Under high load, it stopped
>> seeing BDII resources and jobs were not able to start. If that's the
>> case, you will find a descriptive message in
>> workload_manager_events.log. To solve it, we installed
>> google_perf_tools (you will find the recipe in the WMS known_issues).
>>
>
> We are using google_perf_tools already.
>
> grep libtcmalloc.so /opt/glite/etc/glite_wms.conf
> RuntimeMalloc = "/usr/lib/libtcmalloc.so";
>
>
>> Option 2: Have you recently upgraded LB? If so, make sure
>> glite-lb-authz.conf has the correct values.
>
> Nope.
>
>> You could also install WMSMonitor. It's a good tool for a quick check.
>>
>
> Yes, I know. Unfortunately, since we operate two sites over a
> WAN, we thought of using that tool in WAN mode. Talking with Daniele,
> I learned that WMSMonitor uses SNMP version 2, which is not the proper
> framework for that. We are waiting for the EMI WMSMonitor release, which
> will use ActiveMQ.
>
>>
>>> ---*---
>> Maarten sent me this script, which must be run from cron:
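>> # cat /usr/local/sbin/clean_condor_jobs.sh
>> #!/bin/bash
>>
>> # Job IDs (first column) of held jobs whose condor_q line mentions "glite"
>> CONDOR_CRAP=$(/opt/condor-c/bin/condor_q -hold | grep glite | awk '{print $1}')
>>
>> for JOB_ID in $CONDOR_CRAP
>> do
>>     echo "Removing job: $JOB_ID"
>>     # Normal removal first; the job is marked as removed (X state)
>>     /opt/condor-c/bin/condor_rm "$JOB_ID"
>>     # sleep 2
>>     # -forcex forces immediate local removal of jobs stuck in the X state
>>     /opt/condor-c/bin/condor_rm -forcex "$JOB_ID"
>> done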
>>
>
> condor CRAP is a good name :-) Any special frequency to run it?
>
> Cheers
> Goncalo
>
>
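Before putting the script in cron, it may be worth checking which jobs it would actually remove. Below is a minimal dry-run sketch. The function name list_held_glite_jobs is hypothetical. Like the script, it assumes two things: the job ID is the first column of `condor_q -hold` output, and gLite jobs have "glite" somewhere on their line. It also keeps only rows whose first field looks like a Condor job ID (cluster.proc). This means banner or header lines that happen to contain "glite", for instance in a host name, are not passed to condor_rm.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the job IDs the cleanup script would remove.
# Reads `condor_q -hold` output on stdin. Assumptions (same as the script):
# the job ID is the first column, and gLite jobs have "glite" on their line.
list_held_glite_jobs() {
    # Keep only rows whose first field looks like a Condor job ID (cluster.proc),
    # so banner and column-header lines are skipped even if they mention "glite".
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ && /glite/ { print $1 }'
}
```

Usage: `/opt/condor-c/bin/condor_q -hold | list_held_glite_jobs` prints the candidate IDs without removing anything.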