Hi Arnau...
> Option 1: I had similar problems with our WMS. Under high load, it stopped
> seeing BDII resources and jobs were not able to start. If that is the
> case, you will find a descriptive message in
> workload_manager_events.log. To solve it, we installed
> google_perf_tools (you will find the recipe in the WMS known_issues).
>
We are using google_perf_tools already.
grep libtcmalloc.so /opt/glite/etc/glite_wms.conf
RuntimeMalloc = "/usr/lib/libtcmalloc.so";
> Option 2: Have you recently upgraded LB? If so, make sure
> glite-lb-authz.conf has the correct values.
Nope.
> *You could also install WMSMonitor. It is a good tool for a quick check.
>
Yes, I know. Unfortunately, since we are operating two sites over a WAN,
we thought of using that tool in WAN mode. According to Daniele,
WMSMonitor uses SNMP version 2, which is not the proper framework for
that. We are waiting for the EMI WMSMonitor release, which will use ActiveMQ.
>
>> ---*---
> Maarten sent me this script, which must be run from cron:
> # cat /usr/local/sbin/clean_condor_jobs.sh
> #!/bin/bash
>
>
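> # Collect the IDs of held jobs whose listing line mentions "glite".
> CONDOR_CRAP=$(/opt/condor-c/bin/condor_q -hold | grep glite | awk '{print $1}')
>
> for JOB_ID in $CONDOR_CRAP
> do
>     echo "Removing job: $JOB_ID"
>     /opt/condor-c/bin/condor_rm "$JOB_ID"
>     # sleep 2
>     # Force removal in case the job is stuck in the removed ("X") state.
>     /opt/condor-c/bin/condor_rm -forcex "$JOB_ID"
> done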
>
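Before putting the script in cron, the selection step can be tried on its own without removing anything. This is only a minimal sketch: it assumes the job ID is the first column of each listing line and that matching on "glite" is the intended filter, as in the script above. The function name held_glite_ids is hypothetical, and the sample lines are made up for illustration; they are not real condor_q output.

```shell
#!/bin/bash
# Hypothetical helper: read a condor_q-style listing on stdin and print
# the first column (the job ID) of every line that mentions "glite".
# Equivalent to the "grep glite | awk '{print $1}'" step of the script.
held_glite_ids() {
    awk '/glite/ {print $1}'
}

# Made-up sample lines, for illustration only.
printf '%s\n' \
    '101.0 glite 5/10 12:00 some hold reason' \
    '102.0 other 5/10 12:01 some hold reason' | held_glite_ids
```

On a real node, the listing would come from "/opt/condor-c/bin/condor_q -hold" piped into the same function, which shows which jobs the cron script would remove.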
condor CRAP is a good name :-) Is there any particular frequency at which it should run?
Cheers
Goncalo