Hi Don...
Thanks. We are also in touch with Daniele, so that we can have a first
flavour :-)
Cheers
Goncalo
Quoting dongiovanni <[log in to unmask]>:
> Hi Gonçalo,
> as was mentioned in the thread, the new WMSMonitor version will work
> with ActiveMQ (along with other improvements such as use of the LB
> API, CREAM monitoring, and asynchronous data collection).
> We're currently testing the sensors with preview instances of the EMI
> WMS/LB and the EGI testing ActiveMQ broker. Then we'll make the sensors
> available to be installed on new emi-wms/lb nodes for volunteer sites
> or whoever is interested (a tar + a cron, triggering data production
> every 15 minutes). Data will be published on our central server. Then
> the wmsmonitor server rpm will also be distributed.
> If you're interested in the preview phase I will notify you.
> Cheers
> Danilo
>
>
>
> On 06/14/2011 01:40 PM, Gonçalo Borges wrote:
>> Hi Tiziana...
>>
>>>
>>> Is this cron script described as a workaround, or is the problem
>>> documented in any way? If not, we should request that the
>>> documentation be fixed accordingly. We can do this with a GGUS
>>> ticket (we don't need an official EGI requirement for this).
>>>
>>
>> Indeed I know that condor_q sometimes has a lot of held entries,
>> and normally, I remove them by hand. They are normally caused by
>> jobs which entered a very strange state. For example, in one of
>> these situations, these held entries in condor_q were production
>> jobs from an Auger user which were sent to a site with hardware
>> problems. Jobs failed but remained in condor_q forever.
>>
>> I'm not a WMS expert, but these situations do happen from time to
>> time, and at least a clear guideline on how to deal with them would
>> be appreciated.
>>
>>
>>
>>> As a side issue, I know IGI has a geographically distributed pool of
>>> WMS instances controlled by WMSMonitor. Why do you think SNMP is
>>> not a good (interim) solution? Probably it's just a matter of
>>> fixing the respective firewalls at the sites.
>>
>> AFAIK, WMSMon uses SNMPv2, which does not support encryption. To set
>> it up in WAN mode, you have to exchange messages (with community
>> passwords inside) over the internet, and you do not want to expose
>> that kind of unencrypted traffic, with explicit information about
>> your services, to the world. Therefore, it is only suitable for use
>> over a LAN. If this is being done in IGI (using SNMPv2), I would have
>> to understand the network topology used.
>>
>> Cheers
>> Goncalo
>>
>>
>>> let me know
>>>
>>> Thanks Tiziana
>>>
>>> On 14/06/2011 12:55, Gonçalo Borges wrote:
>>>> After implementing the cron from Arnau and Maarten for the deletion of
>>>> the CONDOR CRAP, the number of requests registered in
>>>>
>>>> /var/glite/jobcontrol/jobdir/
>>>>
>>>> decreased tremendously.
>>>>
>>>> Cheers
>>>> Goncalo
>>>>
>>>> On 06/14/2011 11:18 AM, Gonçalo Borges wrote:
>>>>> Hi Arnau...
>>>>>
>>>>>> Option 1: I had similar problems in our WMS. Under high load, it
>>>>>> stopped seeing bdii resources and jobs were not able to start. If
>>>>>> that's the case, you will find some descriptive message in
>>>>>> workload_manager_events.log. To solve it, we installed
>>>>>> google_perf_tools (you will find the recipe in the WMS known_issues).
>>>>>>
>>>>>
>>>>> We are using google_perf_tools already.
>>>>>
>>>>> grep libtcmalloc.so /opt/glite/etc/glite_wms.conf
>>>>> RuntimeMalloc = "/usr/lib/libtcmalloc.so";
>>>>>
>>>>>
>>>>>> Option 2: Have you recently upgraded lb? If yes, ensure
>>>>>> glite-lb-authz.conf has the correct values.
>>>>>
>>>>> Nope.
>>>>>
>>>>>> *You could also install WMSMonitor. Good tool for a quick check.
>>>>>>
>>>>>
>>>>> Yes I know. Unfortunately, since we are operating two sites over a
>>>>> WAN, we thought of using that tool in WAN mode. Talking with Daniele,
>>>>> we learned that WMSMonitor uses SNMP version 2, which is not the
>>>>> proper framework to do it. We are waiting for the EMI WMSMonitor
>>>>> release, which will use ActiveMQ.
>>>>>
>>>>>>
>>>>>>> ---*---
>>>>>> Maarten sent me this script, which must be run from cron:
>>>>>> # cat /usr/local/sbin/clean_condor_jobs.sh
>>>>>> #!/bin/bash
>>>>>>
>>>>>>
>>>>>> CONDOR_CRAP=`/opt/condor-c/bin/condor_q -hold | grep glite | awk '{print $1}'`
>>>>>>
>>>>>>
>>>>>> for JOB_ID in $CONDOR_CRAP
>>>>>> do
>>>>>> echo "Removing job: " $JOB_ID
>>>>>> /opt/condor-c/bin/condor_rm $JOB_ID
>>>>>> # sleep 2
>>>>>> /opt/condor-c/bin/condor_rm -forcex $JOB_ID
>>>>>> done
>>>>>>
>>>>>
>>>>> condor CRAP is a good name :-) Any special frequency to run it?
>>>>>
>>>>> Cheers
>>>>> Goncalo
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>
>
>
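For anyone wanting to try the cleanup quoted above without immediately deleting anything, here is a minimal sketch of the same idea with a dry-run switch. It keeps the thread's filter (lines of `condor_q -hold` output mentioning "glite", first column taken as the job ID) and the same `condor_rm` / `condor_rm -forcex` calls. The `DRY_RUN` and `CONDOR_BIN` variables and the function names are my own assumptions, not part of Maarten's script, and the thread does not settle how often it should run from cron.

```shell
#!/bin/bash
# Sketch only: a dry-run-friendly variant of the clean_condor_jobs.sh idea.
# CONDOR_BIN defaults to the path used in the thread.
# DRY_RUN is a hypothetical knob: 1 = only report, 0 = really remove.
CONDOR_BIN="${CONDOR_BIN:-/opt/condor-c/bin}"
DRY_RUN="${DRY_RUN:-1}"

# Read "condor_q -hold" style output on stdin and print the first column
# of every line mentioning "glite" (same filter as the original script).
held_glite_ids() {
    awk '/glite/ {print $1}'
}

# Read job IDs on stdin; report them, and remove them unless DRY_RUN=1.
remove_jobs() {
    while read -r job_id; do
        [ -n "$job_id" ] || continue
        if [ "$DRY_RUN" = "1" ]; then
            echo "Would remove job: $job_id"
        else
            echo "Removing job: $job_id"
            "$CONDOR_BIN/condor_rm" "$job_id"
            "$CONDOR_BIN/condor_rm" -forcex "$job_id"
        fi
    done
}

# Typical use (not executed by this sketch):
#   "$CONDOR_BIN/condor_q" -hold | held_glite_ids | remove_jobs
```

Running the pipeline in the last comment with the default `DRY_RUN=1` only lists what would be removed; setting `DRY_RUN=0` performs the removals exactly as the original loop does.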