Hi Steve,
may we try the new MAUI RPMs? Where are they?
Thanks
Andreas
On Wed, 10 Dec 2008, Steve Traylen wrote:
> On Wed, Dec 10, 2008 at 9:29 AM, Michel Jouvin <[log in to unmask]> wrote:
>> Sorry, I missed the message just before the one I answered. The RPMs built
>> by Steve have many limits increased compared to the default ones and are
>> normally suitable for large/very large configurations.
>
> The increased limits are described here.
>
> https://savannah.cern.ch/bugs/?33484
>
> and are included in this patch, which is going through certification.
>
> https://savannah.cern.ch/patch/?2517
>
> Steve
>>
>> Michel
>>
>> --On Wednesday 10 December 2008 09:27 +0100 Michel Jouvin
>> <[log in to unmask]> wrote:
>>
>>> Jeff,
>>>
>>> I haven't had time to read the whole thread, but just check! AFAIK, this
>>> is (or was?) the last snapshot released by ClusterResources, but Steve
>>> can answer more precisely.
>>>
>>> Michel
>>>
>>> --On Wednesday 10 December 2008 09:16 +0100 Jeff Templon
>>> <[log in to unmask]> wrote:
>>>
>>>> Hi,
>>>>
>>>> are these RPMs the ones I asked for in my previous message?? ;-)
>>>>
>>>> JT
>>>>
>>>> On 10 Dec 2008, at 08:07, Michel Jouvin wrote:
>>>>
>>>>> Yves,
>>>>>
>>>>> We experienced such behaviour (in our case it was with reservations,
>>>>> but I suspect it may be the same problem). You could try the latest
>>>>> snapshot built by Steve Traylen (not officially released as part
>>>>> of gLite):
>>>>>
>>>>> http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/torque/2.3.0-2-2/
>>>>> http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/maui/3.2.6p20-10/
>>>>>
>>>>> We have been running it for a couple of months at GRIF, and it solved
>>>>> all of our problems with typically 1500-2000 concurrent jobs running.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Michel
>>>>>
>>>>> --On Wednesday 10 December 2008 07:01 +0100 Yves Kemp
>>>>> <[log in to unmask]> wrote:
>>>>>
>>>>>> Hi Ronald,
>>>>>>
>>>>>> thanks for the hint!
>>>>>> Unfortunately, it did not help: the files are recreated after a couple
>>>>>> of minutes with only half the size, but the problem persists as before,
>>>>>> even after waiting longer.
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> Yves
>>>>>>
>>>>>> On 09.12.2008, at 21:20, Ronald Starink wrote:
>>>>>>
>>>>>>> Hi Yves,
>>>>>>>
>>>>>>> At Nikhef we also see this from time to time. It is caused by an
>>>>>>> internal Maui table filling up. Our workaround is the following:
>>>>>>>
>>>>>>> service maui stop
>>>>>>> rm /var/spool/maui/maui.ck*
>>>>>>> service maui start
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Ronald
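A minimal sketch of the same workaround as a reusable script, assuming the default checkpoint location /var/spool/maui and the "service maui" init script from the commands above. It differs from those commands in one respect: the maui.ck* files are moved into a timestamped backup directory instead of being deleted, so the old checkpoint can still be inspected. The function name reset_maui_checkpoint and the MAUI_SPOOL and MAUI_SERVICE_CMD overrides are hypothetical, added only so the script can be tried without touching a live server.

```shell
#!/bin/sh
# Sketch of the Nikhef workaround: stop Maui, move its checkpoint
# files (maui.ck*) aside, then start Maui again.
# MAUI_SPOOL and MAUI_SERVICE_CMD are hypothetical overrides; the defaults
# match the commands quoted above.
MAUI_SPOOL="${MAUI_SPOOL:-/var/spool/maui}"
MAUI_SERVICE_CMD="${MAUI_SERVICE_CMD:-service maui}"

reset_maui_checkpoint() {
    backup_dir="$MAUI_SPOOL/ck-backup.$(date +%Y%m%d%H%M%S)"

    # Left unquoted on purpose so "service maui" splits into two words.
    $MAUI_SERVICE_CMD stop || return 1

    mkdir -p "$backup_dir" || return 1
    # Move rather than delete, so the old checkpoint can still be examined.
    for f in "$MAUI_SPOOL"/maui.ck*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv "$f" "$backup_dir"/ || return 1
    done

    $MAUI_SERVICE_CMD start
}
```

Run as root after sourcing the file, e.g. ". ./reset_maui.sh && reset_maui_checkpoint". If the stop step fails, nothing is moved and the function returns non-zero. As Yves reports below, Maui recreates the checkpoint files within minutes, so this only helps if the table overflow is not immediately reproduced.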
>>>>>>>
>>>>>>>
>>>>>>> Yves Kemp wrote:
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> we currently have a problem with Maui on one of our batch servers:
>>>>>>>> diagnose -f does not completely report the following stanzas:
>>>>>>>> GROUP: 5 Unix groups are missing, although they are defined in the
>>>>>>>>        PBS server and in maui.cfg, and are using resources
>>>>>>>> QOS:   no entry at all (even the "QOS" header is missing)
>>>>>>>> CLASS: the same as for QOS
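A minimal sketch for spotting this condition automatically, assuming (from the description above) that each stanza in the diagnose -f output begins with a line containing only its name, such as GROUP, QOS or CLASS. The helper name check_fs_stanzas is hypothetical; it reads the diagnose output on stdin and reports which of the requested stanza headers are absent.

```shell
#!/bin/sh
# Report which fairshare stanzas are absent from `diagnose -f` output.
# Assumption: each stanza starts with a line holding only its name
# (optionally followed by whitespace), as described in the report above.
# check_fs_stanzas is a hypothetical helper name.
check_fs_stanzas() {
    input=$(cat)            # read the diagnose output from stdin once
    missing=""
    for stanza in "$@"; do
        if ! printf '%s\n' "$input" | grep -Eq "^${stanza}[[:space:]]*\$"; then
            missing="$missing $stanza"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing stanzas:$missing"
        return 1
    fi
    echo "all stanzas present"
}
```

Under that assumption, "diagnose -f | check_fs_stanzas GROUP QOS CLASS" would print "missing stanzas: QOS CLASS" for the output described here and exit non-zero, which makes it usable from a monitoring probe. It only detects headers that are missing entirely; the 5 Unix groups missing inside GROUP would need a separate comparison against the groups defined in maui.cfg.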
>>>>>>>>
>>>>>>>> The problem appeared roughly one week ago. Two events might be
>>>>>>>> correlated: we introduced new hardware shortly before (the system
>>>>>>>> was under very heavy load, ~5000 jobs queued and running), and we
>>>>>>>> configured a second CE for some time on a testing basis (now
>>>>>>>> removed, with the PBS configuration reverted).
>>>>>>>>
>>>>>>>> We run one CE (grid-ce3.desy.de) in front of this batch server. Some
>>>>>>>> information relevant to the batch server:
>>>>>>>> glite-apel-pbs-2.0.5-2.noarch
>>>>>>>> maui-3.2.6p20-snap.1182974819.8.slc4.i386
>>>>>>>> maui-client-3.2.6p20-snap.1182974819.8.slc4.i386
>>>>>>>> maui-server-3.2.6p20-snap.1182974819.8.slc4.i386
>>>>>>>> torque-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>> torque-client-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>> torque-mom-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>> torque-server-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>> root@grid-batch3: [~] uname -a
>>>>>>>> Linux grid-batch3.desy.de 2.6.9-78.0.1.ELsmp #1 SMP Tue Aug 5 12:59:28 CDT 2008 i686 i686 i386 GNU/Linux
>>>>>>>> root@grid-batch3: [~] cat /etc/issue
>>>>>>>> Scientific Linux SL release 4.4 (Beryllium)
>>>>>>>> Kernel \r on an \m
>>>>>>>>
>>>>>>>> An example output can be found here:
>>>>>>>> http://www.desy.de/~kemp/diagnose.txt
>>>>>>>>
>>>>>>>> I have put the config files here:
>>>>>>>> http://www.desy.de/~kemp/pbs_server.conf
>>>>>>>> http://www.desy.de/~kemp/maui.cfg
>>>>>>>> The fairshare information seems to be there, as shown e.g. in the
>>>>>>>> file /var/spool/maui/stats/FS.1228780800
>>>>>>>> http://www.desy.de/~kemp/FS.1228780800
>>>>>>>> so we assume that scheduling is not affected (but we do not really
>>>>>>>> know...).
>>>>>>>>
>>>>>>>>
>>>>>>>> Does anyone have an idea what is going wrong?
>>>>>>>>
>>>>>>>> Thanks for any hint!
>>>>>>>>
>>>>>>>> Best
>>>>>>>>
>>>>>>>> Yves
>>>>>>>>
>>>>>>>> # Yves Kemp: [log in to unmask]
>>>>>>>> # DESY IT 2b/314, Notkestr. 85, D-22607 Hamburg
>>>>>>>> # FON: +49-(0)-40-8998-2318, FAX: +49-(0)-40-8994-2318
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *************************************************************
>>>>> * Michel Jouvin Email : [log in to unmask] *
>>>>> * LAL / CNRS Tel : +33 1 64468932 *
>>>>> * B.P. 34 Fax : +33 1 69079404 *
>>>>> * 91898 Orsay Cedex *
>>>>> * France *
>>>>> *************************************************************
>>>
>>>
>>
>>
>
>
>
> --
> Steve Traylen
>
----
Andreas Gellrich <[log in to unmask]>
DESY IT / Grid Computing
http://www.desy.de/~gellrich