Hi Steve,
I installed it and so far it runs fine. The diagnose -f output is back
...
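For anyone wanting to confirm the fix the same way, a small helper can check a captured diagnose -f dump for the stanzas that had gone missing (a sketch only; the check_stanzas name and the grep on stanza headers are assumptions here, not part of Maui itself):

```shell
# check_stanzas DUMPFILE: verify that a saved `diagnose -f` dump
# contains the GROUP, QOS and CLASS stanza headers that had gone
# missing; prints which ones are absent, if any.
check_stanzas() {
    dump="$1"
    missing=""
    for stanza in GROUP QOS CLASS; do
        grep -q "^${stanza}" "$dump" || missing="$missing $stanza"
    done
    if [ -z "$missing" ]; then
        echo "all stanzas present"
    else
        echo "missing:$missing"
    fi
}

# Usage: capture the output once, then check it:
#   diagnose -f > /tmp/diagnose.txt
#   check_stanzas /tmp/diagnose.txt
```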
Thanx
Andreas
On Wed, 10 Dec 2008, Steve Traylen wrote:
> On Wed, Dec 10, 2008 at 10:45 AM, Andreas Gellrich
> <[log in to unmask]> wrote:
>> Hi Steve,
>> May we try the new MAUI RPMs? Where are they?
> Andreas,
>
> Follow the names in the savannah patch and grab them from the link.
>
> http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui
>
> Current versions in the patch are Torque 2.3.5-1 and
> Maui 3.2.6p21-2.
>
> Note these have had very little testing, which is exactly
> why they are in certification. That said, if you do try
> them, I'm happy to have feedback.
> Steve
>
>>
>> Thanx
>> Andreas
>>
>> On Wed, 10 Dec 2008, Steve Traylen wrote:
>>
>>> On Wed, Dec 10, 2008 at 9:29 AM, Michel Jouvin <[log in to unmask]>
>>> wrote:
>>>>
>>>> Sorry, I missed the message just before the one I answered. The RPMs
>>>> built by Steve have many limits increased compared to the default ones
>>>> and are normally suitable for large/very large configurations.
>>>
>>> The increased limits are described here.
>>>
>>> https://savannah.cern.ch/bugs/?33484
>>>
>>> and are in this patch, which is going through certification.
>>>
>>> https://savannah.cern.ch/patch/?2517
>>>
>>> Steve
>>>>
>>>> Michel
>>>>
>>>> --On Wednesday, 10 December 2008 09:27 +0100 Michel Jouvin
>>>> <[log in to unmask]> wrote:
>>>>
>>>>> Jeff,
>>>>>
>>>>> I had no time to read the whole thread, but just check! AFAIK, this is
>>>>> (this was?) the last snapshot released by ClusterResources, but Steve
>>>>> may answer more precisely.
>>>>>
>>>>> Michel
>>>>>
>>>>> --On Wednesday, 10 December 2008 09:16 +0100 Jeff Templon
>>>>> <[log in to unmask]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> are these RPMs the ones I asked for in my previous message?? ;-)
>>>>>>
>>>>>> JT
>>>>>>
>>>>>> On 10 Dec 2008, at 08:07, Michel Jouvin wrote:
>>>>>>
>>>>>>> Yves,
>>>>>>>
>>>>>>> We experienced such a behaviour (it was with reservations, but I
>>>>>>> suspect it may be the same problem). You may want to try the latest
>>>>>>> snapshot built by Steve Traylen (not officially released as part of
>>>>>>> gLite):
>>>>>>>
>>>>>>> http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/torque/2.3.0-2-2/
>>>>>>>
>>>>>>> http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/maui/3.2.6p20-10/
>>>>>>>
>>>>>>> We have run it for a couple of months at GRIF, and it solved all of
>>>>>>> our problems, with typically 1500-2000 concurrent jobs running.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Michel
>>>>>>>
>>>>>>> --On Wednesday, 10 December 2008 07:01 +0100 Yves Kemp
>>>>>>> <[log in to unmask]> wrote:
>>>>>>>
>>>>>>>> Hi Ronald,
>>>>>>>>
>>>>>>>> thanks for the hint!
>>>>>>>> Unfortunately, it did not help: the files are recreated after a
>>>>>>>> couple of minutes and have only half the size, but the problem
>>>>>>>> persists as before, even after waiting for a longer time.
>>>>>>>>
>>>>>>>> Best
>>>>>>>>
>>>>>>>> Yves
>>>>>>>>
>>>>>>>> On 09.12.2008, at 21:20, Ronald Starink wrote:
>>>>>>>>
>>>>>>>>> Hi Yves,
>>>>>>>>>
>>>>>>>>> At Nikhef we also see this from time to time. It is caused by an
>>>>>>>>> internal Maui table filling up. Our workaround is the following:
>>>>>>>>>
>>>>>>>>> service maui stop
>>>>>>>>> rm /var/spool/maui/maui.ck*
>>>>>>>>> service maui start
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Ronald
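Ronald's three commands can also be wrapped so that the checkpoint files are moved aside rather than deleted, in case they are needed later for inspection (a sketch only; the reset_maui_checkpoint name is made up here, and /var/spool/maui is assumed to be the spool directory, as in the commands above):

```shell
# reset_maui_checkpoint SPOOLDIR: move Maui's maui.ck* checkpoint
# files into a timestamped backup subdirectory instead of deleting
# them; prints the backup directory. Run only while maui is stopped.
reset_maui_checkpoint() {
    spool="$1"
    backup="$spool/ck-backup-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup"
    mv "$spool"/maui.ck* "$backup"/ 2>/dev/null || true
    echo "$backup"
}

# Usage (as root, with the default spool path):
#   service maui stop
#   reset_maui_checkpoint /var/spool/maui
#   service maui start
```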
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yves Kemp wrote:
>>>>>>>>>>
>>>>>>>>>> Dear all,
>>>>>>>>>>
>>>>>>>>>> we currently have a problem with Maui on one of our batch servers:
>>>>>>>>>> diagnose -f does not completely report the stanzas below:
>>>>>>>>>> GROUP
>>>>>>>>>> (5 Unix groups are missing, although they are defined in the PBS
>>>>>>>>>> server and maui.cfg and are using resources)
>>>>>>>>>> QOS
>>>>>>>>>> no entry at all (even the "QOS" header is missing)
>>>>>>>>>> CLASS
>>>>>>>>>> the same as for QOS
>>>>>>>>>>
>>>>>>>>>> The problem appeared roughly one week ago. Two events might be
>>>>>>>>>> correlated: we introduced new hardware shortly before, while the
>>>>>>>>>> system was under very heavy load (~5000 jobs queued and running),
>>>>>>>>>> and we configured a second CE for some time on a testing basis
>>>>>>>>>> (now removed, with the PBS configuration reverted).
>>>>>>>>>>
>>>>>>>>>> We run one CE (grid-ce3.desy.de) in front of this batch server.
>>>>>>>>>> Some information relevant to the batch server:
>>>>>>>>>> glite-apel-pbs-2.0.5-2.noarch
>>>>>>>>>> maui-3.2.6p20-snap.1182974819.8.slc4.i386
>>>>>>>>>> maui-client-3.2.6p20-snap.1182974819.8.slc4.i386
>>>>>>>>>> maui-server-3.2.6p20-snap.1182974819.8.slc4.i386
>>>>>>>>>> torque-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>>>> torque-client-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>>>> torque-mom-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>>>> torque-server-2.3.0-snap.200801151629.2cri.slc4.i386
>>>>>>>>>> root@grid-batch3: [~] uname -a
>>>>>>>>>> Linux grid-batch3.desy.de 2.6.9-78.0.1.ELsmp #1 SMP Tue Aug 5
>>>>>>>>>> 12:59:28
>>>>>>>>>> CDT 2008 i686 i686 i386 GNU/Linux
>>>>>>>>>> root@grid-batch3: [~] cat /etc/issue
>>>>>>>>>> Scientific Linux SL release 4.4 (Beryllium)
>>>>>>>>>> Kernel \r on an \m
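To compare the installed torque/maui versions above against another host (or against the snapshot builds recommended in this thread), the package listing can be filtered into a diff-friendly form (a sketch; the filter_batch_pkgs name is an assumption here):

```shell
# filter_batch_pkgs: keep only torque/maui package names from an
# `rpm -qa` listing on stdin, sorted for easy diffing between hosts.
filter_batch_pkgs() {
    grep -Ei '^(torque|maui)' | sort
}

# Usage: rpm -qa | filter_batch_pkgs
```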
>>>>>>>>>>
>>>>>>>>>> An example output can be found here:
>>>>>>>>>> http://www.desy.de/~kemp/diagnose.txt
>>>>>>>>>>
>>>>>>>>>> I have put the config files here:
>>>>>>>>>> http://www.desy.de/~kemp/pbs_server.conf
>>>>>>>>>> http://www.desy.de/~kemp/maui.cfg
>>>>>>>>>> The fairshare information seems to be there, as shown e.g. in
>>>>>>>>>> the file /var/spool/maui/stats/FS.1228780800
>>>>>>>>>> http://www.desy.de/~kemp/FS.1228780800
>>>>>>>>>> so we assume that scheduling is not affected (but we do not
>>>>>>>>>> really know...).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Does anyone have an idea what is going wrong?
>>>>>>>>>>
>>>>>>>>>> Thanks for any hint!
>>>>>>>>>>
>>>>>>>>>> Best
>>>>>>>>>>
>>>>>>>>>> Yves
>>>>>>>>>>
>>>>>>>>>> # Yves Kemp: [log in to unmask]
>>>>>>>>>> # DESY IT 2b/314, Notkestr. 85, D-22607 Hamburg
>>>>>>>>>> # FON: +49-(0)-40-8998-2318, FAX: +49-(0)-40-8994-2318
>>>>>>>>
>>>>>>>
>>>>>>> *************************************************************
>>>>>>> * Michel Jouvin Email : [log in to unmask] *
>>>>>>> * LAL / CNRS Tel : +33 1 64468932 *
>>>>>>> * B.P. 34 Fax : +33 1 69079404 *
>>>>>>> * 91898 Orsay Cedex *
>>>>>>> * France *
>>>>>>> *************************************************************
>>>>>
>>>>
>>>
>>> --
>>> Steve Traylen
>>>
>>
>> ----
>> Andreas Gellrich <[log in to unmask]>
>> DESY IT / Grid Computing
>> http://www.desy.de/~gellrich
>
> --
> Steve Traylen
>
----
Andreas Gellrich <[log in to unmask]>
DESY IT / Grid Computing
http://www.desy.de/~gellrich