Hi,
are these RPMs the ones I asked for in my previous message?? ;-)
JT
On 10 Dec 2008, at 08:07, Michel Jouvin wrote:
> Yves,
>
> We experienced a similar behaviour (in our case it was with
> reservations, but I suspect it may be the same problem). You may try
> the latest snapshot built by Steve Traylen (not officially released
> as part of gLite):
>
> http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/torque/2.3.0-2-2/
> http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/maui/3.2.6p20-10/
>
> We have been running it for a couple of months at GRIF and it solved
> all of our problems, with typically 1500-2000 concurrent jobs
> running.
>
> Cheers,
>
> Michel
>
> --On Wednesday, 10 December 2008 07:01 +0100 Yves Kemp <[log in to unmask]
> > wrote:
>
>> Hi Ronald,
>>
>> thanks for the hint!
>> Unfortunately, it did not help: the files are recreated after a
>> couple of minutes at only half their previous size, but the problem
>> still persists as before, even after waiting for a longer time.
>>
>> Best
>>
>> Yves
>>
>> On 09.12.2008, at 21:20, Ronald Starink wrote:
>>
>>> Hi Yves,
>>>
>>> At Nikhef we also see this from time to time. It is caused by an
>>> internal Maui table filling up. Our workaround is the
>>> following:
>>>
>>> service maui stop
>>> rm /var/spool/maui/maui.ck*
>>> service maui start
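>>> A minimal sketch of that workaround as a script (a hypothetical
>>> helper, not part of Maui; it assumes the default spool path
>>> /var/spool/maui and, by default, only prints the commands):

```shell
#!/bin/sh
# Sketch of the Nikhef workaround: remove Maui's checkpoint files
# (maui.ck*) so the scheduler rebuilds its internal tables on restart.
# Hypothetical wrapper; DRY_RUN=1 (the default) only prints commands.

maui_reset() {
    spool="${MAUI_SPOOL:-/var/spool/maui}"
    run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "+ $*"          # dry run: show what would be executed
        else
            "$@"                 # live run: execute the command
        fi
    }
    run service maui stop
    run rm -f "$spool"/maui.ck*  # checkpoint files, rebuilt by Maui
    run service maui start
}

maui_reset
```

>>> Set DRY_RUN=0 to actually execute the commands; a live run needs
>>> root on the batch server.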
>>>
>>> Cheers,
>>> Ronald
>>>
>>>
>>> Yves Kemp wrote:
>>>> Dear all,
>>>>
>>>> we currently have a problem with Maui on one of our batch servers:
>>>> diagnose -f does not completely report the stanzas below:
>>>> GROUP: 5 Unix groups are missing, although they are defined in the
>>>> pbs server and in maui.cfg and are using resources.
>>>> QOS: no entry at all (even the "QOS" heading is missing).
>>>> CLASS: the same as for QOS.
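>>>> A quick way to spot the symptom, sketched here as a hypothetical
>>>> helper (not a Maui tool): grep a saved copy of the diagnose -f
>>>> output for the three stanza headings.

```shell
#!/bin/sh
# Hypothetical check: verify that a saved "diagnose -f" output file
# still contains the GROUP, QOS and CLASS stanza headings.
check_stanzas() {
    missing=""
    for s in GROUP QOS CLASS; do
        # stanza headings start at the beginning of a line
        grep -q "^$s" "$1" || missing="$missing $s"
    done
    if [ -z "$missing" ]; then
        echo "all stanzas present"
    else
        echo "missing:$missing"
    fi
}
```

>>>> e.g. "diagnose -f > /tmp/diag.txt; check_stanzas /tmp/diag.txt"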
>>>>
>>>> The problem appeared roughly one week ago. Several events might be
>>>> correlated: we introduced new hardware shortly before, the system
>>>> was under very heavy load (~5000 jobs queued and running), and we
>>>> had configured a second CE for some time on a testing basis (now
>>>> removed, PBS configuration reverted).
>>>>
>>>> We run one CE (grid-ce3.desy.de) in front of this batch server.
>>>> Some
>>>> information relevant to the batch server:
>>>> glite-apel-pbs-2.0.5-2.noarch
>>>> maui-3.2.6p20-snap.1182974819.8.slc4.i386
>>>> maui-client-3.2.6p20-snap.1182974819.8.slc4.i386
>>>> maui-server-3.2.6p20-snap.1182974819.8.slc4.i386
>>>> torque-2.3.0-snap.200801151629.2cri.slc4.i386
>>>> torque-client-2.3.0-snap.200801151629.2cri.slc4.i386
>>>> torque-mom-2.3.0-snap.200801151629.2cri.slc4.i386
>>>> torque-server-2.3.0-snap.200801151629.2cri.slc4.i386
>>>> root@grid-batch3: [~] uname -a
>>>> Linux grid-batch3.desy.de 2.6.9-78.0.1.ELsmp #1 SMP Tue Aug 5
>>>> 12:59:28
>>>> CDT 2008 i686 i686 i386 GNU/Linux
>>>> root@grid-batch3: [~] cat /etc/issue
>>>> Scientific Linux SL release 4.4 (Beryllium)
>>>> Kernel \r on an \m
>>>>
>>>> An example output can be found here:
>>>> http://www.desy.de/~kemp/diagnose.txt
>>>>
>>>> I have put the config files here:
>>>> http://www.desy.de/~kemp/pbs_server.conf
>>>> http://www.desy.de/~kemp/maui.cfg
>>>> The information about the fairshare seems to be there, as shown
>>>> e.g. in
>>>> the file /var/spool/maui/stats/FS.1228780800
>>>> http://www.desy.de/~kemp/FS.1228780800
>>>> so we assume that scheduling is not affected (but we do not really
>>>> know...).
>>>>
>>>>
>>>> Does anyone have an idea what is going wrong?
>>>>
>>>> Thanks for any hint!
>>>>
>>>> Best
>>>>
>>>> Yves
>>>>
>>>> # Yves Kemp: [log in to unmask]
>>>> # DESY IT 2b/314, Notkestr. 85, D-22607 Hamburg
>>>> # FON: +49-(0)-40-8998-2318, FAX: +49-(0)-40-8994-2318
>>
>
>
>
> *************************************************************
> * Michel Jouvin Email : [log in to unmask] *
> * LAL / CNRS Tel : +33 1 64468932 *
> * B.P. 34 Fax : +33 1 69079404 *
> * 91898 Orsay Cedex *
> * France *
> *************************************************************