Hi,
FYI.
https://ggus.eu/ws/ticket_info.php?ticket=76571
JT
On Nov 17, 2011, at 22:54 , Ben Waugh wrote:
> This (Kashif's error below) turns out to be another symptom of the problem I encountered, and Jeff's fix works for me too:
>
> qmgr -c "s q atlas resources_default.walltime = 72:00:00"
> qmgr -c "s q atlas resources_default.cput = 48:00:00"
>
> Now "showq" reports sane "remaining time" numbers, and I hope overrunning jobs will be terminated in future.
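
For reference, the two limits above translate into seconds as follows. This is only a minimal Python sketch: `hms_to_seconds` and `remaining_seconds` are hypothetical helper names, and the formula (limit minus elapsed time) is an assumption about what a "remaining time" display needs, not code taken from Maui or Torque.

```python
def hms_to_seconds(value):
    """Convert an HH:MM:SS limit such as '72:00:00' into seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def remaining_seconds(limit, elapsed):
    """Remaining time for a job, assuming remaining = limit - elapsed.

    Returns None when no limit is set, since nothing sensible can be
    computed in that case.
    """
    if limit is None:
        return None
    return max(hms_to_seconds(limit) - hms_to_seconds(elapsed), 0)


walltime_limit = hms_to_seconds("72:00:00")  # 259200 seconds
cput_limit = hms_to_seconds("48:00:00")      # 172800 seconds
```

Without a default limit there is nothing to subtract from, which would be consistent with the odd "remaining time" values seen before the fix.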
>
> Cheers,
> Ben
>
> On 17/11/11 16:25, Jeff Templon wrote:
> > Hi,
> >
> > I think you have a problem that was discussed around August / September. Newer installations of Torque for some reason do not set the parameter
> >
> > resources_default.walltime
> >
> > and this causes that bug (qwt is not defined and hence the multiply operation fails).
> >
> > Give that parameter a value in Torque and the error should go away.
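
The failure mode described above can be sketched in Python. This is an illustration under assumptions, not the code of lcg-info-dynamic-scheduler: it assumes the missing walltime shows up as `None` (the thread does not show the actual exception), and `estimated_wait` is a hypothetical name. Only `qwt`, `nwait` and `wrt` come from the traceback.

```python
def estimated_wait(qwt, nwait):
    """Return qwt * nwait, or None if no queue walltime is available."""
    if qwt is None:
        # No resources_default.walltime: skip the calculation rather than crash.
        return None
    return qwt * nwait


# Unguarded, the same multiplication fails when qwt is missing:
try:
    wrt = None * 3
except TypeError as exc:
    failure = str(exc)
```

Setting the parameter in Torque, as suggested, removes the cause; a guard like this would only keep the script from aborting.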
> >
> > JT
> >
> > On 16 Nov 2011, at 13:45, Kashif Mohammad wrote:
> >
> >> Hi
> >>
> >> I am seeing this error in /var/log/bdii/bdii-update.log on all of our CEs:
> >> Traceback (most recent call last):
> >>   File "/opt/lcg/libexec/lcg-info-dynamic-scheduler", line 435, in ?
> >>     wrt = qwt * nwait
> >>
> >> /opt/lcg/libexec/lcg-info-dynamic-scheduler belongs to lcg-info-dynamic-scheduler-generic-2.3.4-1, which hasn't changed for a long time, and I am not able to correlate this problem with any other change.
> >> rpm -qa | grep bdii
> >> bdii-5.0.8-1
> >>
> >> The end result is that the CE is publishing only default dynamic values. The last update was almost two weeks back:
> >>
> >> Nov 02 10:01:39 Updated: torque-2.5.7-2.el5.1.x86_64
> >> Nov 02 10:01:39 Updated: libtorque-2.5.7-2.el5.1.x86_64
> >> Nov 02 10:01:40 Updated: torque-client-2.5.7-2.el5.1.x86_64
> >> Nov 02 10:01:40 Updated: glite-apel-core-2.0.13-8.noarch
> >> Nov 02 10:01:40 Updated: glite-version-3.2.3-1.noarch
> >> Nov 02 10:01:40 Updated: glite-yaim-torque-utils-4.1.0-2.sl5.noarch
> >> Nov 02 10:01:40 Updated: freetype-2.2.1-28.el5_7.1.x86_64
> >> Nov 02 10:01:40 Updated: glite-TORQUE_utils-3.2.4-2.sl5.x86_64
> >> Nov 02 10:01:53 Installed: kernel-2.6.18-274.7.1.el5.x86_64
> >> Nov 02 10:01:53 Updated: rpm-libs-4.4.2.3-22.el5_7.2.x86_64
> >> Nov 02 10:01:57 Updated: rpm-4.4.2.3-22.el5_7.2.x86_64
> >> Nov 02 10:01:59 Updated: rpm-python-4.4.2.3-22.el5_7.2.x86_64
> >> Nov 02 10:02:05 Updated: torque-client-2.5.7-2.el5.1.x86_64
> >>
> >> There is a chance that the problem started after this update and we haven't noticed, as most of the big VOs do direct submission.
> >> Any suggestions, please?
> >>
> >> Thanks
> >> Kashif
>
> On 11/11/11 12:25, Ben Waugh wrote:
>> Thanks for your reply Arnau. I have not compiled Maui myself but have
>> installed it from the glite-TORQUE_server_ext repository, and the
>> version I have is maui-3.2.6p21-snap.1234905291.5.el5, along with
>> torque-2.5.7-2.el5.1.
>>
>> Can someone who has also installed these versions from the gLite
>> repository check whether they see the same effect? I would be a little
>> surprised if this distribution did not have the appropriate
>> configuration options, but as I said, I would probably not have noticed
>> this myself if not for an unrelated problem leading to jobs using much
>> more wall- than CPU time.
>>
>> Cheers,
>> Ben
>>
>>
>>
>> On 11/10/2011 04:15 PM, Arnau Bria wrote:
>>> On Thu, 10 Nov 2011 16:03:09 +0000
>>> Ben Waugh wrote:
>>>
>>>> Hi All,
>>> Hi,
>>>
>>>> I suspect this problem might have arisen from upgrading Torque/Maui
>>>> as part of the recent gLite changes, without draining the farm.
>>> Have you compiled Torque yourself?
>>>
>>> I had a similar issue some time ago...
>>> Take a look at:
>>> http://www.supercluster.org/pipermail/torqueusers/2010-June/010740.html
>>>
>>> It was solved by adding --enable-maxdefault at configure time.
>>>
>>> You could check whether this is the same issue by running qstat -f and
>>> seeing whether the default resource time limits are missing.
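
A check along these lines could be scripted. The sketch below is hedged: `missing_time_limits` is a hypothetical helper, the sample text is made up for illustration, and it assumes the limits appear in `qstat -f` output as `Resource_List.walltime` and `Resource_List.cput` attributes.

```python
TIME_LIMITS = ("Resource_List.walltime", "Resource_List.cput")


def missing_time_limits(qstat_text):
    """Return the time-limit attributes that do not appear in qstat -f output."""
    present = set()
    for line in qstat_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name in TIME_LIMITS:
            present.add(name)
    return [name for name in TIME_LIMITS if name not in present]


# Made-up sample for illustration only, not output from a real job.
sample = """Job Id: 1234.ce.example.org
    Job_Name = STDIN
    job_state = R
    queue = atlas
    Resource_List.cput = 48:00:00
"""

print(missing_time_limits(sample))  # ['Resource_List.walltime']
```

On a real server the text would come from `qstat -f` for a single job; a job with no walltime entry at all would point to the missing queue default.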
>>>
>>> HTH,
>>> Arnau
>
> --
> Dr Ben Waugh Tel. +44 (0)20 7679 7223
> Dept of Physics and Astronomy Internal: 37223
> University College London
> London WC1E 6BT