Thanks.
I'll read this and try it. In the meantime, here's our report of the
situation right now.
----------- ARC/Condor system, SL 6.4 -------------
Sometime at the end of September, our automatic update program for
security releases placed a new rpm on our ARC/Condor system.
openldap-servers-2.4.40-6.el6_7.x86_64.rpm
Since then, the BDII on that system won't stay running. We see this in
the /var/log/messages file:
hepgrid2 kernel: slapd[717]: segfault at 7f930245eb90 ip 00007f930245eb90 sp 00007f92e767c028 error 15
Suggest not to use this until I can show definitively that the previous
version works and this one fails.
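
For anyone who wants to check their own logs for the same signature, here is a minimal sketch in Python. It assumes each kernel entry sits on a single line in the log file, in the format shown above; the function and field names are made up for illustration and are not part of any BDII or ARC tooling.

```python
import re

# Matches kernel segfault reports for slapd in the format seen in this thread:
# "... kernel: slapd[PID]: segfault at ADDR ip IP sp SP error CODE"
SEGFAULT_RE = re.compile(
    r"kernel: slapd\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\w+)"
)


def find_slapd_segfaults(log_text):
    """Return one dict per slapd segfault line found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = SEGFAULT_RE.search(line)
        if match is None:
            continue
        hits.append({
            "pid": int(match.group("pid")),
            # Addresses are printed in hex, with or without leading zeros.
            "addr": int(match.group("addr"), 16),
            "ip": int(match.group("ip"), 16),
            "sp": int(match.group("sp"), 16),
            # Kept as the raw string; no interpretation is attempted here.
            "error": match.group("error"),
        })
    return hits
```

Reading /var/log/messages into a string and passing it to find_slapd_segfaults() gives a list that can be counted or compared between hosts, for example to see whether the fault address always equals the instruction pointer as it does in the entry above.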
--------------------------------------------------------------
Steve
On 10/06/2015 09:59 AM, Maria Alandes Pradillo wrote:
> Dear Andreas,
>
> A workaround has been tested now by Jan Astalos and we are trying to get Dennis van Dok to try it and confirm it also works.
>
> The workaround is to comment out the following lines in /etc/bdii/bdii-top-slapd.conf (this applies to top BDIIs, which have the larger LDAP tree and the heavier load):
> #######################################################################
> # Relay DB to address performance issues
> #######################################################################
> #database relay
> #suffix "o=grid"
> #overlay rwm
> #suffixmassage "o=grid,o=shadow"
>
>
> And change these lines in the "GLUE 1.3 database definitions":
> suffix "o=shadow"
> rootdn "o=shadow"
>
> to:
> suffix "o=grid"
> rootdn "o=grid"
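
The two changes quoted above could be applied to the file contents with something like the following. This is only a sketch under assumptions: it works on the text of the config rather than the file itself, it assumes the relay stanza starts with a "database relay" line and ends at the next "database" directive, and it touches only the lines named in Maria's mail. The function name is made up for illustration.

```python
# Directives to comment out, but only inside the relay stanza.
RELAY_DIRECTIVES = {
    "database relay",
    'suffix "o=grid"',
    "overlay rwm",
    'suffixmassage "o=grid,o=shadow"',
}

# Directives of the GLUE 1.3 database that move from o=shadow to o=grid.
SHADOW_DIRECTIVES = {
    'suffix "o=shadow"',
    'rootdn "o=shadow"',
}


def apply_relay_workaround(conf_text):
    """Return conf_text with the relay stanza commented out and the
    o=shadow suffix/rootdn renamed to o=grid."""
    out = []
    in_relay = False
    for line in conf_text.splitlines():
        words = line.split()
        # Collapse tabs and runs of spaces so alignment does not matter.
        normalized = " ".join(words)
        if words[:1] == ["database"]:
            # Every "database" directive opens a new stanza.
            in_relay = normalized == "database relay"
        if in_relay and normalized in RELAY_DIRECTIVES:
            out.append("#" + line)
        elif normalized in SHADOW_DIRECTIVES:
            out.append(line.replace("o=shadow", "o=grid"))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Because already-commented lines no longer begin with "database", running the function a second time leaves the text unchanged. Keep a copy of the original file before writing anything back.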
>
> This seems to solve the problem. As Jan reported, it appears the hdb and relay backends were not properly tested by the OpenLDAP team before 2.4.40 was released, and this setup no longer seems to work properly.
>
> The relay was originally defined to overcome some performance issues in the past. However, it seems it is no longer needed: we did not implement it for GLUE 2.0, and we saw no performance issues once the GLUE 2 LDAP tree was as big as the GLUE 1 LDAP tree. So I would say it is indeed safe to remove the o=shadow relay. However, it would be good to run some tests on production top BDIIs before making a release.
>
> If more sites could give this a try and report back, it would be very useful.
>
> Regards,
> Maria
>
>> -----Original Message-----
>> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On
>> Behalf Of Andreas Haupt
>> Sent: 06 October 2015 10:24
>> To: [log in to unmask]
>> Subject: Re: [LCG-ROLLOUT] TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64)
>>
>> Hi Maria, all,
>>
>> the problem has become critical now, as the update is no longer tied to an
>> SL/CentOS upgrade to 6.7. There has been an update marked as a security fix, so
>> all BDII nodes will get it, or already have:
>>
>> [nero-vm2] /root # rpm -q --changelog openldap-servers | head -n 2
>> * Thu Sep 17 2015 Matúš Honěk <[log in to unmask]> - 2.4.40-6
>> - CVE-2015-6908 openldap: ber_get_next denial of service vulnerability
>> (#1263171)
>>
>> As slapd won't get restarted automatically on update, "bad surprises"
>> are expected to show up on next reboot, yaim run or bdii restart ...
>>
>> It just takes some seconds until the 2.4.40-based top-level bdii reliably segfaults
>> ... :-(
>>
>> Cheers,
>> Andreas
>>
>> On Tuesday, 29.09.2015 at 09:42 +0000, Maria Alandes Pradillo wrote:
>>> Dear Ryan,
>>>
>>> Have you finally updated to SL6.7? Please let us know the details and whether
>>> it went fine.
>>> Regards,
>>> Maria
>>>
>>>> -----Original Message-----
>>>> From: LHC Computer Grid - Rollout
>>>> [mailto:[log in to unmask]] On Behalf Of Ryan Taylor
>>>> Sent: 25 September 2015 22:48
>>>> To: [log in to unmask]
>>>> Subject: Re: [LCG-ROLLOUT] TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64)
>>>>
>>>> Hi,
>>>>
>>>> Was any further information found, or should it be okay to update
>>>> BDIIs to SL6.7 now?
>>>>
>>>> Thanks,
>>>> -rt
>>>>
>>>> Ryan Taylor
>>>> Grid & Cloud Computing Specialist
>>>> Data Centre Services, University Systems University of Victoria
>>>>
>>>> On 08/21/2015 06:28 AM, andrea wrote:
>>>>> Hi Dennis,
>>>>> this problem was also reported by our colleagues at CERN running a
>>>>> resource BDII on an ARC-CE after the upgrade to SLC 6.7. After that
>>>>> I tried to reproduce the issue on other resource BDIIs but could not.
>>>>> I know that Maria Alandes (responsible for BDII) also tried to
>>>>> reproduce it on both site and top BDIIs without finding the same
>>>>> problem.
>>>>>
>>>>> For the moment I guess we can suggest not upgrading the BDII nodes to 6.7.
>>>>> Maria will be back at work next week, so she will try to investigate more.
>>>>> thanks!
>>>>> cheers
>>>>> Andrea
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 21/08/15 14:15, Dennis van Dok wrote:
>>>>>> Hi,
>>>>>>
>>>>>> we just upgraded to CentOS 6.7 on Tuesday, and besides a bad case
>>>>>> of CVMFS failures it turns out this also breaks our top level BDII.
>>>>>>
>>>>>> The upgraded component is
>>>>>>
>>>>>> openldap-servers-2.4.40-5.el6.x86_64
>>>>>>
>>>>>> and for yet unknown reasons the service repeatedly crashes.
>>>>>>
>>>>>> /var/log/kern:Aug 21 09:13:22 ha-kraal kernel: slapd[24759]: segfault at 7f31feddba90 ip 00007f31feddba90 sp 00007f31763a7028 error 15
>>>>>> /var/log/kern:Aug 21 10:30:33 ha-kraal kernel: slapd[13964]: segfault at 7f78008008d0 ip 00007f78008008d0 sp 00007f7730676028 error 15
>>>>>> /var/log/kern:Aug 21 11:26:02 ha-kraal kernel: slapd[10760]: segfault at 7f57c69e6510 ip 00007f57c69e6510 sp 00007f573e7b3028 error 15
>>>>>> /var/log/kern:Aug 21 12:25:56 ha-kraal kernel: slapd[6799]: segfault at 7f39dd417a90 ip 00007f39dd417a90 sp 00007f39561e6028 error 15
>>>>>> /var/log/kern:Aug 21 13:03:02 ha-kraal kernel: slapd[6469]: segfault at 7efe7551c690 ip 00007efe7551c690 sp 00007efdb5e79028 error 15
>>>>>>
>>>>>> For the moment we've downgraded our TL BDIIs to CentOS 6.6, and
>>>>>> I'm trying to collect a crash report on our test node (basically
>>>>>> waiting for the failure to occur again) so I can inform Red Hat.
>>>>>>
>>>>>> Has anybody else experienced this? Note that our resource and
>>>>>> site BDIIs run the same software but do not seem to be affected.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Dennis
>>>>>>
>> --
>> | Andreas Haupt | E-Mail: [log in to unmask]
>> | DESY Zeuthen | WWW: http://www-zeuthen.desy.de/~ahaupt
>> | Platanenallee 6 | Phone: +49/33762/7-7359
>> | D-15738 Zeuthen | Fax: +49/33762/7-7216
--
Steve Jones [log in to unmask]
Grid System Administrator office: 220
High Energy Physics Division tel (int): 43396
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
University of Liverpool http://www.liv.ac.uk/physics/hep/