Hi,
Checked all our pool nodes: 4-6 semaphores in use (split between root and
dpmmgr).
This is on SL6, YAIM and fresh Puppet configs.
When you run ipcs -s, which user is holding all the semaphores?
John
On 20/06/16 16:20, Kashif Mohammad wrote:
> Hi Everyone
>
> I have been diagnosing an httpd issue on a pool node. The main problem was that httpd kept crashing immediately after starting. I think I have found the reason, but I am surprised that no other site has been hit by it. Running strace with the -f option gave some clues:
>
> semget(IPC_PRIVATE, 1, IPC_CREAT|0600) = -1 ENOSPC (No space left on device)
> write(2, "Configuration Failed\n", 21) = 21
>
> Further googling pointed to this page:
>
> http://www.liquidweb.com/kb/apache-error-semget-no-space-left-on-device/
>
> Apparently the server has run out of semaphores. I removed all the semaphores on a server which was offline, restarted httpd, and it worked!
>
> The recommended permanent solution is to increase the semaphore limit, which should extend apache's uptime; I haven't tested this yet, and I am not sure what the recommended limit would be in this case.
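For reference, the fix described on that page amounts to roughly the following sketch (untested here; the kernel.sem values are illustrative placeholders, not a validated recommendation, and both steps need root):

```shell
# Remove leaked apache-owned semaphore arrays (run with httpd stopped).
# Column 2 of `ipcs -s` is the semid; `ipcrm -s` removes one array by semid.
cleanup_apache_sems() {
  ipcs -s | awk '$3 == "apache" { print $2 }' | xargs -r -n 1 ipcrm -s
}

# Persistently raise the limits. The four kernel.sem fields are
# SEMMSL SEMMNS SEMOPM SEMMNI; these numbers are illustrative only.
raise_sem_limits() {
  echo 'kernel.sem = 250 64000 32 256' >> /etc/sysctl.conf
  sysctl -p    # verify afterwards with: ipcs -l
}
```

The functions are only defined here, not invoked; on a real node you would stop httpd, run the cleanup, apply the new limits, then start httpd again.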
>
> It seems that either my pool nodes have some issue which is preventing the release of semaphores, or everyone else has already applied this setting. My kernel settings look like this:
>
> ipcs -l
>
> ------ Shared Memory Limits --------
> max number of segments = 4096
> max seg size (kbytes) = 67108864
> max total shared memory (kbytes) = 17179869184
> min seg size (bytes) = 1
>
> ------ Semaphore Limits --------
> max number of arrays = 128
> max semaphores per array = 250
> max semaphores system wide = 32000
> max ops per semop call = 32
> semaphore max value = 32767
>
> ------ Messages: Limits --------
> max queues system wide = 32768
> max size of message (bytes) = 65536
> default max size of queue (bytes) = 65536
>
>
> Can someone check the number of semaphores on one of their pool nodes? I checked on one of my online pool nodes:
>
> ipcs -s | wc -l
> 131
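Worth noting: 131 lines of ipcs -s output is roughly 128 data rows plus the header, i.e. right at the "max number of arrays = 128" limit shown above. A quick per-owner breakdown, sketched here on the assumption that ipcs -s uses its standard layout with the owner in the third column:

```shell
# count_sem_owners: tally semaphore arrays per owner from `ipcs -s` output.
# Data rows have 5 fields; the "key semid owner perms nsems" header is skipped.
count_sem_owners() {
  awk 'NF == 5 && $3 != "owner" { count[$3]++ }
       END { for (u in count) print u, count[u] }'
}

# On a pool node:  ipcs -s | count_sem_owners
```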
>
> I am trying to figure out why it is only us who have this problem.
>
> Thanks
>
> Kashif
>
>
>
>
>>>>> -----Original Message-----
>>>>> From: GRIDPP2: Deployment and support of SRM and local storage
>>>>> management [mailto:[log in to unmask]] On Behalf Of
>>>>> John Hill
>>>>> Sent: 10 June 2016 17:14
>>>>> To: [log in to unmask]
>>>>> Subject: Re: httpd on dpm pool nodes
>>>>>
>>>>> Hi Kashif,
>>>>> I don't know how important it is, but according to the documentation,
>>>>> you should have lcgdm-dmlite >= 0.4.1 for dpm 1.8.10. The latest version of
>>>>> that puppet module is 0.4.2 (needed for dpm 1.8.11).
>>>>>
>>>>> John
>>>>>
>>>>> On 10/06/2016 17:06, Kashif Mohammad wrote:
>>>>>> Hi Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> I changed NSSecureRedirect to Off but it didn’t make any difference.
>>>>>> So probably that is not an issue.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am running
>>>>>>
>>>>>>
>>>>>>
>>>>>> dpm-1.8.10-1.el6.x86_64
>>>>>>
>>>>>> dpm-rfio-server-1.8.10-1.el6.x86_64
>>>>>>
>>>>>> dpm-xrootd-3.6.0-1.el6.x86_64
>>>>>>
>>>>>>
>>>>>>
>>>>>> and these puppet modules
>>>>>>
>>>>>>
>>>>>>
>>>>>> puppet module list
>>>>>>
>>>>>> /etc/puppet/modules
>>>>>>
>>>>>> ├──CERNOps-bdii (v0.1.0)
>>>>>>
>>>>>> ├──CERNOps-fetchcrl (v1.0.0)
>>>>>>
>>>>>> ├──erwbgy-limits (v0.3.1)
>>>>>>
>>>>>> ├──lcgdm-dmlite (v0.4.0)
>>>>>>
>>>>>> ├──lcgdm-gridftp (v0.2.0)
>>>>>>
>>>>>> ├──lcgdm-lcgdm (v0.3.0)
>>>>>>
>>>>>> ├──lcgdm-voms (v0.3.0)
>>>>>>
>>>>>> ├──lcgdm-xrootd (v0.2.0)
>>>>>>
>>>>>> ├──nanliu-staging (v1.0.3)
>>>>>>
>>>>>> ├──puppetlabs-firewall (v1.8.0)
>>>>>>
>>>>>> ├──puppetlabs-mysql (v3.6.2)
>>>>>>
>>>>>> ├──puppetlabs-stdlib (v4.11.0)
>>>>>>
>>>>>> └── saz-memcached (v2.8.1)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>>
>>>>>> Kashif
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:*Sam Skipsey [mailto:[log in to unmask]]
>>>>>> *Sent:* 10 June 2016 16:46
>>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hm,
>>>>>>
>>>>>>
>>>>>>
>>>>>> So: you have NSSecureRedirect set On (and I have it Off), but that
>>>>>> shouldn't break things in the way you're seeing.
>>>>>>
>>>>>> I also have
>>>>>>
>>>>>>
>>>>>>
>>>>>> SSLCARevocationPath /etc/grid-security/certificates
>>>>>>
>>>>>> near the bottom of my file (after SSLCACertificatePath)
>>>>>>
>>>>>> Other than that, I can't see that much difference.
>>>>>>
>>>>>> What releases are your packages?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 10, 2016 at 4:16 PM Kashif Mohammad
>>>>>> <[log in to unmask]
>>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>>
>>>>>> Hi Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> Attaching conf.d directory
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>> Kashif
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:*Sam Skipsey [mailto:[log in to unmask]
>>>>>> <mailto:[log in to unmask]>]
>>>>>> *Sent:* 10 June 2016 16:05
>>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>>> <mailto:[log in to unmask]>
>>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hm, so we have puppet-configured http here, and definitely don't see
>>>>>> this. (It looks like mod_gridsite is exceptionally confused when
>>>>>> being configured, but I can't see why...)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you send a copy of your http.conf.d directory to me as a tar or
>>>>>> something, for comparison?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 10, 2016 at 3:54 PM Kashif Mohammad
>>>>>> <[log in to unmask]
>>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>>
>>>>>> Hi John
>>>>>>
>>>>>> I have an empty ssl.conf file, so I think a rogue ssl.conf is not
>>>>>> the problem here.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Kashif
>>>>>>
>>>>>>
>>>>>> >>>>-----Original Message-----
>>>>>> >>>>From: GRIDPP2: Deployment and support of SRM and local
>>>>> storage
>>>>>> >>>>management [mailto:[log in to unmask]
>>>>>> <mailto:[log in to unmask]>] On Behalf Of
>>>>>> >>>>John Hill
>>>>>> >>>>Sent: 10 June 2016 15:50
>>>>>> >>>>To: [log in to unmask]
>>>>>> <mailto:[log in to unmask]>
>>>>>> >>>>Subject: Re: httpd on dpm pool nodes
>>>>>> >>>>
>>>>>> >>>>Hi Kashif,
>>>>>> >>>> I wonder if this is the problem with the rogue ssl.conf which
>>>>>> >>>> appears after an update of the httpd rpm? Check in
>>>>>> >>>> /etc/httpd/conf.d - ssl.conf should not be there, or should be
>>>>>> >>>> an empty file.
>>>>>> >>>> If it is this problem, my solution is to create an empty file -
>>>>>> >>>> this prevents the rpm update from creating a new ssl.conf. I
>>>>>> >>>> believe others delete the file via puppet.
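John's empty-file workaround can be sketched as below (the path is from his description; the function name and the optional path argument are just for illustration):

```shell
# Ensure an empty /etc/httpd/conf.d/ssl.conf exists so a later httpd rpm
# update does not drop a fresh, conflicting ssl.conf back in.
neutralise_ssl_conf() {
  conf=${1:-/etc/httpd/conf.d/ssl.conf}
  : > "$conf"    # creates the file if missing and truncates it to zero bytes
}
```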
>>>>>> >>>>
>>>>>> >>>>John
>>>>>> >>>>
>>>>>> >>>>On 10/06/2016 15:39, Kashif Mohammad wrote:
>>>>>> >>>>> Hi
>>>>>> >>>>>
>>>>>> >>>>> Httpd on all dpm pool nodes had died mysteriously, and I
>>>>>> >>>>> suspect it has not been working for quite a long time. I can
>>>>>> >>>>> start it manually, but it dies immediately.
>>>>>> >>>>>
>>>>>> >>>>> /var/log/httpd/error_log has this
>>>>>> >>>>>
>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] (os 0x5ce407b0)Unrecognized resolver error: mod_gridsite: mod_ssl_with_insecure_reneg = 1
>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: generating secret for digest authentication ...
>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: done
>>>>>> >>>>> Configuration Failed
>>>>>> >>>>>
>>>>>> >>>>> Any Idea? DPM pool nodes are configured through puppet
>>>>>> modules.
>>>>>> >>>>>
>>>>>> >>>>> Thanks
>>>>>> >>>>>
>>>>>> >>>>> Kashif
>>>>>> >>>>>
>>>>>>
--
John Bland [log in to unmask]
System Administrator office: 220
High Energy Physics Division tel (int): 42911
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
University of Liverpool http://www.liv.ac.uk/physics/hep/
"I canna change the laws of physics, Captain!"