Hi everyone,
I have been diagnosing the httpd issue on our pool nodes. The main problem was that httpd kept crashing immediately after starting. I think I have found the reason, but I am surprised that no other site has been hit by it. Running strace with the -f option gave a clue:
semget(IPC_PRIVATE, 1, IPC_CREAT|0600) = -1 ENOSPC (No space left on device)
write(2, "Configuration Failed\n", 21) = 21
Further googling pointed to this page:
http://www.liquidweb.com/kb/apache-error-semget-no-space-left-on-device/
Apparently the server had run out of semaphores. I removed all semaphores on a server that was offline, restarted httpd, and it worked!
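A minimal Python 3 sketch of that cleanup step, for illustration only. It assumes httpd runs as the `apache` user and that `ipcs -s` prints rows of the form key, semid, owner, perms, nsems; the function names and the sample text are made up. It only builds `ipcrm -s <semid>` commands so they can be reviewed before being run on a node where httpd is stopped.

```python
# Illustrative sample of `ipcs -s` output (made-up values, not from a real node).
SAMPLE = """
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 0          root       600        1
0x00000000 32769      apache     600        1
0x00000000 65538      apache     600        1
"""


def parse_semaphore_arrays(ipcs_output):
    """Return (semid, owner) pairs from the text printed by `ipcs -s`."""
    arrays = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have five fields and start with a hexadecimal key.
        if len(fields) == 5 and fields[0].startswith("0x") and fields[1].isdigit():
            arrays.append((int(fields[1]), fields[2]))
    return arrays


def ipcrm_commands(ipcs_output, owner="apache"):
    """Build one `ipcrm -s <semid>` command per array owned by `owner`."""
    return [
        "ipcrm -s %d" % semid
        for semid, sem_owner in parse_semaphore_arrays(ipcs_output)
        if sem_owner == owner
    ]


# Print the commands instead of running them, so they can be checked first.
for command in ipcrm_commands(SAMPLE):
    print(command)
```

On a real node the text passed to `ipcrm_commands` would be the output of `ipcs -s` rather than `SAMPLE`.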
The recommended permanent solution is to increase the semaphore limits and extend Apache's uptime, which I haven't tested. I am not sure what limits are recommended in this case.
It seems that either my pool nodes have some issue that prevents semaphores from being released, or everyone else already has this setting changed. My current kernel settings are:
ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536
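For reference, the four semaphore limits above correspond to the fields of the `kernel.sem` sysctl (also readable from /proc/sys/kernel/sem), in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI. The Python sketch below parses such a value and formats a sysctl.conf line with a larger array limit. The function names are hypothetical and the value 1024 is only a placeholder, not a recommended setting.

```python
# Field order of kernel.sem: per-array max, system-wide max,
# max ops per semop call, max number of arrays.
SEM_FIELDS = ("semmsl", "semmns", "semopm", "semmni")


def parse_kernel_sem(text):
    """Parse the contents of /proc/sys/kernel/sem into a dict of limits."""
    values = [int(value) for value in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError("expected 4 fields, got %d" % len(values))
    return dict(zip(SEM_FIELDS, values))


def sysctl_line(limits, semmni):
    """Format a sysctl.conf line with only the array limit (SEMMNI) changed."""
    raised = dict(limits, semmni=semmni)
    return "kernel.sem = " + " ".join(str(raised[name]) for name in SEM_FIELDS)


# Values matching the `ipcs -l` output above: 250, 32000, 32, 128.
current = parse_kernel_sem("250\t32000\t32\t128")
print(current)
# 1024 is an arbitrary illustrative value.
print(sysctl_line(current, semmni=1024))
```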
Can someone check the number of semaphores on one of their pool nodes? I checked on one of my online pool nodes:
ipcs -s | wc -l
131
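Note that `wc -l` also counts the header and blank lines that `ipcs -s` prints, so the real number of arrays is slightly below 131, which puts it at or very close to the limit of 128 arrays. A small Python sketch that counts only the data rows and reports the remaining headroom; the row format (key, semid, owner, perms, nsems), the function names and the generated sample are assumptions for illustration:

```python
def count_semaphore_arrays(ipcs_output):
    """Count data rows in `ipcs -s` output, skipping headers and blank lines."""
    count = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have five fields and start with a hexadecimal key.
        if len(fields) == 5 and fields[0].startswith("0x") and fields[1].isdigit():
            count += 1
    return count


def headroom(ipcs_output, max_arrays):
    """Return how many more semaphore arrays can be created before the limit."""
    return max_arrays - count_semaphore_arrays(ipcs_output)


# Made-up sample: header lines followed by three arrays.
sample = "\n".join(
    ["", "------ Semaphore Arrays --------", "key semid owner perms nsems"]
    + ["0x00000000 %d apache 600 1" % semid for semid in (0, 32769, 65538)]
    + [""]
)
print(count_semaphore_arrays(sample), "arrays in use,", headroom(sample, 128), "free")
```

On a real node, `sample` would be replaced by the output of `ipcs -s` and 128 by the "max number of arrays" value from `ipcs -l`.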
I am trying to figure out why we are the only ones with this problem.
Thanks
Kashif
>>>>-----Original Message-----
>>>>From: GRIDPP2: Deployment and support of SRM and local storage
>>>>management [mailto:[log in to unmask]] On Behalf Of
>>>>John Hill
>>>>Sent: 10 June 2016 17:14
>>>>To: [log in to unmask]
>>>>Subject: Re: httpd on dpm pool nodes
>>>>
>>>>Hi Kashif,
>>>> I don't know how important it is, but according to the documentation,
>>>>you should have lcgdm-dmlite >= 0.4.1 for dpm 1.8.10. The latest version of
>>>>that puppet module is 0.4.2 (needed for dpm 1.8.11).
>>>>
>>>>John
>>>>
>>>>On 10/06/2016 17:06, Kashif Mohammad wrote:
>>>>> Hi Sam
>>>>>
>>>>>
>>>>>
>>>>> I changed NSSecureRedirect to Off but it didn’t make any difference.
>>>>> So probably that is not an issue.
>>>>>
>>>>>
>>>>>
>>>>> I am running
>>>>>
>>>>> dpm-1.8.10-1.el6.x86_64
>>>>> dpm-rfio-server-1.8.10-1.el6.x86_64
>>>>> dpm-xrootd-3.6.0-1.el6.x86_64
>>>>>
>>>>> and these puppet modules
>>>>>
>>>>> puppet module list
>>>>> /etc/puppet/modules
>>>>> ├── CERNOps-bdii (v0.1.0)
>>>>> ├── CERNOps-fetchcrl (v1.0.0)
>>>>> ├── erwbgy-limits (v0.3.1)
>>>>> ├── lcgdm-dmlite (v0.4.0)
>>>>> ├── lcgdm-gridftp (v0.2.0)
>>>>> ├── lcgdm-lcgdm (v0.3.0)
>>>>> ├── lcgdm-voms (v0.3.0)
>>>>> ├── lcgdm-xrootd (v0.2.0)
>>>>> ├── nanliu-staging (v1.0.3)
>>>>> ├── puppetlabs-firewall (v1.8.0)
>>>>> ├── puppetlabs-mysql (v3.6.2)
>>>>> ├── puppetlabs-stdlib (v4.11.0)
>>>>> └── saz-memcached (v2.8.1)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>>
>>>>> Kashif
>>>>>
>>>>>
>>>>>
>>>>> *From:*Sam Skipsey [mailto:[log in to unmask]]
>>>>> *Sent:* 10 June 2016 16:46
>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>
>>>>>
>>>>>
>>>>> Hm,
>>>>>
>>>>>
>>>>>
>>>>> So: you have NSSecureRedirect set On (and I have it Off), but that
>>>>> shouldn't break things in the way you're seeing.
>>>>>
>>>>> I also have
>>>>>
>>>>>
>>>>>
>>>>> SSLCARevocationPath /etc/grid-security/certificates
>>>>>
>>>>> near the bottom of my file (after SSLCACertificatePath )
>>>>>
>>>>> Other than that, I can't see that much difference.
>>>>>
>>>>> What releases are your packages?
>>>>>
>>>>>
>>>>>
>>>>> Sam
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 10, 2016 at 4:16 PM Kashif Mohammad
>>>>> <[log in to unmask]
>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>
>>>>> Hi Sam
>>>>>
>>>>>
>>>>>
>>>>> Attaching conf.d directory
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>> Kashif
>>>>>
>>>>>
>>>>>
>>>>> *From:*Sam Skipsey [mailto:[log in to unmask]
>>>>> <mailto:[log in to unmask]>]
>>>>> *Sent:* 10 June 2016 16:05
>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>> <mailto:[log in to unmask]>
>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>
>>>>>
>>>>>
>>>>> Hm, so we have puppet-configured http here, and definitely don't see
>>>>> this. (It looks like mod_gridsite is exceptionally confused when
>>>>> being configured, but I can't see why...)
>>>>>
>>>>>
>>>>>
>>>>> Can you send a copy of your http.conf.d directory to me as a tar or
>>>>> something, for comparison?
>>>>>
>>>>>
>>>>>
>>>>> Sam
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 10, 2016 at 3:54 PM Kashif Mohammad
>>>>> <[log in to unmask]
>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>
>>>>> Hi John
>>>>>
>>>>> I have an empty ssl.conf file so I think rogue ssl is not a
>>>>> problem here.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Kashif
>>>>>
>>>>>
>>>>> >>>>-----Original Message-----
>>>>> >>>>From: GRIDPP2: Deployment and support of SRM and local storage
>>>>> >>>>management [mailto:[log in to unmask]] On Behalf Of
>>>>> >>>>John Hill
>>>>> >>>>Sent: 10 June 2016 15:50
>>>>> >>>>To: [log in to unmask]
>>>>> >>>>Subject: Re: httpd on dpm pool nodes
>>>>> >>>>
>>>>> >>>>Hi Kashif,
>>>>> >>>> I wonder if this is the problem with the rogue ssl.conf which appears
>>>>> >>>>after an update of the httpd rpm? Check in /etc/httpd/conf.d - ssl.conf
>>>>> >>>>should not be there, or should be an empty file.
>>>>> >>>> If it is this problem, my solution is to create an empty file - this prevents
>>>>> >>>>the rpm update from creating a new ssl.conf. I believe others delete the
>>>>> >>>>file via puppet.
>>>>> >>>>
>>>>> >>>>John
>>>>> >>>>
>>>>> >>>>On 10/06/2016 15:39, Kashif Mohammad wrote:
>>>>> >>>>> Hi
>>>>> >>>>>
>>>>> >>>>> Httpd on all dpm pool nodes had died mysteriously and I suspect that it is
>>>>> >>>>> not working for quite long time. I can start it manually but it dies
>>>>> >>>>> immediately.
>>>>> >>>>>
>>>>> >>>>> /var/log/httpd/error_log has this
>>>>> >>>>>
>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] (os 0x5ce407b0)Unrecognized resolver error: mod_gridsite: mod_ssl_with_insecure_reneg = 1
>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: generating secret for digest authentication ...
>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: done
>>>>> >>>>> Configuration Failed
>>>>> >>>>>
>>>>> >>>>> Any Idea? DPM pool nodes are configured through puppet modules.
>>>>> >>>>>
>>>>> >>>>> Thanks
>>>>> >>>>>
>>>>> >>>>> Kashif
>>>>> >>>>>
>>>>>