Hi John
ipcs -s | grep dpmmgr | wc -l
99
ipcs -s | grep root | wc -l
32
Kashif
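For anyone wanting to compare, a sketch of the counting above (the owner is the third column of `ipcs -s` output). The `count_sems` helper name and the commented-out ipcrm cleanup line are my own scripting of the manual steps, not something from the DPM docs; the cleanup must only be run as root with httpd stopped, since it destroys every semaphore array the user holds:

```shell
#!/bin/sh
# Count semaphore arrays held by a given user (owner is column 3 of ipcs -s).
count_sems() {
    ipcs -s | awk -v u="$1" '$3 == u' | wc -l
}

count_sems dpmmgr
count_sems root

# Cleanup sketch (root only, with httpd stopped!): remove each of
# dpmmgr's semaphore arrays by id (column 2 of ipcs -s).
# ipcs -s | awk '$3 == "dpmmgr" {print $2}' | xargs -r -n 1 ipcrm -s
```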
>>>>-----Original Message-----
>>>>From: John Bland [mailto:[log in to unmask]]
>>>>Sent: 20 June 2016 16:31
>>>>To: Kashif Mohammad
>>>>Cc: [log in to unmask]
>>>>Subject: Re: httpd on dpm pool nodes
>>>>
>>>>
>>>>
>>>>Hi,
>>>>
>>>>Checked all our pool nodes, 4-6 semaphores being used (between root
>>>>and dpmmgr).
>>>>
>>>>This is on SL6, YAIM and fresh Puppet configs.
>>>>
>>>>When you run ipcs -s which user is holding all the semaphores?
>>>>
>>>>John
>>>>
>>>>On 20/06/16 16:20, Kashif Mohammad wrote:
>>>>> Hi Everyone
>>>>>
>>>>> I have been diagnosing the httpd issue on our pool nodes. The main
>>>>> problem was that httpd kept crashing immediately after starting. I
>>>>> think I found the reason, but I am surprised that no other site has
>>>>> been hit by it. Running strace with the -f option gave a clue:
>>>>>
>>>>> semget(IPC_PRIVATE, 1, IPC_CREAT|0600) = -1 ENOSPC (No space left on device)
>>>>> write(2, "Configuration Failed\n", 21) = 21
>>>>>
>>>>> Further googling pointed to this page:
>>>>>
>>>>> http://www.liquidweb.com/kb/apache-error-semget-no-space-left-on-device/
>>>>>
>>>>> Apparently the server had run out of semaphores. I removed all
>>>>> semaphores on a server which was offline, restarted httpd, and it worked!
>>>>>
>>>>> The recommended permanent solution is to increase the semaphore limit
>>>>> and so extend Apache's uptime, which I haven't tested. I am not sure
>>>>> what the recommended limit would be in this case.
>>>>>
>>>>> It seems that either my pool nodes have some issue which is preventing
>>>>> the release of semaphores, or everyone else has already changed this
>>>>> setting. My kernel settings are:
>>>>>
>>>>> ipcs -l
>>>>>
>>>>> ------ Shared Memory Limits --------
>>>>> max number of segments = 4096
>>>>> max seg size (kbytes) = 67108864
>>>>> max total shared memory (kbytes) = 17179869184
>>>>> min seg size (bytes) = 1
>>>>>
>>>>> ------ Semaphore Limits --------
>>>>> max number of arrays = 128
>>>>> max semaphores per array = 250
>>>>> max semaphores system wide = 32000
>>>>> max ops per semop call = 32
>>>>> semaphore max value = 32767
>>>>>
>>>>> ------ Messages: Limits --------
>>>>> max queues system wide = 32768
>>>>> max size of message (bytes) = 65536
>>>>> default max size of queue (bytes) = 65536
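The semaphore limits in the `ipcs -l` output above come from the kernel.sem tunable. A quick sketch for checking how close a node is to the "max number of arrays" limit, and (commented out) the raise-the-limit route; the doubled values are illustrative assumptions on my part, not a recommendation from this thread:

```shell
#!/bin/sh
# kernel.sem holds four fields: SEMMSL SEMMNS SEMOPM SEMMNI
#   SEMMSL = max semaphores per array    SEMMNS = max semaphores system wide
#   SEMOPM = max ops per semop call      SEMMNI = max number of arrays
max_arrays=$(awk '{print $4}' /proc/sys/kernel/sem)
in_use=$(ipcs -s | grep -c '^0x')    # each array listing starts with its key
echo "semaphore arrays in use: $in_use of $max_arrays"

# To raise the limit persistently (illustrative: double SEMMNI, scale SEMMNS):
# echo "kernel.sem = 250 64000 32 256" >> /etc/sysctl.conf && sysctl -p
```

With the defaults shown above (128 arrays) and httpd leaking a few arrays per failed start, 99 arrays held by dpmmgr plus 32 by root is right at the point where `semget()` returns ENOSPC.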
>>>>>
>>>>>
>>>>> Can someone check the number of semaphores on one of their pool nodes?
>>>>> I checked on one of my online pool nodes:
>>>>>
>>>>> ipcs -s | wc -l
>>>>> 131
>>>>>
>>>>> I am trying to figure out why it is only us who has this problem.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Kashif
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: GRIDPP2: Deployment and support of SRM and local storage
>>>>>>>>> management [mailto:[log in to unmask]] On Behalf Of John Hill
>>>>>>>>> Sent: 10 June 2016 17:14
>>>>>>>>> To: [log in to unmask]
>>>>>>>>> Subject: Re: httpd on dpm pool nodes
>>>>>>>>>
>>>>>>>>> Hi Kashif,
>>>>>>>>> I don't know how important it is, but according to the
>>>>>>>>> documentation, you should have lcgdm-dmlite >= 0.4.1 for dpm
>>>>>>>>> 1.8.10. The latest version of that puppet module is 0.4.2
>>>>>>>>> (needed for dpm 1.8.11).
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>> On 10/06/2016 17:06, Kashif Mohammad wrote:
>>>>>>>>>> Hi Sam
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I changed NSSecureRedirect to Off but it didn't make any
>>>>>>>>>> difference. So probably that is not an issue.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I am running
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> dpm-1.8.10-1.el6.x86_64
>>>>>>>>>>
>>>>>>>>>> dpm-rfio-server-1.8.10-1.el6.x86_64
>>>>>>>>>>
>>>>>>>>>> dpm-xrootd-3.6.0-1.el6.x86_64
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> and these puppet modules
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> puppet module list
>>>>>>>>>>
>>>>>>>>>> /etc/puppet/modules
>>>>>>>>>>
>>>>>>>>>> ├──CERNOps-bdii (v0.1.0)
>>>>>>>>>>
>>>>>>>>>> ├──CERNOps-fetchcrl (v1.0.0)
>>>>>>>>>>
>>>>>>>>>> ├──erwbgy-limits (v0.3.1)
>>>>>>>>>>
>>>>>>>>>> ├──lcgdm-dmlite (v0.4.0)
>>>>>>>>>>
>>>>>>>>>> ├──lcgdm-gridftp (v0.2.0)
>>>>>>>>>>
>>>>>>>>>> ├──lcgdm-lcgdm (v0.3.0)
>>>>>>>>>>
>>>>>>>>>> ├──lcgdm-voms (v0.3.0)
>>>>>>>>>>
>>>>>>>>>> ├──lcgdm-xrootd (v0.2.0)
>>>>>>>>>>
>>>>>>>>>> ├──nanliu-staging (v1.0.3)
>>>>>>>>>>
>>>>>>>>>> ├──puppetlabs-firewall (v1.8.0)
>>>>>>>>>>
>>>>>>>>>> ├──puppetlabs-mysql (v3.6.2)
>>>>>>>>>>
>>>>>>>>>> ├──puppetlabs-stdlib (v4.11.0)
>>>>>>>>>>
>>>>>>>>>> └── saz-memcached (v2.8.1)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Kashif
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From:* Sam Skipsey [mailto:[log in to unmask]]
>>>>>>>>>> *Sent:* 10 June 2016 16:46
>>>>>>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>>>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hm,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So: you have NSSecureRedirect set On (and I have it Off), but
>>>>>>>>>> that shouldn't break things in the way you're seeing.
>>>>>>>>>>
>>>>>>>>>> I also have
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> SSLCARevocationPath /etc/grid-security/certificates
>>>>>>>>>>
>>>>>>>>>> near the bottom of my file (after SSLCACertificatePath)
>>>>>>>>>>
>>>>>>>>>> Other than that, I can't see that much difference.
>>>>>>>>>>
>>>>>>>>>> What releases are your packages?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 10, 2016 at 4:16 PM Kashif Mohammad
>>>>>>>>>> <[log in to unmask]
>>>>>>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sam
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Attaching conf.d directory
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Kashif
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From:* Sam Skipsey [mailto:[log in to unmask]
>>>>>>>>>> <mailto:[log in to unmask]>]
>>>>>>>>>> *Sent:* 10 June 2016 16:05
>>>>>>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>>>>>>> <mailto:[log in to unmask]>
>>>>>>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hm, so we have puppet-configured httpd here, and definitely
>>>>>>>>>> don't see this. (It looks like mod_gridsite is exceptionally
>>>>>>>>>> confused when being configured, but I can't see why...)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can you send a copy of your http.conf.d directory to me as a tar
>>>>>>>>>> or something, for comparison?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 10, 2016 at 3:54 PM Kashif Mohammad
>>>>>>>>>> <[log in to unmask]
>>>>>>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi John
>>>>>>>>>>
>>>>>>>>>> I have an empty ssl.conf file so I think rogue ssl is not a
>>>>>>>>>> problem here.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Kashif
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> >>>>-----Original Message-----
>>>>>>>>>> >>>>From: GRIDPP2: Deployment and support of SRM and local storage
>>>>>>>>>> >>>>management [mailto:GRIDPP-[log in to unmask]
>>>>>>>>>> <mailto:[log in to unmask]>] On Behalf Of
>>>>>>>>>> >>>>John Hill
>>>>>>>>>> >>>>Sent: 10 June 2016 15:50
>>>>>>>>>> >>>>To: [log in to unmask]
>>>>>>>>>> <mailto:[log in to unmask]>
>>>>>>>>>> >>>>Subject: Re: httpd on dpm pool nodes
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>Hi Kashif,
>>>>>>>>>> >>>> I wonder if this is the problem with the rogue ssl.conf
>>>>>>>>>> >>>>which appears after an update of the httpd rpm? Check in
>>>>>>>>>> >>>>/etc/httpd/conf.d - ssl.conf should not be there, or should
>>>>>>>>>> >>>>be an empty file.
>>>>>>>>>> >>>> If it is this problem, my solution is to create an empty
>>>>>>>>>> >>>>file - this prevents the rpm update from creating a new
>>>>>>>>>> >>>>ssl.conf. I believe others delete the file via puppet.
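John's empty-file workaround is a one-liner. A sketch of it below; to keep the example harmless it writes into a scratch directory rather than the real /etc/httpd/conf.d (on a pool node you would use the real path, as root):

```shell
#!/bin/sh
# Illustration of the empty-ssl.conf workaround. Using a scratch directory
# here instead of the real /etc/httpd/conf.d so the sketch is safe to run.
confd=$(mktemp -d)
: > "$confd/ssl.conf"        # create/truncate ssl.conf to an empty file
ls -l "$confd/ssl.conf"
```

With the file present and empty, an rpm update will not drop in a fresh ssl.conf, which is what causes the duplicate-SSL-config breakage described above.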
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>John
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>On 10/06/2016 15:39, Kashif Mohammad wrote:
>>>>>>>>>> >>>>> Hi
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Httpd on all dpm pool nodes had died mysteriously, and I
>>>>>>>>>> >>>>>suspect that it has not been working for quite a long time.
>>>>>>>>>> >>>>>I can start it manually, but it dies immediately.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> /var/log/httpd/error_log has this
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] (os 0x5ce407b0)Unrecognized resolver error: mod_gridsite: mod_ssl_with_insecure_reneg = 1
>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: generating secret for digest authentication ...
>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: done
>>>>>>>>>> >>>>> Configuration Failed
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Any idea? DPM pool nodes are configured through puppet
>>>>>>>>>> >>>>> modules.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Thanks
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Kashif
>>>>>>>>>> >>>>>
>>>>>>>>>>
>>>>
>>>>
>>>>--
>>>>John Bland [log in to unmask]
>>>>System Administrator office: 220
>>>>High Energy Physics Division tel (int): 42911
>>>>Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
>>>>University of Liverpool http://www.liv.ac.uk/physics/hep/
>>>>"I canna change the laws of physics, Captain!"