No, I was using the old dpm-drain. At the end of the drain, it was left with many failed replications where the file was replicated but not deleted.
Kashif
________________________________________
From: Sam Skipsey [[log in to unmask]]
Sent: Wednesday, June 22, 2016 3:22 PM
To: Kashif Mohammad; [log in to unmask]
Subject: Re: httpd on dpm pool nodes
Ahh, were you using the new dpm-drain (via the dmlite-shell) or the old dpm-drain? The dmlite-shell stuff does, indeed, work over http - and the dpm-drain did have some significant bugs concerning timeouts (which I guess might be related to stale semaphores?)...
Sam
On Wed, Jun 22, 2016 at 3:06 PM Kashif Mohammad <[log in to unmask]> wrote:
Hi Everyone
Thanks for providing settings for comparison. I found that all those semaphores owned by dpmmgr were more than a month old and the associated pids were already dead. I removed all those semaphore arrays and httpd is now running happily. I am still not sure of the reason for that. I was running dpm-drain and it was crashing quite often, so that may be the reason. I will keep an eye on httpd on the dpm pool nodes to see if the problem comes back again.
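For the record, the cleanup was roughly along these lines (a sketch, not the exact commands I ran; it removes every semaphore array owned by dpmmgr, so check first that the owning pids really are dead):

# list semaphore arrays owned by dpmmgr (semid is column 2, owner column 3)
# and remove each one by its id
ipcs -s | awk '$3 == "dpmmgr" {print $2}' | while read semid; do
    ipcrm -s "$semid"
done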
Cheers
Kashif
________________________________________
From: GRIDPP2: Deployment and support of SRM and local storage management [[log in to unmask]] on behalf of Sam Skipsey [[log in to unmask]]
Sent: Wednesday, June 22, 2016 12:08 PM
To: [log in to unmask]
Subject: Re: httpd on dpm pool nodes
A bit late, but for us:
ipcs -s | wc -l
gives values between 4 and 9 (and our disk servers are of various ages, and include brand new installs with dpm-puppet).
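For a quick per-owner breakdown, something along these lines works (just a convenience one-liner; it assumes the usual ipcs layout, with the key in column 1 and the owner in column 3):

ipcs -s | awk '$1 ~ /^0x/ {print $3}' | sort | uniq -c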
We do see some use of httpd, but I think it is mostly automated tests, as John noted.
And our limits are basically precisely the same as Oxford's.
Sam
On Mon, Jun 20, 2016 at 4:42 PM John Bland <[log in to unmask]> wrote:
Could be load related; we appear to have had next to no webdav transfers
recently outside of automated tests. Our settings are identical to yours.
John
On 20/06/16 16:34, Kashif Mohammad wrote:
> Hi John
>
> ipcs -s | grep dpmmgr | wc -l
> 99
>
> ipcs -s | grep root | wc -l
> 32
>
>
> Kashif
>>>>> -----Original Message-----
>>>>> From: John Bland [mailto:[log in to unmask]]
>>>>> Sent: 20 June 2016 16:31
>>>>> To: Kashif Mohammad
>>>>> Cc: [log in to unmask]
>>>>> Subject: Re: httpd on dpm pool nodes
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Checked all our pool nodes, 4-6 semaphores being used (between root
>>>>> and dpmmgr).
>>>>>
>>>>> This is on SL6, YAIM and fresh Puppet configs.
>>>>>
>>>>> When you run ipcs -s, which user is holding all the semaphores?
>>>>>
>>>>> John
>>>>>
>>>>> On 20/06/16 16:20, Kashif Mohammad wrote:
>>>>>> Hi Everyone
>>>>>>
>>>>>> I have been diagnosing the httpd issue on the pool nodes. The main
>>>>>> problem was that httpd kept crashing immediately after starting. I
>>>>>> have more or less found the reason, but I am surprised that no other
>>>>>> site has been hit by it. Running strace with the -f option gave some clue:
>>>>>>
>>>>>> semget(IPC_PRIVATE, 1, IPC_CREAT|0600) = -1 ENOSPC (No space left on device)
>>>>>> write(2, "Configuration Failed\n", 21) = 21
>>>>>>
>>>>>> Further googling pointed to this page:
>>>>>>
>>>>>> http://www.liquidweb.com/kb/apache-error-semget-no-space-left-on-device/
>>>>>>
>>>>>> Apparently the server has run out of semaphores. I removed all the
>>>>>> semaphores on a server which was offline, restarted httpd, and it worked!
>>>>>>
>>>>>> The recommended permanent solution is to increase the semaphore limit
>>>>>> and extend apache uptime, which I haven't tested. I am not sure about
>>>>>> the recommended limit in this case.
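>>>>>>
>>>>>> Something like this should raise the array limit. This is only a
>>>>>> sketch, and the 256 is my guess rather than a tested value:
>>>>>>
>>>>>> # current limits, in the order semmsl semmns semopm semmni
>>>>>> sysctl kernel.sem
>>>>>> # raise max arrays (semmni) from 128 to 256, keeping the other values
>>>>>> echo "kernel.sem = 250 32000 32 256" >> /etc/sysctl.conf
>>>>>> sysctl -p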
>>>>>>
>>>>>> It seems that either my pool nodes have some issue which is preventing
>>>>>> the release of semaphores, or everyone else already has this setting.
>>>>>> My kernel settings look like this:
>>>>>>
>>>>>> ipcs -l
>>>>>>
>>>>>> ------ Shared Memory Limits --------
>>>>>> max number of segments = 4096
>>>>>> max seg size (kbytes) = 67108864
>>>>>> max total shared memory (kbytes) = 17179869184
>>>>>> min seg size (bytes) = 1
>>>>>>
>>>>>> ------ Semaphore Limits --------
>>>>>> max number of arrays = 128
>>>>>> max semaphores per array = 250
>>>>>> max semaphores system wide = 32000
>>>>>> max ops per semop call = 32
>>>>>> semaphore max value = 32767
>>>>>>
>>>>>> ------ Messages: Limits --------
>>>>>> max queues system wide = 32768
>>>>>> max size of message (bytes) = 65536
>>>>>> default max size of queue (bytes) = 65536
>>>>>>
>>>>>>
>>>>>> Can someone check the number of semaphores on one of their pool
>>>>>> nodes? I checked on one of my online pool nodes:
>>>>>>
>>>>>> ipcs -s | wc -l
>>>>>> 131
>>>>>>
>>>>>> Allowing for the header lines that ipcs prints, that count is right
>>>>>> at the 128-array limit above. I am trying to figure out why it is
>>>>>> only us who have this problem.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Kashif
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: GRIDPP2: Deployment and support of SRM and local storage
>>>>>>>>>> management [mailto:[log in to unmask]] On Behalf Of
>>>>>>>>>> John Hill
>>>>>>>>>> Sent: 10 June 2016 17:14
>>>>>>>>>> To: [log in to unmask]
>>>>>>>>>> Subject: Re: httpd on dpm pool nodes
>>>>>>>>>>
>>>>>>>>>> Hi Kashif,
>>>>>>>>>> I don't know how important it is, but according to the
>>>>>>>>>> documentation, you should have lcgdm-dmlite >= 0.4.1 for dpm
>>>>>>>>>> 1.8.10. The latest version of that puppet module is 0.4.2
>>>>>>>>>> (needed for dpm 1.8.11).
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>> On 10/06/2016 17:06, Kashif Mohammad wrote:
>>>>>>>>>>> Hi Sam
>>>>>>>>>>>
>>>>>>>>>>> I changed NSSecureRedirect to Off but it didn’t make any
>>>>>>>>>>> difference. So probably that is not an issue.
>>>>>>>>>>>
>>>>>>>>>>> I am running
>>>>>>>>>>>
>>>>>>>>>>> dpm-1.8.10-1.el6.x86_64
>>>>>>>>>>> dpm-rfio-server-1.8.10-1.el6.x86_64
>>>>>>>>>>> dpm-xrootd-3.6.0-1.el6.x86_64
>>>>>>>>>>>
>>>>>>>>>>> and these puppet modules:
>>>>>>>>>>>
>>>>>>>>>>> puppet module list
>>>>>>>>>>> /etc/puppet/modules
>>>>>>>>>>> ├── CERNOps-bdii (v0.1.0)
>>>>>>>>>>> ├── CERNOps-fetchcrl (v1.0.0)
>>>>>>>>>>> ├── erwbgy-limits (v0.3.1)
>>>>>>>>>>> ├── lcgdm-dmlite (v0.4.0)
>>>>>>>>>>> ├── lcgdm-gridftp (v0.2.0)
>>>>>>>>>>> ├── lcgdm-lcgdm (v0.3.0)
>>>>>>>>>>> ├── lcgdm-voms (v0.3.0)
>>>>>>>>>>> ├── lcgdm-xrootd (v0.2.0)
>>>>>>>>>>> ├── nanliu-staging (v1.0.3)
>>>>>>>>>>> ├── puppetlabs-firewall (v1.8.0)
>>>>>>>>>>> ├── puppetlabs-mysql (v3.6.2)
>>>>>>>>>>> ├── puppetlabs-stdlib (v4.11.0)
>>>>>>>>>>> └── saz-memcached (v2.8.1)
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>>
>>>>>>>>>>> Kashif
>>>>>>>>>>>
>>>>>>>>>>> *From:* Sam Skipsey [mailto:[log in to unmask]]
>>>>>>>>>>> *Sent:* 10 June 2016 16:46
>>>>>>>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>>>>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>>>>>>>
>>>>>>>>>>> Hm,
>>>>>>>>>>>
>>>>>>>>>>> So: you have NSSecureRedirect set On (and I have it Off), but
>>>>>>>>>>> that shouldn't break things in the way you're seeing.
>>>>>>>>>>>
>>>>>>>>>>> I also have
>>>>>>>>>>>
>>>>>>>>>>> SSLCARevocationPath /etc/grid-security/certificates
>>>>>>>>>>>
>>>>>>>>>>> near the bottom of my file (after SSLCACertificatePath).
>>>>>>>>>>>
>>>>>>>>>>> Other than that, I can't see that much difference.
>>>>>>>>>>>
>>>>>>>>>>> What releases are your packages?
>>>>>>>>>>>
>>>>>>>>>>> Sam
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 10, 2016 at 4:16 PM Kashif Mohammad
>>>>>>>>>>> <[log in to unmask]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Sam
>>>>>>>>>>>
>>>>>>>>>>> Attaching conf.d directory
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Kashif
>>>>>>>>>>>
>>>>>>>>>>> *From:* Sam Skipsey [mailto:[log in to unmask]]
>>>>>>>>>>> *Sent:* 10 June 2016 16:05
>>>>>>>>>>> *To:* Kashif Mohammad; [log in to unmask]
>>>>>>>>>>> *Subject:* Re: [GRIDPP-STORAGE] httpd on dpm pool nodes
>>>>>>>>>>>
>>>>>>>>>>> Hm, so we have puppet-configured http here, and definitely don't
>>>>>>>>>>> see this. (It looks like mod_gridsite is exceptionally confused
>>>>>>>>>>> when being configured, but I can't see why...)
>>>>>>>>>>>
>>>>>>>>>>> Can you send a copy of your http.conf.d directory to me as a tar
>>>>>>>>>>> or something, for comparison?
>>>>>>>>>>>
>>>>>>>>>>> Sam
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 10, 2016 at 3:54 PM Kashif Mohammad
>>>>>>>>>>> <[log in to unmask]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi John
>>>>>>>>>>>
>>>>>>>>>>> I have an empty ssl.conf file, so I think a rogue ssl.conf is
>>>>>>>>>>> not the problem here.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Kashif
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> >>>>-----Original Message-----
>>>>>>>>>>> >>>>From: GRIDPP2: Deployment and support of SRM and local storage
>>>>>>>>>>> >>>>management [mailto:[log in to unmask]] On Behalf Of
>>>>>>>>>>> >>>>John Hill
>>>>>>>>>>> >>>>Sent: 10 June 2016 15:50
>>>>>>>>>>> >>>>To: [log in to unmask]
>>>>>>>>>>> >>>>Subject: Re: httpd on dpm pool nodes
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>Hi Kashif,
>>>>>>>>>>> >>>> I wonder if this is the problem with the rogue ssl.conf which
>>>>>>>>>>> >>>>appears after an update of the httpd rpm? Check in
>>>>>>>>>>> >>>>/etc/httpd/conf.d - ssl.conf should not be there, or should be
>>>>>>>>>>> >>>>an empty file.
>>>>>>>>>>> >>>> If it is this problem, my solution is to create an empty file -
>>>>>>>>>>> >>>>this prevents the rpm update from creating a new ssl.conf. I
>>>>>>>>>>> >>>>believe others delete the file via puppet.
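>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>A sketch of the empty-file approach, assuming the standard
>>>>>>>>>>> >>>>path on SL6:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>># truncate ssl.conf to zero length; once locally modified, the
>>>>>>>>>>> >>>># httpd rpm update should not overwrite it
>>>>>>>>>>> >>>>: > /etc/httpd/conf.d/ssl.conf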
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>John
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>On 10/06/2016 15:39, Kashif Mohammad wrote:
>>>>>>>>>>> >>>>> Hi
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Httpd on all dpm pool nodes had died mysteriously, and I
>>>>>>>>>>> >>>>> suspect it had not been working for quite a long time. I can
>>>>>>>>>>> >>>>> start it manually but it dies immediately.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> /var/log/httpd/error_log has this:
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
>>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
>>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] (os 0x5ce407b0)Unrecognized resolver error: mod_gridsite: mod_ssl_with_insecure_reneg = 1
>>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: generating secret for digest authentication ...
>>>>>>>>>>> >>>>> [Fri Jun 10 15:33:52 2016] [notice] Digest: done
>>>>>>>>>>> >>>>> Configuration Failed
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Any idea? DPM pool nodes are configured through puppet
>>>>>>>>>>> >>>>> modules.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Thanks
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Kashif
>>>>>>>>>>> >>>>>
>>>>>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> John Bland [log in to unmask]
>>>>> System Administrator office: 220
>>>>> High Energy Physics Division tel (int): 42911
>>>>> Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
>>>>> University of Liverpool http://www.liv.ac.uk/physics/hep/
>>>>> "I canna change the laws of physics, Captain!"
--
John Bland [log in to unmask]
System Administrator office: 220
High Energy Physics Division tel (int): 42911
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
University of Liverpool http://www.liv.ac.uk/physics/hep/
"I canna change the laws of physics, Captain!"