> I have to apply this patch at some point; do we need any downtime for this?
It's simply a matter of stopping the services, upgrading the rpms, editing
a file or two and restarting the services (no need to even run yaim). You
should be able to do it in as little as 5 minutes if you're fully prepped
and caffeinated. I'll let you decide whether or not that calls for a
downtime :-) but I would at least put your site at risk.
Cheers,
Matt
>
> On Fri, Aug 31, 2012 at 11:43 AM, John Bland <[log in to unmask]> wrote:
>
> On 30/08/2012 16:52, Matt Doidge wrote:
>
> Hi,
>
> So: to clarify.
>
> There is a patch for the glite3.2 1.8.2 DPM. I was hoping Matt would
> link to it for me ;), but it's here:
> https://svnweb.cern.ch/trac/lcgdm/blog/glite-release-1-8-2-5
>
> When you upgrade to this, you also need to apply the mitigation I
> linked to in the earlier email. (Otherwise, your dpm and dpnsdaemon
> daemons will repeatedly segfault whenever they try to spawn new threads.)
>
>
> Sorry for abandoning this thread when the time came to prove my mettle!
> I'm currently drowning in my own sorrows!
>
> But just a quick word to say that the upgrade was very quick and
> (despite somehow forgetting to apply the thread workaround) painless.
>
>
> Just to note that Liverpool got bitten in the ass by the segfault bug
> last night. This was caused by a pool node array going offline.
> Applying the 1.8.2-5 upgrade (and the pthread fix) stopped the segfaulting.
>
> It might be worth other sites applying this temporary patch as a
> preventative measure; we ran for months with no problems before this.
>
> John
>
>
> Cheers,
> Matt
>
>
>
> Sam
>
> On 30 August 2012 15:46, Alessandra Forti <[log in to unmask]> wrote:
>
> PS: which section of the scripts should I use? At the beginning, or only
> in start?
>
>
> On 30/08/2012 15:29, Sam Skipsey wrote:
>
>
> This is the globus threading issue (it applies to 1.8.3 and to the
> patch to 1.8.2 for glite3.2).
>
> https://svnweb.cern.ch/trac/lcgdm/ticket/505
>
> Sam
>
> On 30 August 2012 15:23, Alessandra Forti <[log in to unmask]> wrote:
>
>
> Thanks for your help. I'll look at the db entries now.
>
> Which patch, and which globus thread fix, in case this isn't sufficient?
>
> cheers
> alessandra
>
>
>
> On 30/08/2012 15:15, Matt Doidge wrote:
>
>
> Heya,
>
>
> Our DPM is segfaulting again, often with dpm and srmv2.2 crashing.
> Dropping the request table and rebuilding the database hasn't helped,
> at least not for long. I recall it might be a problem with incomplete
> entries in the database, and that you had an SQL update from Ricardo to
> eliminate them. Is that correct? Could you send it to me? Otherwise I'll
> ask Ricardo.
>
>
>
> The original problem at Lancaster was that we had requests being put in
> for replicas that not only didn't exist, but were on disk pools that no
> longer existed.
>
> These were identified using the queries below (after first making sure
> that "select * from dpm_fs;" only shows existing hosts):
>
> (replicas)
> select poolname, host, fs, sfn from cns_db.Cns_file_replica
> where host not in (select distinct server from dpm_db.dpm_fs);
>
> (get requests)
> select from_surl, pfn from dpm_db.dpm_get_filereq
> where server not in (select distinct server from dpm_db.dpm_fs);
>
> (put requests)
> select to_surl, pfn from dpm_db.dpm_put_filereq
> where server not in (select distinct server from dpm_db.dpm_fs);
>
> (pending requests)
> select to_surl, pfn from dpm_db.dpm_pending_req
> where server not in (select distinct server from dpm_db.dpm_fs);
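A minimal sketch of what the replica, get and put queries above catch, assuming Python's built-in sqlite3 module as a stand-in for the real MySQL databases. The toy tables contain only the columns named in the queries (this is not the full DPM schema), and the hostnames and SURLs are hypothetical. Two attached in-memory databases let the schema-qualified names (cns_db, dpm_db) run unchanged.

```python
import sqlite3

# Toy stand-ins for the DPM MySQL databases: only the columns the queries
# use are modelled. This is NOT the real DPM schema.
SCHEMA = [
    "create table dpm_db.dpm_fs (poolname text, server text, fs text)",
    "create table cns_db.Cns_file_replica (poolname text, host text, fs text, sfn text)",
    "create table dpm_db.dpm_get_filereq (from_surl text, server text, pfn text)",
    "create table dpm_db.dpm_put_filereq (to_surl text, server text, pfn text)",
]

# Hypothetical data: disk01 is a live disk server, disk99 has been retired.
ROWS = {
    "dpm_db.dpm_fs": [("pool1", "disk01.example.org", "/data1")],
    "cns_db.Cns_file_replica": [
        ("pool1", "disk01.example.org", "/data1", "disk01.example.org:/data1/a"),
        ("oldpool", "disk99.example.org", "/data1", "disk99.example.org:/data1/b"),
    ],
    "dpm_db.dpm_get_filereq": [
        ("srm://se.example.org/dpm/a", "disk01.example.org", "disk01.example.org:/data1/a"),
        ("srm://se.example.org/dpm/b", "disk99.example.org", "disk99.example.org:/data1/b"),
    ],
    "dpm_db.dpm_put_filereq": [
        ("srm://se.example.org/dpm/c", "disk99.example.org", "disk99.example.org:/data1/c"),
    ],
}

# Servers that still have a filesystem registered in dpm_fs.
LIVE = "(select distinct server from dpm_db.dpm_fs)"


def build_toy_db():
    conn = sqlite3.connect(":memory:")
    # Attach two extra in-memory databases named like the MySQL schemas.
    conn.execute("attach database ':memory:' as cns_db")
    conn.execute("attach database ':memory:' as dpm_db")
    for ddl in SCHEMA:
        conn.execute(ddl)
    for table, rows in ROWS.items():
        marks = ",".join("?" * len(rows[0]))  # e.g. "?,?,?"
        conn.executemany(f"insert into {table} values ({marks})", rows)
    conn.commit()
    return conn


def find_orphans(conn):
    """Return (replicas, get requests, put requests) that point at servers
    no longer listed in dpm_fs, using the queries from the email."""
    replicas = conn.execute(
        "select poolname, host, fs, sfn from cns_db.Cns_file_replica "
        f"where host not in {LIVE}").fetchall()
    gets = conn.execute(
        "select from_surl, pfn from dpm_db.dpm_get_filereq "
        f"where server not in {LIVE}").fetchall()
    puts = conn.execute(
        "select to_surl, pfn from dpm_db.dpm_put_filereq "
        f"where server not in {LIVE}").fetchall()
    return replicas, gets, puts


if __name__ == "__main__":
    for label, rows in zip(("replicas", "gets", "puts"), find_orphans(build_toy_db())):
        print(label, rows)
```

One caveat with the `not in (subquery)` form: if the subquery ever returns a NULL server, `not in` matches no rows at all, so an empty result is only reassuring if dpm_fs.server contains no NULLs. A correlated `not exists` subquery avoids that pitfall.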
>
>
> We then got brutal and (after making sure we had backed it up) removed
> all of those entries from the database.
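The backup-then-delete step can be sketched the same way, again on a toy sqlite3 database rather than the production MySQL one: copy the orphaned request rows into a backup table, then delete them from the live table. The `_orphan_bak` table naming and the toy data are hypothetical; on a real DPM you would still want a full database backup (for example with mysqldump) before deleting anything.

```python
import sqlite3

# Request rows whose server is no longer a registered DPM filesystem server.
ORPHAN_FILTER = "server not in (select distinct server from dpm_db.dpm_fs)"


def backup_and_purge(conn, table):
    """Copy orphaned rows of dpm_db.<table> into dpm_db.<table>_orphan_bak,
    then delete them from the live table. Returns the number of rows deleted.

    `table` is interpolated into the SQL, so only pass fixed, trusted names.
    The backup-table naming scheme is hypothetical.
    """
    # Backup first: the copy exists before anything is deleted.
    conn.execute(
        f"create table dpm_db.{table}_orphan_bak as "
        f"select * from dpm_db.{table} where {ORPHAN_FILTER}")
    cur = conn.execute(f"delete from dpm_db.{table} where {ORPHAN_FILTER}")
    conn.commit()
    return cur.rowcount


def build_toy_db():
    """Toy stand-in for dpm_db with only the columns the filter needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("attach database ':memory:' as dpm_db")
    conn.execute("create table dpm_db.dpm_fs (poolname text, server text, fs text)")
    conn.execute("create table dpm_db.dpm_get_filereq (from_surl text, server text, pfn text)")
    conn.execute("insert into dpm_db.dpm_fs values ('pool1', 'disk01.example.org', '/data1')")
    conn.executemany(
        "insert into dpm_db.dpm_get_filereq values (?, ?, ?)",
        [("srm://se.example.org/dpm/a", "disk01.example.org", "disk01.example.org:/data1/a"),
         ("srm://se.example.org/dpm/b", "disk99.example.org", "disk99.example.org:/data1/b")])
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    print("deleted:", backup_and_purge(conn, "dpm_get_filereq"))
```

The replica table would need the same treatment with `host` in place of `server`, which is why the filter is kept to the request tables here.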
>
> However, the last time DPM had a segfaulting issue, the only way to fix
> it was to install the glite 3.2 patch, which wasn't too hard (once we
> remembered the globus thread tweak), and it's been plain sailing since.
>
> Cheers,
> Matt
>
> thanks
>
> cheers
> alessandra
>
>
> --
> Facts aren't facts if they come from the wrong people. (Paul Krugman)
>
>
>
>
>
>
>
> --
> John Bland [log in to unmask]
> System Administrator office: 220
> High Energy Physics Division tel (int): 42911
> Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
> University of Liverpool http://www.liv.ac.uk/physics/hep/
> "I canna change the laws of physics, Captain!"
>
>