Dear DPM site admins,
I would like to bring two issues to your attention.
The first concerns reports of DPM daemons segfaulting; the second
deals with the functionality gap between the gLite 3.2 and EMI
releases and how we propose to bridge it.
markus
DPM Daemon Segfaults
-----------------------------------
Recently, some DPM sites have reported DPM daemons segfaulting. The cause
is understood and affects all DPM versions up to and including 1.8.2.
The problem has therefore existed for a long time, but it has affected
only a few sites. We have documented the symptoms and a workaround.
In addition, we will shortly release a patch for the gLite 3.2 DPM version.
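To check whether a given installation falls into the affected range (versions up to and including 1.8.2), comparing the numeric version components is enough. The Python sketch below assumes the version is available as a string such as "1.8.2" or "1.8.2-1", with any release suffix after the dash ignored; the helpers `parse_version` and `is_affected` are hypothetical and not part of any DPM tool.

```python
# Hypothetical helpers (not part of DPM): decide whether a DPM version
# string falls in the affected range (<= 1.8.2). Assumes dotted numeric
# versions, optionally followed by a "-release" suffix, e.g. "1.8.2-1".
AFFECTED_MAX = (1, 8, 2)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.8.2-1' into (1, 8, 2), dropping the release suffix."""
    base = version.split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def is_affected(version: str) -> bool:
    """True if the version is <= 1.8.2, compared component by component."""
    return parse_version(version) <= AFFECTED_MAX


if __name__ == "__main__":
    for v in ["1.7.4-7", "1.8.2-3", "1.8.3", "1.8.10"]:
        print(v, "affected" if is_affected(v) else "not affected")
```

Comparing tuples of integers rather than raw strings matters here: as plain strings, "1.8.10" would sort before "1.8.2" and be wrongly reported as affected.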
Symptoms and Workarounds:
-----------------------------------------
The problems are linked to a bug that affects the reporting of errors.
1) Segfault when a disk server is temporarily unavailable
Disable the affected filesystems (directly in the database if needed)
and clean up any pending requests for that set of filesystems.
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Maintenance#Thedpmdaemonkeepscrashingandihaveonenodeactiveindbbutinaccessible
2) Segfault while no disk server is unavailable or inaccessible
(e.g. through a network glitch)
Clean up 'invalid replicas', i.e. replicas left in the database that point
to non-existent filesystems (due to a failed drain procedure or some
other cause).
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Maintenance#Cleaningupinvalidreplicas
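Both workarounds rest on simple set logic: a replica is "invalid" if the filesystem it points to is no longer known, and a pending request needs cleaning up if it targets a disabled filesystem. The Python sketch below illustrates only that logic on an in-memory model; the `Replica` and `Request` records, their fields, and the helper functions are hypothetical and do not reflect the actual DPM database schema, so the real clean-up should follow the wiki procedures linked above.

```python
from dataclasses import dataclass

# Hypothetical in-memory model; NOT the real DPM database schema.
# A filesystem is identified by (disk server host, mount point).
FsKey = tuple[str, str]


@dataclass(frozen=True)
class Replica:
    location: str  # hypothetical field: where the replica lives
    server: str
    fs: str


@dataclass(frozen=True)
class Request:
    req_id: int
    server: str
    fs: str


def invalid_replicas(replicas: list[Replica], known_fs: set[FsKey]) -> list[Replica]:
    """Replicas whose (server, fs) no longer exists, e.g. after a failed drain."""
    return [r for r in replicas if (r.server, r.fs) not in known_fs]


def requests_to_clean(pending: list[Request], disabled_fs: set[FsKey]) -> list[Request]:
    """Pending requests that target a filesystem which has been disabled."""
    return [q for q in pending if (q.server, q.fs) in disabled_fs]


if __name__ == "__main__":
    known = {("disk01", "/data1"), ("disk02", "/data1")}
    replicas = [
        Replica("disk01:/data1/f1", "disk01", "/data1"),
        Replica("disk03:/data2/f2", "disk03", "/data2"),  # filesystem gone
    ]
    print([r.location for r in invalid_replicas(replicas, known)])
    # -> ['disk03:/data2/f2']

    disabled = {("disk02", "/data1")}
    pending = [Request(1, "disk01", "/data1"), Request(2, "disk02", "/data1")]
    print([q.req_id for q in requests_to_clean(pending, disabled)])
    # -> [2]
```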
So far, these two procedures have solved all reported issues,
but please let us know if you run into any further trouble.
The upcoming functionality gap between DPM in gLite 3.2 and EMI
--------------------------------------------------------------------------------------------
To minimize the risks (and service interruptions) that come with a
re-installation, most DPM instances have remained on gLite 3.2. This is
what we recommended, and there is no fundamental problem with it.
However, significant new functionality is on its way in EMI-1 and EMI-2,
and if we wait until the end of the run period, we will miss out on many
improvements: NFS 4.1, HTTP, an improved xrootd plugin, federation
support, DMLite, SL6, etc.
We looked at several options to bridge this gap. Back-porting
all changes to gLite 3.2 had to be ruled out on the basis of cost.
Here is the approach that we suggest:
----------------------------------------------------
On the following page you can find instructions for adding components
to your gLite 3.2 core installations. It uses a mechanism similar to the
one we have used for the material in the Beta repository. It also
provides information for admins on each product.
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Components
In this way, some of the new functionality can be added to gLite 3.2 DPMs.
For sites that need the full set of improvements before the end of the
run and are experienced enough to follow a more complex multi-step
upgrade procedure, we will produce, within the next six months, a guide
to upgrading gLite 3.2 DPMs to EMI-2 without a re-installation.
We thank the sites that have shared their experience of upgrading to
EMI-1 without re-installation; we will use it as a starting point.
The next big change is the move to EMI-2/SL6-based releases. For existing
installations, this move should be considered after the end of the run.
We recommend installing new disk server nodes with the latest EMI-based
release. We will ensure that EMI-2 SL6 disk server nodes work with older
DPM head nodes.