Hi Raul,

Maybe because it is an older version, I don't have dpmcopyd.

On the head node (CentOS 7.3) I have:

dpm-1.9.0-1.el7.x86_64
dmlite-dpm-tester-0.8.5-1.el7.x86_64
dpm-rfio-server-1.9.0-1.el7.x86_64
dpm-libs-1.9.0-1.el7.x86_64
dpm-xrootd-3.6.3-1.el7.x86_64
dpm-server-mysql-1.9.0-1.el7.x86_64
dpm-dsi-1.9.11-1.el7.x86_64
dpm-srm-server-mysql-1.9.0-1.el7.x86_64
dpm-python-1.9.0-1.el7.x86_64
dpm-contrib-admintools-0.2.2-1.el7.x86_64
dpm-name-server-mysql-1.9.0-1.el7.x86_64

On the pool nodes (SL 6.9) I have:

dpm-rfio-server-1.8.10-1.el6.x86_64
dpm-libs-1.8.10-1.el6.x86_64
emi-dpm_disk-1.8.10-1.el6.x86_64
dpm-yaim-4.2.21-1.el6.noarch
dpm-xrootd-3.5.5-1.el6.x86_64
dpm-devel-1.8.10-1.el6.x86_64
dpm-python-1.8.10-1.el6.x86_64
dpm-1.8.10-1.el6.x86_64
dpm-dsi-1.9.5-13.el6.x86_64
dpm-perl-1.8.10-1.el6.x86_64

As Elena says, some functionality is working, but some things are not.


Any other ideas?


From: Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of RAUL H C LOPES <[log in to unmask]>
Sent: 31 December 2018 13:18
To: [log in to unmask]
Subject: Re: HC failures, DPM problem?
 
dpmcopyd?

I've seen disk server failures leave DPM in an inconsistent state, which seems to lead to:
 - high load in the DB;
 - dpmcopyd and dpnsdaemon tending to die.

You may have to restart the services and, depending on the DPM version, check for stale lock files.

I'm on version 1.10 and that still happens.

I can help you to upgrade to 1.10 in the second week of the year.

raul

On 31/12/2018 12:06, George, Simon wrote:

Hi Raul,


Thanks for your reply.

It is dpm-1.9.0-1 on the CentOS 7.3 head node. No DOME.

These services are up: 

dpnsdaemon.service
srmv2.2.service
xrootd@dpmredir
dpm.service
dpm-gsiftp.service
bdii.service
rfiod.service
httpd.service
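A quick way to re-check that list is to ask systemd for each unit's state. The sketch below is a hypothetical helper, not a DPM tool: it assumes Python 3 is available on the head node and that the unit names are exactly those listed above. It only wraps `systemctl is-active <unit>` and reports any unit not in the "active" state.

```python
import subprocess

# Unit names copied from the list above; adjust for your own head node.
DPM_UNITS = [
    "dpnsdaemon.service",
    "srmv2.2.service",
    "xrootd@dpmredir",
    "dpm.service",
    "dpm-gsiftp.service",
    "bdii.service",
    "rfiod.service",
    "httpd.service",
]


def unit_state(unit):
    """Return what `systemctl is-active` prints for a unit (e.g. 'active', 'failed')."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # systemctl not found (e.g. not a systemd host).
        return "unknown"
    return result.stdout.strip() or "unknown"


def not_active(states):
    """Return the sorted names of units whose state is anything other than 'active'."""
    return sorted(unit for unit, state in states.items() if state != "active")


if __name__ == "__main__":
    states = {unit: unit_state(unit) for unit in DPM_UNITS}
    for unit, state in states.items():
        print("%-22s %s" % (unit, state))
    down = not_active(states)
    print("Not active: " + (", ".join(down) if down else "none"))
```

A unit reported as "active" here only means the process is running. As Raul notes, a daemon can be up while DPM is still in an inconsistent state.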

Cheers,

Simon




From: Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of RAUL H C LOPES <[log in to unmask]>
Sent: 31 December 2018 08:29
To: [log in to unmask]
Subject: Re: HC failures, DPM problem?
 
On 30/12/2018 16:53, George, Simon wrote:

The site was down for a few days before Christmas because a PDU problem took down a critical network switch, which put several key servers offline (various lessons being learned...).

Since coming back, everything looks up from my side, but HC says otherwise.

What is your DPM version? Are you using DOME?


Can you check for DPM services that are down on the head node?


Raul



To unsubscribe from the TB-SUPPORT list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=TB-SUPPORT&A=1