Hi,
I've followed Sam's advice about the tuning guide and have applied
most of the ideas there. It's helped some small amount from the DPM
head node side but the MySQL server is still under heavy load. I've
collected some slow query logs and the output is:
$ mysqldumpslow -a -s l -t 5 /var/log/mysql-slow.log # By lock time
Reading mysql slow query log from /var/log/mysql-slow.log
Count: 2143 Time=78.32s (167836s) Lock=0.02s (40s) Rows=1.0 (2143),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT FILEID, PARENT_FILEID, GUID, NAME, FILEMODE,
NLINK, OWNER_UID, GID, FILESIZE, ATIME, MTIME, CTIME,
FILECLASS, STATUS, CSUMTYPE, CSUMVALUE, ACL
FROM Cns_file_metadata WHERE parent_fileid = 0
AND name = '/'
Count: 7 Time=1.70s (11s) Lock=0.87s (6s) Rows=16.0 (112),
root[root]@localhost
SELECT CONCAT(User, '@',Host) AS User FROM mysql.user
Count: 92 Time=34.22s (3148s) Lock=0.01s (1s) Rows=1.0 (92),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT R_ORDINAL, R_TOKEN, R_UID,
R_GID, CLIENT_DN, CLIENTHOST, R_TYPE, U_TOKEN,
FLAGS, RETRYTIME, NBREQFILES, CTIME, STIME, ETIME,
STATUS, ERRSTRING, GROUPS FROM dpm_pending_req WHERE
r_token = '643c0750-8998-4d35-bc32-f09079935f60'
Count: 1 Time=7.52s (7s) Lock=0.82s (0s) Rows=0.0 (0),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT S_TOKEN, CLIENT_DN, S_UID, S_GID, RET_POLICY,
AC_LATENCY, S_TYPE, U_TOKEN, T_SPACE, G_SPACE,
U_SPACE, POOLNAME, ASSIGN_TIME, EXPIRE_TIME, GROUPS FROM
dpm_space_reserv WHERE expire_time <= 1461600329
Count: 1544 Time=55.77s (86108s) Lock=0.00s (0s) Rows=1.0 (1544),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT fileid, parent_fileid, guid, name, filemode, nlink,
owner_uid, gid, filesize, atime, mtime, ctime, fileclass,
status, csumtype, csumvalue, acl, xattr FROM
Cns_file_metadata WHERE parent_fileid = 3 AND name =
'ph.bham.ac.uk'
or
$mysqldumpslow -a -s r -t 5 /var/log/mysql-slow.log # By rows returned
Reading mysql slow query log from /var/log/mysql-slow.log
Count: 441 Time=19297.85s (8510350s) Lock=0.00s (0s) Rows=1676767.5
(739454489), dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
select fileid,
parent_fileid,name,filesize,filemode,csumvalue,atime,mtime from
Cns_file_metadata order by parent_fileid
Count: 50 Time=965.76s (48287s) Lock=0.02s (0s) Rows=114.7 (5735),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT R_ORDINAL, R_TOKEN, R_UID,
R_GID, CLIENT_DN, CLIENTHOST, R_TYPE, U_TOKEN,
FLAGS, RETRYTIME, NBREQFILES, CTIME, STIME, ETIME,
STATUS, ERRSTRING, GROUPS FROM dpm_pending_req
Count: 72 Time=617.20s (44438s) Lock=0.00s (0s) Rows=52.1 (3750),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT fileid, parent_fileid, guid, name, filemode, nlink,
owner_uid, gid, filesize, atime, mtime, ctime, fileclass,
status, csumtype, csumvalue, acl, xattr FROM
Cns_file_metadata WHERE parent_fileid = 8415288 ORDER
BY name ASC
Count: 2759 Time=221.25s (610418s) Lock=0.00s (0s) Rows=1.0 (2759),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT fileid, parent_fileid, guid, name, filemode, nlink,
owner_uid, gid, filesize, atime, mtime, ctime, fileclass,
status, csumtype, csumvalue, acl, xattr FROM
Cns_file_metadata WHERE parent_fileid = 0 AND name = '/'
Count: 2143 Time=78.32s (167836s) Lock=0.02s (40s) Rows=1.0 (2143),
dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
SELECT FILEID, PARENT_FILEID, GUID, NAME, FILEMODE,
NLINK, OWNER_UID, GID, FILESIZE, ATIME, MTIME, CTIME,
FILECLASS, STATUS, CSUMTYPE, CSUMVALUE, ACL
FROM Cns_file_metadata WHERE parent_fileid = 0
AND name = '/'
I don't know how unusual these queries are but maybe you have some insight?
Cheers,
Matt
On 25 April 2016 at 13:27, Terry Froy <[log in to unmask]> wrote:
> On Mon, 2016-04-25 at 12:10 +0100, Matt Williams wrote:
>
>> Hi all,
>>
>> I'm seeing very high loads (around 12 or 13) on our DPM MySQL server.
>> This seems to be causing timeouts when DPM is requesting information
>> as I am seeing lots or errors like:
>>
>> 160425 12:05:17 [ERROR] /usr/libexec/mysqld: Lock wait timeout
>> exceeded; try restarting transaction
>> 160425 12:05:17 [ERROR] /usr/libexec/mysqld: Sort aborted
>>
>> in /var/log/mysqld.log
>>
>> Does anyone have any ideas about how I can start debugging this?
>> <snip>
>
> Hi Matt,
>
> Turn on slow query logging.
>
> ... and when you have some logs collected, if you can't figure out
> who/what/why this is happening, post them here so we can have a look!
>
> Regards,
> Terry
> --
> Terry Froy
> Cluster Systems Manager, Particle Physics
> Queen Mary, University of London
> Tel: +44 (0)207 882 6560
> E-mail: [log in to unmask]