The system is currently sitting at a load of but each of the 8 cores
is only at about ~5% utilisation. I guess this points to IO contention
which I am currently looking into.
The my.cnf looks like:
[client]
port = 3306
socket = /var/lib/mysql/mysql.sock
[isamchk]
key_buffer_size = 16M
[mysqld]
basedir = /usr
bind_address =
datadir = /var/lib/mysql
expire_logs_days = 10
key_buffer_size = 16M
log-error = /var/log/mysqld.log
max_allowed_packet = 16M
max_binlog_size = 100M
max_connections = 151
myisam_recover = BACKUP
pid-file = /var/run/mysqld/mysqld.pid
port = 3306
query_cache_limit = 1M
query_cache_size = 16M
skip-external-locking
socket = /var/lib/mysql/mysql.sock
ssl = false
ssl-ca = /etc/mysql/cacert.pem
ssl-cert = /etc/mysql/server-cert.pem
ssl-key = /etc/mysql/server-key.pem
thread_cache_size = 8
thread_stack = 256K
tmpdir = /tmp
user = mysql
[mysqld_safe]
log-error = /var/log/mysqld.log
nice = 0
socket = /var/lib/mysql/mysql.sock
[mysqldump]
max_allowed_packet = 16M
quick
quote-names
!includedir /etc/mysql/conf.d/
but most (all?) of the customisation is in
/etc/mysql/conf.d/dpm-db.cnf which is:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links=0
# increase cache size
innodb_buffer_pool_size=6000M
innodb_flush_method=O_DIRECT
innodb_file_per_table=1
log-bin
expire_logs_days=3
# from https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/TuningHints#Hint:AdjusttheMySQLparameters
max_connections = 1000
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=8
query_cache_limit=1M
query_cache_size=256M
# Slow query logging
#slow-query-log = 1
#slow-query-log-file = /var/log/mysql-slow.log
#long_query_time = 1
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
On 26 April 2016 at 11:05, Sam Skipsey
<[log in to unmask]> wrote:
> Huh, well, the last query shouldn't take that long, as there's an index on
> parent_fileid and name in Cns_file_metadata (and there's only one entry
> which has any given parent_fileid).
> This sounds like there's some other issue, possibly load-related on the
> mysql server?
> (What's your /etc/my.cnf look like?)
>
> Sam
>
> On Tue, Apr 26, 2016 at 10:54 AM Matt Williams
> <[log in to unmask]> wrote:
>>
>> Hi,
>>
>> I've followed Sam's advice about the tuning guide and have applied
>> most of the ideas there. It's helped some small amount from the DPM
>> head node side but the MySQL server is still under heavy load. I've
>> collected some slow query logs and the output is:
>>
>> $ mysqldumpslow -a -s l -t 5 /var/log/mysql-slow.log # By lock time
>>
>> Reading mysql slow query log from /var/log/mysql-slow.log
>> Count: 2143 Time=78.32s (167836s) Lock=0.02s (40s) Rows=1.0 (2143),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT FILEID, PARENT_FILEID, GUID, NAME, FILEMODE,
>> NLINK, OWNER_UID, GID, FILESIZE, ATIME, MTIME, CTIME,
>> FILECLASS, STATUS, CSUMTYPE, CSUMVALUE, ACL
>> FROM Cns_file_metadata WHERE parent_fileid = 0
>> AND name = '/'
>>
>> Count: 7 Time=1.70s (11s) Lock=0.87s (6s) Rows=16.0 (112),
>> root[root]@localhost
>> SELECT CONCAT(User, '@',Host) AS User FROM mysql.user
>>
>> Count: 92 Time=34.22s (3148s) Lock=0.01s (1s) Rows=1.0 (92),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT R_ORDINAL, R_TOKEN, R_UID,
>> R_GID, CLIENT_DN, CLIENTHOST, R_TYPE, U_TOKEN,
>> FLAGS, RETRYTIME, NBREQFILES, CTIME, STIME, ETIME,
>> STATUS, ERRSTRING, GROUPS FROM dpm_pending_req WHERE
>> r_token = '643c0750-8998-4d35-bc32-f09079935f60'
>>
>> Count: 1 Time=7.52s (7s) Lock=0.82s (0s) Rows=0.0 (0),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT S_TOKEN, CLIENT_DN, S_UID, S_GID, RET_POLICY,
>> AC_LATENCY, S_TYPE, U_TOKEN, T_SPACE, G_SPACE,
>> U_SPACE, POOLNAME, ASSIGN_TIME, EXPIRE_TIME, GROUPS FROM
>> dpm_space_reserv WHERE expire_time <= 1461600329
>>
>> Count: 1544 Time=55.77s (86108s) Lock=0.00s (0s) Rows=1.0 (1544),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT fileid, parent_fileid, guid, name, filemode, nlink,
>> owner_uid, gid, filesize, atime, mtime, ctime, fileclass,
>> status, csumtype, csumvalue, acl, xattr FROM
>> Cns_file_metadata WHERE parent_fileid = 3 AND name =
>> 'ph.bham.ac.uk'
>>
>>
>> or
>>
>> $mysqldumpslow -a -s r -t 5 /var/log/mysql-slow.log # By rows returned
>>
>> Reading mysql slow query log from /var/log/mysql-slow.log
>> Count: 441 Time=19297.85s (8510350s) Lock=0.00s (0s) Rows=1676767.5
>> (739454489), dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> select fileid,
>> parent_fileid,name,filesize,filemode,csumvalue,atime,mtime from
>> Cns_file_metadata order by parent_fileid
>>
>> Count: 50 Time=965.76s (48287s) Lock=0.02s (0s) Rows=114.7 (5735),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT R_ORDINAL, R_TOKEN, R_UID,
>> R_GID, CLIENT_DN, CLIENTHOST, R_TYPE, U_TOKEN,
>> FLAGS, RETRYTIME, NBREQFILES, CTIME, STIME, ETIME,
>> STATUS, ERRSTRING, GROUPS FROM dpm_pending_req
>>
>> Count: 72 Time=617.20s (44438s) Lock=0.00s (0s) Rows=52.1 (3750),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT fileid, parent_fileid, guid, name, filemode, nlink,
>> owner_uid, gid, filesize, atime, mtime, ctime, fileclass,
>> status, csumtype, csumvalue, acl, xattr FROM
>> Cns_file_metadata WHERE parent_fileid = 8415288 ORDER
>> BY name ASC
>>
>> Count: 2759 Time=221.25s (610418s) Lock=0.00s (0s) Rows=1.0 (2759),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT fileid, parent_fileid, guid, name, filemode, nlink,
>> owner_uid, gid, filesize, atime, mtime, ctime, fileclass,
>> status, csumtype, csumvalue, acl, xattr FROM
>> Cns_file_metadata WHERE parent_fileid = 0 AND name = '/'
>>
>> Count: 2143 Time=78.32s (167836s) Lock=0.02s (40s) Rows=1.0 (2143),
>> dpmmgr[dpmmgr]@epgse1.ph.bham.ac.uk
>> SELECT FILEID, PARENT_FILEID, GUID, NAME, FILEMODE,
>> NLINK, OWNER_UID, GID, FILESIZE, ATIME, MTIME, CTIME,
>> FILECLASS, STATUS, CSUMTYPE, CSUMVALUE, ACL
>> FROM Cns_file_metadata WHERE parent_fileid = 0
>> AND name = '/'
>>
>>
>> I don't know how unusual these queries are but maybe you have some
>> insight?
>>
>> Cheers,
>> Matt
>>
>>
>> On 25 April 2016 at 13:27, Terry Froy <[log in to unmask]> wrote:
>> > On Mon, 2016-04-25 at 12:10 +0100, Matt Williams wrote:
>> >
>> >> Hi all,
>> >>
>> >> I'm seeing very high loads (around 12 or 13) on our DPM MySQL server.
>> >> This seems to be causing timeouts when DPM is requesting information
>> >> as I am seeing lots or errors like:
>> >>
>> >> 160425 12:05:17 [ERROR] /usr/libexec/mysqld: Lock wait timeout
>> >> exceeded; try restarting transaction
>> >> 160425 12:05:17 [ERROR] /usr/libexec/mysqld: Sort aborted
>> >>
>> >> in /var/log/mysqld.log
>> >>
>> >> Does anyone have any ideas about how I can start debugging this?
>> >> <snip>
>> >
>> > Hi Matt,
>> >
>> > Turn on slow query logging.
>> >
>> > ... and when you have some logs collected, if you can't figure out
>> > who/what/why this is happening, post them here so we can have a look!
>> >
>> > Regards,
>> > Terry
>> > --
>> > Terry Froy
>> > Cluster Systems Manager, Particle Physics
>> > Queen Mary, University of London
>> > Tel: +44 (0)207 882 6560
>> > E-mail: [log in to unmask]
|