> Transfer to CREAM failed due to exception: CREAM Register returned error
> "MethodName=[jobRegister] Timestamp=[Sat 12 Nov 2011 19:43:45]
> ErrorCode=[0] Description=[system error] FaultCause=[Rollback executed
> due to: Lock wait timeout exceeded; try restarting transaction]"
> I ran tunemysql.pl against the DB on those hosts and it suggested
> increasing the innodb_buffer_pool_size to 2048M or more.
> I upped it to 2048M, the errors have gone away and you can see the effect
> from the attached ganglia plot.
Which reminds me of the good advice (mostly from ScotGrid) for
the MySQL instance used for DPM:
https://www.gridpp.ac.uk/wiki/Performance_and_Tuning#Increasing_buffer_capacity
http://northgrid-tech.blogspot.com/2011/06/dpm-optimization.html
http://www.scotgrid.ac.uk/wiki/index.php/DPM_Optimisation
There is a useful note in one of the links about an additional
optimisation which I think might be generally applicable:
The interaction between OS and the InnoDB engine is controlled
with the innodb_flush_method option, which should be set to
O_DIRECT to disable OS-level caching.
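For concreteness, the two settings mentioned so far could go into the
server configuration roughly as below. This is only a sketch: the file
location (/etc/my.cnf) is an assumption, and the buffer pool value is
simply the one reported above, which should be sized against the RAM
actually available on the host.

```ini
# my.cnf fragment (sketch; file location and values are assumptions to adapt per host)
[mysqld]
# The size tunemysql.pl suggested in the quoted report above
innodb_buffer_pool_size = 2048M
# Bypass the OS page cache for InnoDB data files, so data is not cached twice
innodb_flush_method = O_DIRECT
```

On the MySQL version shipped with SL5 neither setting can be changed at
runtime, so mysqld has to be restarted for them to take effect.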
BTW I am somewhat surprised that the CreamCE MySQL is that
loaded: after all, the number of jobs recorded by a CE is usually
in the thousands, while the number of files recorded by a DPM is
enormously larger, and unless one sets the parameter to clean
old records they are kept forever. Which makes me wonder
whether there is some analogous behaviour with CreamCEs.
In general the defaults in MySQL (at least the version in SL5)
are for much smaller machines than contemporary ones; another
example is the sort buffer size in the MyISAM maintenance commands.
Also, I think WLCG sites have been supporting ever larger
job loads and file collections, pushing pretty hard on DBMSes.
In an ideal world the gLite/UMD packagers would build packages
with defaults more suitable to contemporary T2 sites...
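As an illustration of the MyISAM sort buffer point, something along
these lines could be added to the configuration. The 256M figure is
purely an assumed example, not a measured recommendation; the right
value depends on index sizes and free memory.

```ini
# my.cnf fragment (sketch; the 256M values are illustrative assumptions)
[mysqld]
# Buffer used when the server rebuilds MyISAM indexes
# (REPAIR TABLE, ALTER TABLE, CREATE INDEX)
myisam_sort_buffer_size = 256M

[myisamchk]
# Equivalent buffer for the standalone myisamchk utility
sort_buffer_size = 256M
```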