Van Snyder wrote:
>
> Michel Dayde and Iain Duff wrote an article in ACM Transactions on
> Mathematical Software on blocked level-3 BLAS for superscalar processors.
> Their code has parameters, that the user is expected to tune, that
> characterize the cache: How many levels, what size at each level,
> what relative speed at each level, how many objects of type <so-and-so>
> are transferred to the bottom cache per cycle.... It would seem to be
> easy to provide these numbers from intrinsic functions, at least in the
> case that the code will be run on the same machine as the compiler.
> Unfortunately, even when moving code from one platform to a superficially
> identical one, the cache characteristics can vary -- especially on PC-to-PC
> movements. It would still be better to tell folks to recompile the code
> (which uses intrinsic functions to get the cache characteristics), than
> to tell them to look up the cache characteristics, change the code, maybe
> experiment with the parameters a little bit, and recompile it.
Isn't that what ATLAS does, for the BLAS anyway?
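
For readers unfamiliar with the idea, here is a rough sketch of tuning by measurement rather than by looking up cache parameters: time a blocked matrix multiply with several candidate block sizes on the machine at hand and keep the fastest. This is not ATLAS's code or the Dayde/Duff code; the names `blocked_matmul` and `pick_block_size` and the candidate sizes are made up for illustration. In pure Python the timings are dominated by interpreter overhead, so this only shows the structure of the search, not realistic cache behavior.

```python
import time


def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square tiles of side `block`."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Work on one tile; in a compiled language the tile is
                # sized so that its operands stay resident in cache.
                for i in range(ii, min(ii + block, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def pick_block_size(n, candidates, repeats=3):
    """Time each candidate block size and return the fastest one on this machine."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best_block, best_time = None, float("inf")
    for block in candidates:
        # Keep the best of several runs to reduce timing noise.
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block


if __name__ == "__main__":
    # Hypothetical problem size and candidate list, chosen only for the demo.
    print("fastest block size:", pick_block_size(64, [4, 8, 16, 32, 64]))
```

The point of searching at build or install time is that the user never has to know the cache sizes; the cost is that the search must be rerun on each new machine.
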
Bill
--
William F. Mitchell
Mathematical and Computational Sciences Division
National Institute of Standards and Technology
http://math.nist.gov/~mitchell