> I have a question for those versed in compiler cache optimizations. I
> have a specific matrix product to compute that looks like a stencil
> operation in PDE solvers. I have at least two ways of doing it, and I am
> wondering which is faster in terms of Fortran compiler optimization in
> memory access and usage.

Since you apparently already have the code, why not just run some timing tests? After all, it doesn't matter which is better in general, just which is better for your particular application.

(Forgive my last message, where I just forwarded the one I'm now replying to by mistake.)
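
For what it's worth, a timing test along those lines could look roughly like the sketch below. The two routines, `variant_a` and `variant_b`, are hypothetical stand-ins (a three-point stencil written as an explicit loop and as array slices), not the actual matrix product from the question; the sizes `n` and `nrep` are also arbitrary assumptions. Swap in your own two implementations and keep the harness.

```fortran
program time_variants
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  ! Assumed problem size and repeat count; adjust to match the real application.
  integer, parameter :: n = 2000000, nrep = 50
  real(real64), allocatable :: x(:), y(:)
  integer(int64) :: t0, t1, rate
  integer :: r

  allocate(x(n), y(n))
  call random_number(x)
  y = 0.0_real64

  ! Clock ticks per second, used to convert tick counts to seconds.
  call system_clock(count_rate=rate)

  call system_clock(t0)
  do r = 1, nrep
     call variant_a(x, y)
  end do
  call system_clock(t1)
  ! Printing a checksum keeps the compiler from discarding the work.
  print '(a,f8.3,a,es12.4)', 'variant_a: ', &
       real(t1 - t0, real64) / real(rate, real64), ' s, checksum ', sum(y)

  call system_clock(t0)
  do r = 1, nrep
     call variant_b(x, y)
  end do
  call system_clock(t1)
  print '(a,f8.3,a,es12.4)', 'variant_b: ', &
       real(t1 - t0, real64) / real(rate, real64), ' s, checksum ', sum(y)

contains

  ! Hypothetical variant 1: explicit loop over interior points.
  subroutine variant_a(x, y)
    real(real64), intent(in)  :: x(:)
    real(real64), intent(out) :: y(:)
    integer :: i, m
    m = size(x)
    y(1) = 0.0_real64
    y(m) = 0.0_real64
    do i = 2, m - 1
       y(i) = x(i-1) - 2.0_real64*x(i) + x(i+1)
    end do
  end subroutine variant_a

  ! Hypothetical variant 2: the same stencil written with array slices.
  subroutine variant_b(x, y)
    real(real64), intent(in)  :: x(:)
    real(real64), intent(out) :: y(:)
    integer :: m
    m = size(x)
    y(1) = 0.0_real64
    y(m) = 0.0_real64
    y(2:m-1) = x(1:m-2) - 2.0_real64*x(2:m-1) + x(3:m)
  end subroutine variant_b

end program time_variants
```

Build it with the same compiler and optimization flags you use for the real code (for example `gfortran -O2`), run it a few times, and check that the two checksums agree before trusting the timings.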