Richard Maine wrote:
>>> Yes, the only true f95 version I was able to write....
>
> Oops. You just hit a pet peeve of mine. Sorry, but I can't let that
> stand without comment. :-)
OK, "f90++ version", but I don't think people here will accept the notation.
(I still use "goto" in f90 sometimes, if that can excuse my fault.)
Tim Prince wrote:
> Is it really so fast? How does your platform avoid the same kind of
> conditional branch for each element which IF() would imply?
I am not well versed in IA64 or compiler architecture, sorry.
It is new to me.
> You as much as informed us you didn't want the exp() to be parallelized.
> Apparently, you expect your library to take even longer to handle
> underflows than branching in your own code would take.
It looks like exceptions are terrible.
This is one of the worst timings (from the very top of the profile):
 Ticks  Percent  Cumulative  Routine
                    Percent
--------------------------------------------------------------------
 83940    43.67       43.67  orbital_moduleradial_mp_radial0_  <<<<< TOO MUCH <<<<
 20989    10.92       54.59  loopm4_k8  THIS MIGHT BE BLAS
 14910     7.76       62.35  EXP_CERTAIN_UNDERFLOW  <<< from exp()?
  8662     4.51       66.86  orbital_module_mp_orbital_calculate_
... rest skipped ...
From a different run after re-work:
...
   733     0.73       79.85  orbital_moduleradial_mp_radial0_
...
   235     0.23       94.57  EXP_CERTAIN_UNDERFLOW  <<< there are more places that use exp()
There is a bit of cheating, though:
the original version did quite a bit more.
I later extracted that part and reformulated it in BLAS.
However, the one I submitted stayed among the top 10.
> If you are willing to accept this level of
> discontinuity in your results, why not simply invoke abrupt underflow?
Yes, I did it by hard cutoff, if you mean that.
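
To make "hard cutoff" concrete, here is a minimal sketch of the idea, not the routine profiled above. The names exp_cutoff and x_cut and the threshold value 40 are made up for illustration. The point is only that arguments beyond the cutoff never reach exp(), so the library's underflow handling is not triggered for them. Whether the compiler turns the test into a branch or a masked operation depends on the platform.

```fortran
! Sketch only: exp_cutoff, x_cut and the value 40 are illustrative
! assumptions, not taken from the code discussed in this thread.
module cutoff_exp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical threshold: exp(-x) is treated as exactly zero for x > x_cut.
  real(dp), parameter :: x_cut = 40.0_dp
contains
  subroutine exp_cutoff(x, y)
    real(dp), intent(in)  :: x(:)
    real(dp), intent(out) :: y(:)   ! assumed to have the same size as x
    integer :: i
    do i = 1, size(x)
      if (x(i) > x_cut) then
        y(i) = 0.0_dp               ! hard cutoff: exp() is not called at all
      else
        y(i) = exp(-x(i))
      end if
    end do
  end subroutine exp_cutoff
end module cutoff_exp_sketch

program demo
  use cutoff_exp_sketch
  implicit none
  real(dp) :: x(4), y(4)
  x = [0.0_dp, 1.0_dp, 50.0_dp, 800.0_dp]
  call exp_cutoff(x, y)
  print *, y                        ! last two elements are set to zero by the cutoff
end program demo
```

The price is the discontinuity mentioned in the quoted question: results jump from exp(-x_cut) to zero at the threshold, so the cutoff has to be chosen small enough relative to the accuracy needed.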
Alexei