Dear Jacob,
Intuitively, the four points give you a better fit if they have a wider
spread: if points 1 and 2 are measured at x=1 and x=2, whereas the additional
points are measured at x=-5 and x=9, the fit with four points is probably more
reliable, provided the difference in accuracy stays within reason.
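To put rough numbers on it, here is a small Python sketch (the x values and
the sigmas are made-up illustration numbers, nothing from your data) that
compares the least-squares uncertainty of the slope for the two scenarios:

import numpy as np

def param_cov(x, sigma):
    # covariance of (slope a, intercept b) for a weighted least-squares
    # fit of y = a*x + b: (X^T W X)^-1 with W = diag(1/sigma^2)
    X = np.column_stack([x, np.ones_like(x)])
    W = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)
    return np.linalg.inv(X.T @ W @ X)

# two points, narrow spread in x, high precision (sigma = 0.1)
cov2 = param_cov(np.array([1.0, 2.0]), [0.1, 0.1])
# four points, wide spread in x, lower precision (sigma = 0.3)
cov4 = param_cov(np.array([-5.0, 1.0, 2.0, 9.0]), [0.3, 0.3, 0.3, 0.3])

print("sigma(slope), two precise points:", np.sqrt(cov2[0, 0]))  # ~0.14
print("sigma(slope), four rough points :", np.sqrt(cov4[0, 0]))  # ~0.03

In this toy example the wider spread more than makes up for the lower
precision of the individual points.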
This is similar to the resolution of your data: when you measure data to 1.7A,
all reflections between 1.7A and 1.2A are effectively estimated to be 0. That
is usually a much worse estimate than whatever values you get from a
measurement to 1.2A (unless, of course, you measured noise).
Concerning the 'information content': bzip2 usually comes very close to the
entropy-based information content of your data.
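If you want to try that on a file of your own, here is a minimal sketch (the
file name 'data.mtz' is only a placeholder, and a per-byte entropy ignores
correlations between bytes, so treat it as a rough yardstick):

import bz2, math
from collections import Counter

def entropy_bytes(data):
    # zero-order Shannon entropy of the byte distribution, in bytes
    counts = Counter(data)
    n = len(data)
    bits = -sum(c * math.log2(c / n) for c in counts.values())
    return bits / 8

data = open("data.mtz", "rb").read()  # placeholder name: any data file
print("entropy estimate :", round(entropy_bytes(data)), "bytes")
print("bzip2 compressed :", len(bz2.compress(data)), "bytes")

Since bzip2 also exploits correlations between bytes, its output can even come
out smaller than this zero-order estimate.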
Best,
Tim
On Friday, February 05, 2016 01:22:16 AM Keller, Jacob wrote:
> On Thursday, 04 February, 2016 20:09:30 Keller, Jacob wrote:
>
> > It seems to me that the oft-rehearsed requirement of certain
> > data:parameter ratios depends highly on the precision of the measurements
> > (nothing novel here), so a measure of "information," rather than either a
> > simple ratio or an empirically-based rule of thumb, might be the best
> > guide in deciding which parameters to model.
>
> > This is not true. The desire for a large observation:parameter ratio has
> > nothing to do with the precision of the observations. Consider: a single
> > observation is insufficient to fit a line (ax+b) no matter how precise
> > that observation may be.
>
> I was assuming a case of over-determination, in which the number of data
> points is greater than the number of parameters. Given that, consider in
> your linear case two scenarios:
>
> - two data points, each with high precision and accuracy
> - four data points, each with lesser precision and accuracy
>
> Is there a way to quantify which case tells you more about the line?
>
> As I mentioned in my post before, it also matters what kind of function you
> are fitting—obviously data points at +/- infinity will give you the
> greatest information for determining the line, but some functions have
> other “interesting” or “high-information content” places, like near the KD
> of a binding curve. It seems to me that the Fourier transform is an
> interesting case, since every term affects every part of the electron
> density.
>
> Perhaps fitting the electron density is similar to the simplified case of
> fitting a circle at fixed points of increasing closeness—once one knows a
> point or two, it’s over-determined. Since all points are equally effective
> at defining the circle, it seems one could arbitrarily choose whether to
> measure fewer points better or more points worse. But what metric could be
> used to quantify how well-defined the circle was in these two cases?
>
> In light of this, why should it be that a dataset with 10,000 high
> precision/accuracy reflections is considered unsuitable for a 1000-atom
> anisotropic B model, whereas a 20,000 low precision/accuracy dataset is
> suitable? (Rough arbitrary numbers). Shouldn’t it go according to the
> “information” content of the dataset, given that it’s overdetermined?
>
> In other words: the data-to-parameter ratio as a measure seems
> oversimplified once the case is overdetermined. The question is what metric
> is appropriate?
>
> JPK
--
Paul Scherrer Institut
Dr. Tim Gruene
- persoenlich -
OFLC/102
CH-5232 Villigen PSI
phone: +41 (0)56 310 5297
GPG Key ID = A46BEE1A