Dear All,
My proposal is not quite as Wayne said. The Bootstrap procedure involves
doing lots of fits, each time taking n points at random, but with
replacement, from the original set of n points. As a result some points
will be included twice (or more) and some will be missing. You will get a
distribution of parameter values that you can then treat as you wish. It
is in the literature - I found it in Numerical Recipes once upon a time.
The advantage is that the result does not depend on having an estimate of
your errors or their distribution, let alone a good estimate. Which I
think we do not really have when measuring NMR peaks. If your parameters
depend critically on one or two points (for instance because only the
first two points have any intensity in them), you might get infinite
errors, though, when those points are excluded.
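
A minimal sketch of the resampling described above, not the code used in Analysis. It assumes a single-exponential decay y = A * exp(-R * t) fitted by a simple log-linear least-squares fit (so all intensities must be positive); the function names fit_decay and bootstrap_rate, the default of 1000 resamples and the example data are all made up for illustration. Resamples that cannot determine a rate at all are counted rather than silently dropped, which is where the "infinite errors" case shows up.

```python
import math
import random
import statistics


def fit_decay(points):
    """Log-linear least-squares fit of y = A * exp(-R * t).

    points is a list of (t, y) pairs with y > 0. Returns (A, R).
    Raises ValueError if the points do not determine a slope.
    """
    ts = [t for t, _ in points]
    if len(set(ts)) < 2:
        raise ValueError("need at least two distinct t values")
    logs = [math.log(y) for _, y in points]
    n = len(points)
    t_mean = sum(ts) / n
    log_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (v - log_mean) for t, v in zip(ts, logs))
    slope = sxy / sxx
    return math.exp(log_mean - slope * t_mean), -slope


def bootstrap_rate(points, n_iter=1000, seed=None):
    """Bootstrap the rate R: refit n points drawn with replacement.

    Returns (mean R, standard deviation of R, number of failed resamples).
    """
    rng = random.Random(seed)
    n = len(points)
    rates = []
    n_failed = 0
    for _ in range(n_iter):
        # Draw n points with replacement: some repeat, some are missing.
        sample = [rng.choice(points) for _ in range(n)]
        try:
            _, rate = fit_decay(sample)
        except ValueError:
            n_failed += 1  # degenerate resample, no rate can be fitted
            continue
        rates.append(rate)
    return statistics.mean(rates), statistics.stdev(rates), n_failed


if __name__ == "__main__":
    # Invented example data: a decay with R = 10 plus small perturbations.
    times = [0.01, 0.03, 0.05, 0.09, 0.13, 0.17, 0.25]
    wobble = [1.02, 0.97, 1.01, 0.99, 1.03, 0.98, 1.00]
    data = [(t, 100.0 * math.exp(-10.0 * t) * w) for t, w in zip(times, wobble)]
    mean_r, sd_r, failed = bootstrap_rate(data, n_iter=1000, seed=1)
    print("R = %.3f +/- %.3f (%d failed resamples)" % (mean_r, sd_r, failed))
```

The spread of the collected rates is taken as the error estimate; no per-point error is needed as input. A large number of failed resamples, or a very wide spread, would suggest the fit hangs on one or two points.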
Yours,
Rasmus
---------------------------------------------------------------------------
Dr. Rasmus H. Fogh Email: [log in to unmask]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002
On Fri, 26 Aug 2005, Wayne Boucher wrote:
> My original email to NMRGEN bounced (Jiscmail does not know that
> wb104@bioc == [log in to unmask]). So here is the original email and Chris'
> reply.
>
> Wayne
>
> ---------- Forwarded message ----------
> Date: Fri, 26 Aug 2005 09:10:25 +0000
> From: Chris MacRaild <[log in to unmask]>
> Reply-To: CcpNmr software mailing list <[log in to unmask]>
> To: [log in to unmask]
> Subject: Re: Relaxation rates calculation
>
> Hi,
>
> As I understand it, the most common way of calculating these errors is
> as follows:
> Perform the fit to the experimental data, then back-calculate the
> expected y values for each experimentally sampled x value. Generate a
> large number of new data sets by adding 'noise' of appropriate magnitude
> to the back-calculated y values. Perform the fit to each new dataset -
> the errors are then calculated from the distribution of best-fit
> parameters obtained.
>
> This is similar to what is currently implemented in analysis, except
> that analysis uses the experimental data as the basis for the simulated
> datasets, rather than the best-fit values. The problem with the analysis
> approach is that it adds synthetic 'noise' to experimental data, which
> is already noisy. Thus it tends to over-estimate the effects of noise on
> the fit values.
>
> To comment on the original observation of error estimates varying every
> time the analysis is performed, Tim is of course right that sampling
> more synthetic data sets will improve the situation. In my experience,
> however, 1000 iterations is more than enough to give an adequate
> estimate (I would expect variation of not more than, say, 10 percent on
> the error estimates for repeated analysis of typical 15N relaxation
> data). One potential source of problems here arises where the minimiser
> fails to find the best fit for some of the 1000 simulations. I've only
> ever seen this occur when the quality of the data or the original fit
> was very poor, though it is also likely if the minimiser is not properly
> configured or if the initial parameter estimates are way off. Of course
> if the quality of the original fit is not good, then it is meaningless
> to even ask the question of uncertainties on fit parameters.
>
> Hope this is helpful,
>
> Chris MacRaild
>
>
>
>
> On Thu, 2005-08-25 at 16:25, Wayne Boucher wrote:
> > Hello,
> >
> > (Cross posted to NMRGEN although that doesn't have many subscribers yet.)
> >
> > If anyone has a good idea about how to calculate error estimates for the
> > parameters for this kind of curve fitting then let us know. Rasmus said
> > he once used a method which involved randomly removing points from the
> > analysis and then calculating the standard deviation of the resultant
> > parameter fits. As Tim has mentioned, what we have done in the code is to
> > sample the (x, y) points around their stated values using the given
> > deviations. I'm not sure about the theoretical underpinning of either
> > method.
> >
> > Wayne
> >
> > On Thu, 25 Aug 2005, Tim Stevens wrote:
> >
> > > > I've a question about the Rate Analysis part of analysis. I have loaded in
> > > > my T2 spectra etc, calculated errors for each and put them in the
> > > > condition point errors. When I select Group Peaks I get a set of results,
> > > > however if I select Recalculate All Rates, without changing anything, the
> > > > TC error changes. When pressed again it changes to something different
> > > > again.
> > >
> > > There is no easy way of calculating the error in the time constant. So
> > > what we have done is to sample a number of fittings (currently 1000)
> > > within the error widths of the data points. We estimate the error in the
> > > time constant from the standard deviation of the many fitting attempts.
> > >
> > > Because the sampling fits are chosen randomly when the error calculation
> > > is repeated the fits are different and hence the time constant error
> > > alters.
> > >
> > > One thing you can do to limit the variation is go to line 390 in
> > > $CCPNMR_HOME/python/ccpnmr/analysis/DataAnalysisBasic.py and increase the
> > > number of samples "nIter = 1000" to "nIter = 10000" or larger, but bearing
> > > in mind that the more samples the slower the fitting will be.
> > >
> > >
> > > > I also found that there usually is an error which is miles out from
> > > > the rest (i.e. say if most were around 10 then the odd one was maybe 6000,
> > > > or something ridiculously small). Recalculate and the no. changes again.
> > >
> > > If the Fit Error is also large I would suggest having a look at the data
> > > points in the graph [Show Function Fit] to see if there's anything
> > > obviously wrong.
> > >
> > > There is also the issue that the fit is very sensitive to the X (time)
> > > errors. I will discuss this further with Wayne.
> > >
> > > Tim
> > >
> > >
> > > -------------------------------------------------------------------------------
> > > Dr Tim Stevens Email: [log in to unmask]
> > > Department of Biochemistry [log in to unmask]
> > > University of Cambridge Phone: +44 1223 766022 (office)
> > > 80 Tennis Court Road +44 7816 338275 (mobile)
> > > Old Addenbrooke's Site +44 1223 364613 (home)
> > > Cambridge CB2 1GA WWWeb: http://www.bio.cam.ac.uk/~tjs23
> > > United Kingdom http://www.pantonia.co.uk
> > > -------------------------------------------------------------------------------
> > > ------ +NH3CH(CH(CH3)OH)C(O)NHCH(CH(CH3)CH2CH3)C(O)NHCH(CH2CH2SCH3)CO2- -------
> > > -------------------------------------------------------------------------------
> > >
> >
>