Hi Randy,
So I've been playing around with equations myself, and I have some alternative results.
As I understand your Mathematica stuff, you are using the data model:
ip = ij + ib'
ib
where ip is the measured peak (before any background correction), and ij is a random sample from the true intensity j. Here ib is the measured background, whereas ib' is the background absorbed into ip. Both ib and ib' are a random sample from background jb. Again, only ip and ib are observed; ij and ib' are "hidden" variables.
Now let me recap your treatment of that model (hopefully I get this right).
You assume Poisson distributions for ip, ij, ib, and ib', and find the joint probability of observed ip and ib given j and jb, p(ip,ib|j,jb). You can consider ip and ib as statistically independent, since ip depends on ib', not ib. You then marginalize over jb (the true background intensity) using a flat uninformative prior, giving p(ip,ib|j). You find that p(ip,ib|j) is similar to F&W's p(ip-ib|j, sdj), where sdj=sqrt(ip+ib).
Some sort of scaling is necessary, since in practice ib and ip are counted from different numbers of pixels. You find that, for roughly equal scaling, the Poisson version is similar to F&W's Gaussian approximation for even moderate counts.
However, in practice, we measure the background from a much larger area than the spot. For example, in the mosflm window I have open now, the background area is > 20 times the spot area, for high res, low SNR spots. Similarly, in xds the spot-to-background ratio, in terms of pixel #, is > 10 on average and > 5 for the great majority of spots. Therefore, we typically know the value of jb to a much better precision than what we can get from ip (which is essentially an estimate of j+jb).
If the relative sd of the background is about 2 or 3 times less than that of the spot ip, we can approximate the background estimate of jb as a constant (ie, ignore the uncertainty in its value). This will be valid if the total area used for the background measurement is roughly >5 times the area of the spot (even less for "negative" peaks). So what we can do is estimate jb using ib, and then find the conditional distribution of j given ip and jb. Using your notation, this distribution is given by:
p(j|ip,jb) = exp(-(jb+j)) (jb+j)^ip / Gamma(ip+1,jb)
where Gamma(.,.) is the upper incomplete gamma function.
The moments of this distribution have nice analytical forms (well, at least as nice as F&W's). Here's a table comparing the F&W estimates to this Poisson treatment, using Randy's ip and jb values, plus some others:
ip jb Exp[j]_fw SD[j]_fw h Exp[j]_dt SD[j]_dt %diff
---- ---- --------- -------- --- --------- -------- -----
55 45 11.3 6.3 1.3 11.9 6.8 5.3
45 55 3.0 2.6 -1.5 3.7 3.3 5.4
35 65 1.1 1.1 -5.1 2.0 2.0 86
6 10 1.0 0.91 -1.6 1.8 1.7 80
1 3 0.37 0.34 -2.0 1.3 1.2 240
4 12 0.45 0.43 -4.0 1.4 1.3 210
100 100 8.0 6.0 0 8.6 6.6 7.4
85 100 3.9 3.4 -1.6 4.7 4.2 20
75 100 2.5 2.4 -2.9 3.4 3.2 35
500 500 17.8 13.5 0 18.4 14.0 3.3
440 500 6.2 5.8 -2.9 7.0 6.6 14
1000 1000 25.2 19.1 0 25.8 21 2.3
920 1000 9.4 8.8 -2.6 10.3 9.5 9.1
940 1000 11.6 10.5 -2.0 12.4 11 7
In this table I've used sdj=sqrt(ip) for F&W, since I'm ignoring the uncertainty in jb --- Randy used sqrt(ip+ib).
h = (ip-jb)/sdj
%diff = (Exp[j]_dt - Exp[j]_fw)/Exp[j]_fw
Here jb is the # background counts normalized to have the same pixel area as ip.
Whether these would be considered important differences, I'm not sure. The differences are greatest when ip<jb (that is, for "negative intensities").
As an aside:
It's easy to expand this to include the acentric Wilson prior:
p(j|ip,jb,w) = exp(-(jb+j)(w+1)) (jb+j)^ip (w+1)^(ip+1) / Gamma(ip+1,jb(w+1))
where w = 1/sigma_w, sigma_w = the Wilson sigma. Again, the moments have analytical forms.
On Jul 1, 2013, at 5:47 AM, Randy Read <[log in to unmask]> wrote:
> Hi,
>
> I've been following this discussion, and I was particularly interested by the suggestion that some information might be lost by turning the separate peak and background measurements into a single difference. I accept the point that there might be value in, e.g., TDS models that pay explicit attention to non-Bragg intensities, but this whole discussion started from the point of what estimates to use for diffracted Bragg intensities in processes such as molecular replacement, refinement, and map calculations.
>
> I thought I'd run this past the two of you, in case I've missed something. What I decided to look at is the probability distribution for the true diffraction intensity, given the peak and background measurements. I'm assuming that the peak and background measurements have a Poisson distribution from counting statistics, which seems fine because I'm comparing the Poisson model with a Gaussian approximation, and if there were other sources of non-Poisson error the true distribution should become more Gaussian anyway.
>
> Where I thought it might make a difference to consider the peak and background separately is that, if the net intensity comes out as negative in the simple difference, this implies that the random errors in the peak region are more negative than the random errors in the background region. By considering a joint distribution of peak and background counts, connected by a common underlying background intensity probability, you might hope that this would make a difference.
>
> However, what I get (which I hope you can follow from the PDF derived from a Mathematica notebook) is that, even for very small numbers of counts, the F&W model of integrating over the positive intensity values consistent with a Gaussian peak-background difference gives surprisingly good agreement with the difference of Poisson variables model.
>
> In the Poisson derivation, I'm using Bayes' rule to turn the joint probability of the pair of peak/background intensity measurements into the joint probability of the true underlying intensities. Like F&W, this requires a prior probability distribution for the true underlying intensities. I'm using an uninformative prior (allowing all positive values with equal probability) and comparing that to F&W also with uninformative priors. Wilson priors could be used for both instead, but that wouldn't really change the result of the comparison.
>
> I've allowed for the possibility that there are different scale factors for the peak and background regions (so that there are different sizes of random errors for the background component of both measurements), and that only seems to make a minor difference in the results.
>
> So my conclusion is that, if what you're after is the probability distribution of the true intensity, there's no extra information gained in considering peak and background separately, and that the simple Gaussian approximation of the difference of two Poisson variables is surprisingly good. In other words, for these purposes, the difference between peak and background (and the standard deviation of that difference) is a sufficient statistic.
>
> Of course, I'd be very grateful if either of you found anything wrong with the reasoning! Also, I've lost some of the messages in this thread, so I don't know if these probability distributions are reproducing anything that has already been discussed.
>
> Best wishes,
>
> Randy
> <pIntensityPeakBkg.pdf>
> ------
> Randy J. Read
> Department of Haematology, University of Cambridge
> Cambridge Institute for Medical Research Tel: + 44 1223 336500
> Wellcome Trust/MRC Building Fax: + 44 1223 336827
> Hills Road E-mail: [log in to unmask]
> Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk
>
|