JISCMail - ALLSTAT Archives

Robert,
Thank you for your complete and thoughtful answer. It will take a while
to digest . . . but digest it I will.
Again my thanks!
John


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and
Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


>>> Robert Newcombe <[log in to unmask]> 12/29/2014 6:52 AM >>>

No, that’s not right at all – the formula given by Aheto below relates
to the difference of two means, which is something totally different.

I assume that ‘the concatenation of the two samples’ means that they
are simply to be merged.
If so, then you need to get back to N1, T1 and SSq1 for sample 1 (the
sample size, the sum of the observations and the sum of their squares),
the quantities that the mean, SD and SE were calculated from.
Then similarly N2, T2 and SSq2 for sample 2.
Then use N=N1+N2, T=T1+T2, SSq=SSq1+SSq2.
Then use these to calculate the pooled mean, the SD for the combined
sample, and hence the SE of the pooled mean.

Essentially,
T1 = N1*mean1
SSq1 = N1*mean1^2 + (N1-1)*SD1^2
SD1 = SE1*sqrt(N1)

with corresponding formulae for sample 2. Then work back from the
totals N, T and SSq for the pooled sample:
mean = T/N
SD = sqrt((SSq - N*mean^2)/(N-1))
SE = SD/sqrt(N)
Best done in a spreadsheet, checking each stage carefully!
One test of correct programming is to use zeros for N2, T2 and SSq2;
you should then end up reproducing the mean and SE for sample 1.
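
The steps above can be sketched in Python as follows. This is only an
illustration under stated assumptions: the function names
(sums_from_summary, combine) are made up for this example, the inputs
are the sample size, mean and SE of each sample, and the combined
sample must contain at least two observations.

```python
import math

def sums_from_summary(n, mean, se):
    """Recover (N, T, SSq) from the sample size, mean and SE of the mean."""
    sd = se * math.sqrt(n)                 # SD = SE * sqrt(N)
    total = n * mean                       # T = N * mean
    ssq = n * mean**2 + (n - 1) * sd**2    # SSq = N*mean^2 + (N-1)*SD^2
    return n, total, ssq

def combine(n1, mean1, se1, n2, mean2, se2):
    """Return (N, mean, SD, SE) for the two samples merged into one."""
    a = sums_from_summary(n1, mean1, se1)
    b = sums_from_summary(n2, mean2, se2)
    n = a[0] + b[0]                        # N = N1 + N2
    total = a[1] + b[1]                    # T = T1 + T2
    ssq = a[2] + b[2]                      # SSq = SSq1 + SSq2
    mean = total / n
    # Sample variance of the merged data; guard against tiny negative
    # values caused by floating-point rounding.
    var = max((ssq - n * mean**2) / (n - 1), 0.0)
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)
    return n, mean, sd, se

# The check suggested above: zeros for sample 2 reproduce sample 1.
print(combine(4, 2.5, 0.6, 0, 0.0, 0.0))
```

Working from the summary statistics in this way gives the same result
as merging the raw observations, apart from rounding in the reported
means and SEs.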

Note that here, the SD and hence the SE for the mean of the combined
sample has a contribution from mean1-mean2, in effect.
If the means of the two samples are substantially different, this
difference will increase the SD for the pooled sample.
Think of a sample consisting of body weights of two different breeds of
dogs, one very small breed and one very large breed.
In this situation (in which you obviously wouldn’t want to naively pool
the two samples), the overall SD would come mainly from between-breeds
differences, not from within-breeds differences.
This is of course the starting point for the concept of analysis of
variance.
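
To make the between-samples contribution concrete, here is a small
Python sketch. It relies on the standard identity
(N-1)*SD^2 = (N1-1)*SD1^2 + (N2-1)*SD2^2 + (N1*N2/N)*(mean1-mean2)^2,
which is not written out in the message above. The function name and
the dog weights are invented purely for illustration.

```python
import math

def merged_sd_parts(n1, mean1, sd1, n2, mean2, sd2):
    """Return (SD of merged sample, within part, between part)."""
    n = n1 + n2
    # Sum of squares about each sample's own mean.
    within = (n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
    # Extra sum of squares caused by the two means differing.
    between = n1 * n2 / n * (mean1 - mean2)**2
    sd = math.sqrt((within + between) / (n - 1))
    return sd, within, between

# Hypothetical weights in kg: 20 small dogs and 20 large dogs.
sd, within, between = merged_sd_parts(20, 3.0, 0.5, 20, 60.0, 5.0)
print(sd, within, between)
```

With means this far apart, the between part dominates the within part,
so the SD of the merged sample mostly reflects the gap between breeds.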

SO – always consider carefully what it all means!
If the two samples apparently differ, either in mean or in spread (as
measured by the SD), then it’s pretty meaningless to pool the data for
the two samples.
Also, depending on how N1 and N2 were arrived at, you could argue for
using an estimated mean of (mean1 + mean2)/2, instead of (N1*mean1 +
N2*mean2)/(N1+N2).
This will have a quite different standard error, based on the pooled
SD, which is sqrt{((N1-1)*SD1^2 + (N2-1)*SD2^2)/(N1+N2-2)}.
In this situation, too, you would need to check that SD1 and SD2 were
similar enough to warrant pooling.
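
A hedged Python sketch of this alternative estimate follows. The
message gives the pooled SD but not the SE formula itself; the one used
here, pooled SD * sqrt(1/N1 + 1/N2) / 2, is an assumption that follows
from treating the two samples as independent with a common variance.
The function names are hypothetical.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-sample SD on N1+N2-2 degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def unweighted_mean_and_se(n1, mean1, sd1, n2, mean2, sd2):
    """Return the estimate (mean1 + mean2)/2 and its assumed SE."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    estimate = (mean1 + mean2) / 2
    # Var((m1 + m2)/2) = (sp^2/N1 + sp^2/N2) / 4 for independent samples.
    se = sp * math.sqrt(1 / n1 + 1 / n2) / 2
    return estimate, se

print(unweighted_mean_and_se(10, 4.0, 2.0, 10, 6.0, 2.0))
```

Unlike the SD of the merged sample, this SE does not grow with the
difference between the two means, which is why the two approaches can
give quite different answers.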

Hope this helps.

Robert G. Newcombe PhD CStat FFPH HonMRCR
Professor of Biostatistics
Cochrane Institute of Primary Care and Public Health
School of Medicine
Cardiff University
4th floor, Neuadd Meirionnydd
Heath Park, Cardiff CF14 4YS

Tel: (+44) 29 2068 7260

My book Confidence Intervals for Proportions and Related Measures of
Effect Size is now published.

Available at http://www.crcpress.com/product/isbn/9781439812785

See http://www.facebook.com/confidenceintervals

Home page https://sites.google.com/site/robertgnewcombe/



From: A UK-based worldwide e-mail broadcast system mailing list
[mailto:[log in to unmask]] On Behalf Of John Sorkin
Sent: 29 December 2014 11:40
To: [log in to unmask]


I thank you for your suggestion, but I don't see how it helps answer my
question.
John


From: A UK-based worldwide e-mail broadcast system mailing list
[mailto:[log in to unmask]] On Behalf Of Justice Moses K. Aheto
Sent: 29 December 2014 11:26
To: [log in to unmask]



Hi John,

I think you should have a look at pooled standard errors which you can
use to estimate the combined standard error for the two samples.

Example below:

SE(x1-x2) = sqrt[ s1^2/n + s2^2/m ]
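
For reference, a minimal Python sketch of this formula, assuming s1 and
s2 are the two sample SDs and n and m the sample sizes (the function
name is made up for the example). Note that it gives the SE of the
difference between the two sample means.

```python
import math

def se_difference(s1, n, s2, m):
    """SE of (mean of sample 1 - mean of sample 2), independent samples."""
    return math.sqrt(s1**2 / n + s2**2 / m)

# Equivalent to sqrt(se1^2 + se2^2), since se1 = s1/sqrt(n) and
# se2 = s2/sqrt(m).
print(se_difference(3.0, 9, 4.0, 16))
```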

Hope this helps.

Cheers



Kind regards

*****************************************
Justice Moses K. Aheto
PhD Candidate in Medicine (United Kingdom)
MSc Medical Statistics (United Kingdom)
BSc Statistics (Ghana)
HND Statistics (Ghana)

Chief Executive Officer
Statistics and Analytics Consultancy Services Ltd.
E-mail: [log in to unmask]
Skype: jascall12
Mobile: +447417589148.



On Monday, December 29, 2014 11:12 AM, John Sorkin
<[log in to unmask]> wrote:



How can I compute the SE (standard error) of the concatenation of two
samples?

Assume sample 1 has n observations and sample 2 has m observations.

The concatenation of sample 1 with sample 2 would have n+m
observations.

If I know the mean of sample 1 = m1 and the SE of sample 1 = se1, and
the mean of sample 2 = m2 and the SE of sample 2 = se2, can I use
n, m1, se1, m, m2, se2 to compute the SE?



Thank you,

John David Sorkin M.D., Ph.D.


