Robert,
Thank you for your complete and thoughtful answer. It will take a while to digest . . . but digest it I will.
Again my thanks!
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


>>> Robert Newcombe <[log in to unmask]> 12/29/2014 6:52 AM >>>

No, that’s not right at all – the formula given by Aheto below is the standard error of the difference between two means, which is something totally different.

 

I assume that ‘the concatenation of the two samples’ means that they are simply to be merged.

If so, then you need to get back to N1, T1 (the total) and SSq1 (the sum of squares) for sample 1, the quantities that the mean and SD and SE were calculated from.

Then similarly N2, T2 and SSq2 for sample 2.

Then use N=N1+N2, T=T1+T2, SSq=SSq1+SSq2.

Then use these to calculate the pooled mean, and the SD for the combined sample and hence the SE of the pooled mean.

 

Essentially, T1 = N1*mean1

SSq1 = N1*mean1^2 + (N1-1)*SD1^2

SD1 = SE1*sqrt(N1)

 

With corresponding formulae for sample 2. Then work back from the pooled totals N, T and SSq to the combined sample: mean = T/N, SD = sqrt((SSq - T^2/N)/(N-1)), SE = SD/sqrt(N).

Best done in a spreadsheet, checking each stage carefully!

One test of correct programming is to enter zeros for sample 2 (N2, mean2 and SE2 all zero); you should then end up reproducing the mean and SE for sample 1.
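
For anyone who prefers a script to a spreadsheet, the steps above can be sketched in Python as follows. This is only a minimal illustration: the function names sums_from_summary and merge_samples are made up for this example, and it assumes each SE was computed as SD/sqrt(N) with the usual N-1 divisor for the SD.

```python
import math

def sums_from_summary(n, mean, se):
    """Recover (N, T, SSq) from N, mean and SE (hypothetical helper)."""
    sd = se * math.sqrt(n)               # SD = SE*sqrt(N)
    total = n * mean                     # T = N*mean
    ssq = n * mean**2 + (n - 1) * sd**2  # SSq = N*mean^2 + (N-1)*SD^2
    return n, total, ssq

def merge_samples(n1, mean1, se1, n2, mean2, se2):
    """Mean, SD and SE of the merged sample, from summary statistics only."""
    _, t1, ssq1 = sums_from_summary(n1, mean1, se1)
    _, t2, ssq2 = sums_from_summary(n2, mean2, se2)
    n, t, ssq = n1 + n2, t1 + t2, ssq1 + ssq2
    mean = t / n
    # Sum of squares about the pooled mean, divided by N-1.
    variance = (ssq - n * mean**2) / (n - 1)
    sd = math.sqrt(max(variance, 0.0))   # guard against tiny negative rounding error
    se = sd / math.sqrt(n)
    return mean, sd, se
```

Calling merge_samples(n1, mean1, se1, 0, 0.0, 0.0) is the zero check described above: it should return mean1 and se1 again, up to rounding.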

 

Note that here the SD of the combined sample, and hence the SE of its mean, in effect includes a contribution from mean1-mean2.

If the means of the two samples are substantially different, this difference will increase the SD for the pooled sample.

Think of a sample consisting of body weights of two different breeds of dogs, one very small breed and one very large breed.

In this situation (in which you obviously wouldn’t want to naively pool the two samples), the overall SD would come mainly from between-breeds differences, not from within-breeds differences.

This is of course the starting point for the concept of analysis of variance.
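
To make the between/within split concrete, here is a small Python sketch. It relies on the standard identity that the sum of squares of the merged sample about its own mean equals the within-sample part, (N1-1)*SD1^2 + (N2-1)*SD2^2, plus a between-sample part, N1*N2/(N1+N2) * (mean1-mean2)^2. The function name and the dog weights are invented purely for illustration.

```python
import math

def combined_sd_parts(n1, mean1, sd1, n2, mean2, sd2):
    """Split the merged sample's sum of squares into within and between parts."""
    n = n1 + n2
    within_ss = (n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
    between_ss = n1 * n2 / n * (mean1 - mean2) ** 2
    sd = math.sqrt((within_ss + between_ss) / (n - 1))
    return within_ss, between_ss, sd

# Hypothetical data: 10 small dogs (mean 3 kg, SD 0.5 kg)
# and 10 large dogs (mean 60 kg, SD 5 kg).
within_ss, between_ss, sd = combined_sd_parts(10, 3.0, 0.5, 10, 60.0, 5.0)
print(within_ss, between_ss, sd)
```

With these made-up numbers the between-breeds term is far larger than the within-breeds term, so the SD of the merged sample says very little about the variation within either breed.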

 

SO – always consider carefully what it all means!

If the two samples apparently differ, either in mean or in spread (as measured by the SD), then it’s pretty meaningless to pool the data for the two samples.

Also, depending on how N1 and N2 were arrived at, you could argue for using an estimated mean of (mean1 + mean2)/2, instead of (N1*mean1 + N2*mean2)/(N1+N2).

This will have a quite different standard error, based on the pooled SD, which is sqrt{((N1-1)*SD1^2 + (N2-1)*SD2^2)/(N1+N2-2)}.

In this situation, too, you would need to check that SD1 and SD2 were similar enough to warrant pooling.
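
A sketch of this alternative in Python follows. The pooled SD is the formula given above; the standard error line is an assumption added here, namely that the two samples are independent and share a common SD, in which case the SE of (mean1 + mean2)/2 is (pooled SD / 2) * sqrt(1/N1 + 1/N2). The function names are hypothetical.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-sample SD on N1+N2-2 degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def unweighted_mean_and_se(n1, mean1, sd1, n2, mean2, sd2):
    """Estimate (mean1 + mean2)/2 and its SE, assuming a common SD."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    mean = (mean1 + mean2) / 2
    # Var of the average of two independent means is (sp^2/N1 + sp^2/N2) / 4.
    se = 0.5 * sp * math.sqrt(1 / n1 + 1 / n2)
    return mean, se
```

Unlike the SE of the merged sample, this one contains no contribution from mean1-mean2, which is why the two approaches can give very different answers.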

 

Hope this helps.

 

Robert G. Newcombe PhD CStat FFPH HonMRCR

Professor of Biostatistics

Cochrane Institute of Primary Care and Public Health

School of Medicine

Cardiff University

4th floor, Neuadd Meirionnydd

Heath Park, Cardiff CF14 4YS

 

Tel: (+44) 29 2068 7260

 

My book Confidence Intervals for Proportions and Related Measures of Effect Size is now published.

 

Available at http://www.crcpress.com/product/isbn/9781439812785

 

See http://www.facebook.com/confidenceintervals

 

Home page https://sites.google.com/site/robertgnewcombe/

 

 

 

From: A UK-based worldwide e-mail broadcast system mailing list [mailto:[log in to unmask]] On Behalf Of John Sorkin
Sent: 29 December 2014 11:40
To: [log in to unmask]

I thank you for your suggestion, but I don't see how it helps answer my question.

John

 

 

From: A UK-based worldwide e-mail broadcast system mailing list [mailto:[log in to unmask]] On Behalf Of Justice Moses K. Aheto
Sent: 29 December 2014 11:26
To: [log in to unmask]

Hi John,

I think you should have a look at pooled standard errors which you can use to estimate the combined standard error for the two samples.

Example below: 

SE(x1-x2) = sqrt[ s1^2/n + s2^2/m ]
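
Read as the standard error of the difference between two sample means, with s1 and s2 taken to be the sample SDs and n and m the sample sizes, the formula above can be sketched in Python as below. The function names are illustrative only. Since s1^2/n is simply SE1^2, the same quantity can be obtained directly from the two standard errors.

```python
import math

def se_difference(sd1, n, sd2, m):
    """SE of (mean1 - mean2) for two independent samples, from their SDs."""
    return math.sqrt(sd1**2 / n + sd2**2 / m)

def se_difference_from_se(se1, se2):
    """Same quantity computed from the two standard errors."""
    return math.sqrt(se1**2 + se2**2)
```

Note that this describes the uncertainty in mean1 - mean2, not the SE of the mean of the merged n+m observations.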

Hope this helps.

Cheers 

 

Kind regards

*****************************************
Justice Moses K. Aheto
PhD Candidate in Medicine (United Kingdom)
MSc Medical Statistics (United Kingdom)
BSc Statistics (Ghana)
HND Statistics (Ghana)

Chief Executive Officer
Statistics and Analytics Consultancy Services Ltd.
E-mail: [log in to unmask]
Skype: jascall12
Mobile: +447417589148.

 

On Monday, December 29, 2014 11:12 AM, John Sorkin <[log in to unmask]> wrote:

 

How can I compute the SE (standard error) of the concatenation of two samples?

Assume sample 1 has n observations,

            sample 2 has m observations

The concatenation of sample 1 with sample 2 would have n+m observations.

If I know the mean of sample 1 = m1, and SE sample 1=se1, 

              the mean of sample 2 = m2, and SE sample 2=se2,

 

can I use n, m1, se1, m, m2, se2 to compute the SE?

 

Thank you,

John David Sorkin M.D., Ph.D.

 

