Hi Thomas,

Thank you for your suggestion. I have just run the regression with bill/estimation as a binary predictor, forcing in accommodation size and household (HH) size, along with age, education and income. It shows that, once the confounding variables have been controlled for, estimating usage (rather than reading it from a bill) is associated with higher reported kWh usage. I would now like to run the regression separately for the two groups because, interestingly, different factors explain kWh usage in each group. I think this is important for self-report data. Do you think this is legitimate now that I have established the difference with the regression, or do I still need to match the groups?
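For anyone following the thread, here is a minimal sketch of the kind of regression described above: a binary bill/estimation indicator entered alongside the confounders. The data are synthetic and every variable name and coefficient is an assumption made for illustration; it is not the survey data or the actual model.

```python
import numpy as np

# Synthetic, made-up data purely for illustration (not the survey data).
rng = np.random.default_rng(0)
n = 500
estimated = rng.integers(0, 2, size=n)            # 1 = estimated usage, 0 = read from a bill
hh_size = rng.integers(1, 7, size=n) + estimated  # estimate group skews towards larger households
accom_size = rng.normal(80, 20, size=n)           # hypothetical accommodation size

# Assumed data-generating process with a true group effect of 300 kWh.
kwh = (1000 + 300 * estimated + 400 * hh_size + 10 * accom_size
       + rng.normal(0, 200, size=n))

# Ordinary least squares: intercept, group indicator, then the confounders.
X = np.column_stack([np.ones(n), estimated, hh_size, accom_size])
coef, *_ = np.linalg.lstsq(X, kwh, rcond=None)
group_effect = coef[1]  # group difference after adjusting for the confounders

# Unadjusted difference in means, for comparison.
naive_diff = kwh[estimated == 1].mean() - kwh[estimated == 0].mean()

print(f"adjusted group effect: {group_effect:.0f} kWh")
print(f"naive difference in means: {naive_diff:.0f} kWh")
```

In this toy setup the naive difference is inflated because the estimate group also has larger households; the adjusted coefficient is the quantity of interest.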

Best wishes

Iljana


Date: Mon, 8 Aug 2016 13:23:09 +0000
From: [log in to unmask]
Subject: Re: matching groups manually for independent t-test
To: [log in to unmask]



Deleting data will just introduce bias to your inference.

Two main options I think:

i) use multiple regression to control for all the confounding variables simultaneously
ii) use a matching algorithm that combines the covariates into a single matching criterion - usually done via propensity scores, but some recent work suggests that isn't a good solution and that simpler methods such as Euclidean distance matching are better (this may be the best bet if you have many covariates relative to the amount of data)
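
As a rough illustration of option ii), the sketch below does greedy 1:1 nearest-neighbour matching on standardised Euclidean distance. The function names, the greedy strategy and the example covariates are all assumptions for illustration; dedicated matching software offers more careful variants (calipers, optimal matching, matching with replacement).

```python
import math
import statistics

def standardise(rows):
    """Scale each covariate to mean 0, SD 1 so no covariate dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

def match_nearest(treated, control):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns (treated_index, control_index) pairs.
    """
    z = standardise(treated + control)
    zt, zc = z[:len(treated)], z[len(treated):]
    available = set(range(len(control)))
    pairs = []
    for i, t in enumerate(zt):
        if not available:
            break
        # Closest still-unmatched control unit in standardised covariate space.
        j = min(available, key=lambda k: math.dist(t, zc[k]))
        pairs.append((i, j))
        available.remove(j)
    return pairs

# Hypothetical covariates: (household size, accommodation size)
estimate_group = [(4, 90), (5, 120), (2, 60)]
bill_group = [(1, 40), (2, 65), (4, 95), (5, 110), (3, 70)]

pairs = match_nearest(estimate_group, bill_group)
print(pairs)
```

Greedy matching depends on the order of the units, so results can change if the rows are shuffled; that is one reason to treat this only as a sketch.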

In either case you may get better performance by using the logarithm of positively skewed variables such as household size
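
A small sketch of why the log can help: it pulls in the long right tail of a positively skewed variable. The household sizes below are made up, and `skewness` is a hypothetical helper written for this example.

```python
import math
import statistics

def skewness(xs):
    """Population skewness: mean of the cubed standardised deviations."""
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

# Made-up household sizes with a long right tail.
hh_size = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 8]
log_hh_size = [math.log(x) for x in hh_size]

raw_skew = skewness(hh_size)
log_skew = skewness(log_hh_size)
print(f"skewness raw: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

The log is only defined for strictly positive values, which household size satisfies.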

http://gking.harvard.edu/files/gking/files/psnot.pdf

In either case the results may not be too meaningful unless there is a fair amount of overlap in the distributions of consumption etc. between the groups.
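
One crude way to look at overlap is to find the range covered by both groups and check what share of each group falls inside it. The sketch below assumes made-up consumption figures and hypothetical helper names.

```python
def common_support(a, b):
    """Range of values covered by both groups."""
    lo = max(min(a), min(b))
    hi = min(max(a), max(b))
    return lo, hi

def share_within(xs, lo, hi):
    """Proportion of values that fall inside [lo, hi]."""
    return sum(lo <= x <= hi for x in xs) / len(xs)

# Made-up annual consumption figures in kWh.
estimate_kwh = [2500, 3100, 3600, 4200, 5000, 6500]
bill_kwh = [1800, 2200, 2900, 3300, 3900, 4100]

lo, hi = common_support(estimate_kwh, bill_kwh)
estimate_share = share_within(estimate_kwh, lo, hi)
bill_share = share_within(bill_kwh, lo, hi)
print(f"common range: {lo}-{hi} kWh")
print(f"share inside: estimate {estimate_share:.2f}, bill {bill_share:.2f}")
```

A shared range is only a first check: two groups can have the same minimum and maximum and still differ a lot in shape, so comparing histograms or quantiles of each covariate is more informative.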

Thom


From: Research of postgraduate psychologists. <[log in to unmask]> on behalf of I Schubert <[log in to unmask]>
Sent: 08 August 2016 12:36
To: [log in to unmask]
Subject: matching groups manually for independent t-test
 
Hi guys,

I am comparing two groups of participants: those who estimated their electricity consumption and those who used a bill. I found that the groups differ significantly in the amount of electricity they use, but also in household size and accommodation size, two key variables that influence electricity consumption. I am unsure whether there is any way to match the groups on these variables by deleting some participants, so that I can compare them on electricity usage without these confounding factors. I have found that trimming the extreme values at either end does not work, as the distribution of HH size is very different, i.e. the estimate group has a lot of households with a higher number of members (even though the max and min are the same). Any suggestions would be very much appreciated.

Thank you

Iljana