Dear Experts,
I have two questions about controlling for site/scanner variables of no interest that I hope you will be able to help me with.
Scenario: I have data acquired at four different sites and I just want to control for the effect of site in my multiple regression.
Question 1: How many dummy variables do I need to control for the four sites?
From a statistical point of view, I am used to including N-1 dummy variables in a regression, so in this case three. However, from reading posts on the list, it appears that most people include all four site variables. When I try to include four dummy variables in other packages (for a region-of-interest analysis in R, for example), the package does not allow it because the four dummies, together with the intercept, are perfectly collinear; it runs the model using the N-1 rule instead. So why do people usually include all site variables if this leads to problems with the model in other packages?
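To make the collinearity concrete, here is a minimal NumPy sketch of what I think is going on. The site labels are made up, and the variable names are only for illustration; they do not come from any package. It compares the rank of three candidate design matrices: intercept plus all four dummies, intercept plus N-1 dummies, and all four dummies with no intercept.

```python
import numpy as np

# Hypothetical site labels for 8 subjects across 4 sites (made-up data).
site = np.array([0, 0, 1, 1, 2, 2, 3, 3])
n_sites = 4

# One indicator (dummy) column per site.
dummies = np.eye(n_sites)[site]                      # shape (8, 4)
intercept = np.ones((site.size, 1))                  # shape (8, 1)

# Intercept + all four dummies: the dummies sum to the intercept column.
X_full = np.hstack([intercept, dummies])             # 5 columns
# Intercept + N-1 dummies (site 0 acts as the reference level).
X_reduced = np.hstack([intercept, dummies[:, 1:]])   # 4 columns
# All four dummies, no separate intercept column.
X_no_intercept = dummies                             # 4 columns

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)
rank_no_intercept = np.linalg.matrix_rank(X_no_intercept)

print("intercept + 4 dummies:", X_full.shape[1], "columns, rank", rank_full)
print("intercept + 3 dummies:", X_reduced.shape[1], "columns, rank", rank_reduced)
print("4 dummies only:       ", X_no_intercept.shape[1], "columns, rank", rank_no_intercept)
```

In this toy case only the first design is rank deficient (five columns, rank four). The other two have full column rank and span the same space, so I assume they should give the same fit for the covariates of interest, which is part of why I am confused about the convention.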
Question 2: Is it necessary to demean dummy variables and why?
Thank you so much for your advice and suggestions!
Best wishes,
Amber