I really appreciate your attention to these questions, but personally it's way beyond the realms of my knowledge! Are there others on the list who could suggest R or Python libraries?
Failing that, the Association of Internet Researchers list at [log in to unmask] might have some suggestions.
Cheers, Mia
> On 22 Jul 2016, at 08:28, Stephen McConnachie <[log in to unmask]> wrote:
>
> Hi everyone,
>
> I have a statistical methodology question - what could be more exciting for a damp warm Friday? I realise it's not entirely in the comfort zone of this group, but I thought I'd try before exploring it with statistician contacts and broader research online.
>
> It's about managing missing data in survey responses, where the missing data is Missing Not At Random (MNAR), aka nonignorable nonresponse. I'm interested in any established models to correct for the resulting bias. Maybe those of you who have conducted surveys have come across this and found a good, understandable solution?
>
> I'll explain the problem. Imagine you're conducting a survey where some of the questions are within the 'sensitive data' realm: race, gender, sexuality, disability. Imagine you're getting high 'prefer not to answer' (PNTA) levels, e.g. 50%. One flawed approach is listwise deletion, meaning that the 50% PNTA is simply excluded from analysis. This introduces a bias risk, because the nonresponse is unlikely to be random and more likely to be meaningful: e.g. you might argue that over-represented cases (white, heterosexual males without disability) are slightly more likely to PNTA than under-represented cases. So deleting the PNTA is likely to introduce bias in your analysis, even if that nonrandomness is low level. A concrete example: removing 50% PNTA from the gender question might bias your analysis towards a misleadingly high % female.
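
To make the gender example above concrete, here is a minimal simulation sketch in plain Python. Everything in it is an illustrative assumption rather than survey data: the true share of women (50%), the PNTA probabilities (55% for men, 45% for women), and the function name `simulate_listwise_deletion` are all made up for the purpose.

```python
import random


def simulate_listwise_deletion(n=100_000, true_female=0.50,
                               pnta_male=0.55, pnta_female=0.45, seed=0):
    """Simulate a gender question where men skip slightly more often than women.

    All parameter values are hypothetical; they only illustrate the mechanism.
    Returns (female share among those who answered, overall PNTA rate).
    """
    rng = random.Random(seed)
    answered_female = 0
    answered_male = 0
    for _ in range(n):
        is_female = rng.random() < true_female
        p_skip = pnta_female if is_female else pnta_male
        if rng.random() < p_skip:
            continue  # 'prefer not to answer': dropped by listwise deletion
        if is_female:
            answered_female += 1
        else:
            answered_male += 1
    answered = answered_female + answered_male
    return answered_female / answered, 1 - answered / n


if __name__ == "__main__":
    observed_share, pnta_rate = simulate_listwise_deletion()
    print(f"PNTA rate: {pnta_rate:.3f}")
    print(f"Female share after listwise deletion: {observed_share:.3f}")
```

Under these assumed rates, about half the sample chooses PNTA, and the female share among the remaining answers comes out near 55% even though the simulated population is 50% female. A fairly small difference in PNTA rates is enough to shift the headline figure.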
>
> There are complex statistical methodologies for approaching the management of this problem - multiple imputation, maximum likelihood estimation, etc - but the complexity is daunting to a non-statistician without a software package like Stata. So I wonder if any of you have done this and either found a simple solution or developed a complex solution which is transferable - in other words, does anyone have some Python they can give me / direct me to??
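
One simple option that needs no statistics package is a sensitivity analysis, sketched below in plain Python. This is a swapped-in technique, not multiple imputation or maximum likelihood: under MNAR the observed answers alone cannot tell you how the non-responders differ, so the sketch reports what the true share would be under a range of assumptions you supply. The function names `worst_case_bounds` and `adjusted_share`, the `relative_response` parameter, and the example figures (55% observed, 50% PNTA) are all hypothetical.

```python
def worst_case_bounds(p_obs, pnta_rate):
    """Bounds on the true share of a group, with no assumption about PNTA.

    p_obs     -- share of the group among people who answered
    pnta_rate -- share of all respondents who chose 'prefer not to answer'
    """
    answered = 1.0 - pnta_rate
    lower = p_obs * answered               # no PNTA respondent is in the group
    upper = p_obs * answered + pnta_rate   # every PNTA respondent is in the group
    return lower, upper


def adjusted_share(p_obs, relative_response):
    """True share of a group under an assumed relative response rate.

    relative_response -- assumed P(answers | in group) / P(answers | not in group).
    A value of 1.0 means nonresponse is unrelated to group membership,
    in which case the observed share is returned unchanged.
    """
    if not 0.0 <= p_obs <= 1.0:
        raise ValueError("p_obs must be between 0 and 1")
    if relative_response <= 0:
        raise ValueError("relative_response must be positive")
    # Observed odds = true odds * relative_response, solved for the true share.
    return p_obs / (p_obs + relative_response * (1.0 - p_obs))


if __name__ == "__main__":
    p_obs, pnta_rate = 0.55, 0.50  # hypothetical survey figures
    print("Worst-case bounds:", worst_case_bounds(p_obs, pnta_rate))
    for k in (0.8, 1.0, 1.2, 1.5):  # assumed relative response rates
        print(f"relative_response={k}: adjusted share={adjusted_share(p_obs, k):.3f}")
```

The design choice here is to report a range rather than a single corrected number. The worst-case bounds show how little a 50% PNTA rate pins down on its own, and the adjusted figures show how far the estimate moves for each assumption about who is declining to answer.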
>
> All the best,
> Stephen
>
> ****************************************************************
> website: http://museumscomputergroup.org.uk/
> Twitter: http://www.twitter.com/ukmcg
> Facebook: http://www.facebook.com/museumscomputergroup
> [un]subscribe: http://museumscomputergroup.org.uk/email-list/
> ****************************************************************