Hello,
I saw an email chain from July of 2012 about this topic and wanted to find out whether there was anything new…
I pre-phased a chromosomal region of my study data using SHAPEIT v2, with the intention of then imputing with IMPUTE2. My PLINK input files for SHAPEIT had missing data, but in the SHAPEIT output the missing data proportion is 0 for all samples in the .sample file, and the .haps file contains no '?' characters. I am therefore assuming that SHAPEIT imputed all of the missing genotypes while doing the phasing.
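For anyone who wants to repeat the check on the .haps file, here is a minimal Python sketch of the kind of scan described above. It is only an illustration under stated assumptions: it assumes each .haps line has five leading columns (chromosome or ID, SNP ID, position, and the two alleles) followed by two allele columns per sample, and that a missing allele would be written as '?'. The function name `count_missing_alleles` and the example lines are hypothetical and not part of SHAPEIT.

```python
from typing import Dict, Iterable

# Assumption: five leading columns (chromosome/ID, SNP ID, position,
# allele 0, allele 1) precede the per-sample haplotype alleles.
N_LEADING_COLUMNS = 5


def count_missing_alleles(lines: Iterable[str], missing_code: str = "?") -> Dict[str, int]:
    """Return {SNP ID: number of alleles equal to missing_code} for SNPs with any."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= N_LEADING_COLUMNS:
            continue  # skip blank lines or lines with no haplotype columns
        snp_id = fields[1]
        n_missing = sum(1 for allele in fields[N_LEADING_COLUMNS:] if allele == missing_code)
        if n_missing:
            counts[snp_id] = n_missing
    return counts


# Made-up example lines: two samples (four haplotype columns) at two SNPs.
example = [
    "1 rs1 1000 A G 0 1 1 0",
    "1 rs2 2000 C T 0 ? 1 1",
]
print(count_missing_alleles(example))  # {'rs2': 1}
```

To scan a real file, an open file handle can be passed in place of the list, since the function only iterates over lines. An empty result corresponds to what I observed: no '?' anywhere in the .haps output.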
My questions are:
1) Does SHAPEIT by default fill in all missing genotypes regardless of how confidently it can predict them (i.e., regardless of the probability of the most likely genotype), or is there some confidence/probability threshold below which SHAPEIT would not impute and would leave the data as missing (ideally coded as '?' for later use with IMPUTE2)?
2) Is there any way to find, in the SHAPEIT output, the probabilities of the most likely genotypes that were filled in for the missing data?
3) Has a method or option been developed that lets users turn off the missing data imputation performed by SHAPEIT and instead have SHAPEIT code missing data as '?' in the output file for use with IMPUTE2?
I realize that pre-phasing ignores the estimation uncertainty in the study haplotypes, but I would prefer not to also ignore the uncertainty in missing genotype imputation for the study SNPs. I would rather leave missing genotype imputation to IMPUTE2 and use the imputation probabilities output by IMPUTE2 (both for filled-in missing data and for newly imputed SNPs) in a dosage analysis.
Thanks,
Steve