Is it that the MAF's are population specific - so if you were to compute
using only, say, the CEU individuals, they would be correct?
On 26/01/16 02:10, Timothy Mak wrote:
> I have been using the 1000G Phase 3 data as provided at https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#reference. However, I noticed there seems to be some discrepancy between the .hap file provided and the .legend file, in that the MAFs don't match. Here's an R script that demonstrates the difference. Can someone tell me why?
>
> Many many thanks!
>
> Tim
>
> ### R script to show difference ###
> leg.con <- gzfile("1000GP_Phase3_chr22.legend.gz")
> nlines <- 10
> leg <- strsplit(readLines(leg.con, n=nlines + 1), split=" ")
> leg2 <- as.data.frame(do.call(rbind, leg[-1]))
> colnames(leg2) <- leg[[1]]
>
> hap.con <- gzfile("1000GP_Phase3_chr22.hap.gz")
> hap.ncol <- 5008
> hap <- matrix(scan(hap.con, n=hap.ncol * nlines), ncol=hap.ncol)
> hapmeans <- rowMeans(hap)
> cbind(hapmeans, as.numeric(as.character(leg2$ALL)))
> ### End of R script ###
>
> Output:
> hapmeans
> [1,] 0.0013977636 0.0001996805
> [2,] 0.0013977636 0.0063897764
> [3,] 0.0009984026 0.0075878594
> [4,] 0.0019968051 0.0001996805
> [5,] 0.0031948882 0.0001996805
> [6,] 0.0021964856 0.0003993610
> [7,] 0.0009984026 0.0009984026
> [8,] 0.0025958466 0.0003993610
> [9,] 0.0023961661 0.0001996805
> [10,] 0.0011980831 0.0017971246
>
> One would expect the left column to either equal the right column, or 1 - the right column, but it is neither.
>
> To unsubscribe from the list visit this webpage https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=OXSTATGEN&A=1
>
To unsubscribe from the list visit this webpage https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=OXSTATGEN&A=1
|