Using Binary Dosage files

library(BinaryDosage)

The following routines are available for accessing information contained in binary dosage files

getbdinfo

The getbdinfo routine returns information about a binary dosage file. For more information about the data returned see Genetic File Information. This information needs to be passed to bdapplygetsnp routines so it can read the binary dosage file.

The only parameter used by getbdinfo is bdfiles. This parameter is a character vector. If the format of the binary dosage file is 1, 2, or 3, this must be a character vector of length 3 with the following values, binary dosage file name, family file name, and map file name. If the format of the binary dosage file is 4 then the parameter value is a single character value with the name of the binary dosage file.

The following code gets the information about the binary dosage file vcf1a.bdose.

bd1afile <- system.file("extdata", "vcf1a.bdose", package = "BinaryDosage")
bd1ainfo <- getbdinfo(bdfiles = bd1afile)

bdapply

The bdapply routine applies a function to all SNPs in the binary dosage file. The routine returns a list with length equal to the number of SNPs in the binary dosage file. Each element in the list is the value returned by the user supplied function. The routine takes the following parameters.

The user supplied function must have the following parameters.

The user supplied function can have other parameters. These parameters need to passed to the bdapply routine.

There is a function in the BinaryDosage package named getaaf that calculates the alternate allele frequencies and is the format needed by vcfapply routine. The following uses getaaf to calculate the alternate allele frequency for each SNP in the vcf1a.bdose file using the bdapply routine.

aaf <- unlist(bdapply(bdinfo = bd1ainfo, func = getaaf))

altallelefreq <- data.frame(SNP = bd1ainfo$snps$snpid, aafcalc = aaf)
knitr::kable(altallelefreq, caption = "Information vs Calculated aaf", digits = 3)
Information vs Calculated aaf
SNP aafcalc
1:10000:C:A 0.347
1:11000:T:C 0.016
1:12000:T:C 0.265
1:13000:T:C 0.297
1:14000:G:C 0.204
1:15000:A:C 0.524
1:16000:G:A 0.425
1:17000:C:A 0.443
1:18000:C:G 0.265
1:19000:T:G 0.245

getsnp

The getsnp routine return the dosage and genotype probabilities for each subject for a given SNP in a binary dosage file.

The routine takes the following parameters.

The following code returns the dosage values and the genotype probabilities for SNP 1:12000:T:C from the *vcf1a.bdose" binary dosage file.

snp3 <- data.frame(getsnp(bdinfo = bd1ainfo, "1:12000:T:C", FALSE))

knitr::kable(snp3[1:20, ], caption = "SNP 1:12000:T:C", digits = 3)
SNP 1:12000:T:C
dosage p0 p1 p2
1.000 0.000 1.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
1.016 0.012 0.959 0.029
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
1.002 0.000 0.998 0.002
0.000 1.000 0.000 0.000
0.808 0.192 0.808 0.000
0.168 0.832 0.168 0.000
0.000 1.000 0.000 0.000
1.025 0.000 0.975 0.025
0.000 1.000 0.000 0.000
1.000 0.000 1.000 0.000