2022-05-24 Version 1.2-9

* Modifications to the package to conform with current CRAN
  requirements. Also updated URLs.

2015-05-08 Version 1.2-8

* Modifications to the package so that it works with upcoming R
  changes to nchar(). Also updates to email addresses and URLs.

2014-08-25 Version 1.2-7

* Moved vignettes to directory vignettes, as required by CRAN.

2012-09-20 Version 1.2-6

* Changes to the package so that only public R functions are now used.

2012-04-05 Version 1.2-5

2010-07-05 Version 1.2-4

* hapassoc() now handles the case of all haplotypes being included in
  the model formula. If the 'baseline' haplotype is given, it will be
  the reference group. If not, the most frequent haplotype will be the
  reference group.

2009-11-19 Version 1.2-3

* Fixed bug: hapassoc() failed when using the design="cc" option with
  no non-genetic variables in the dataset.

2008-10-03 Version 1.2-2

This is a bug-fix release. Version 1.2-1 of the package was missing
one of the C source files and would crash for case-control data
(design="cc"). For prospective or cross-sectional data, the Poisson
and Gamma log-likelihood functions used an undefined indexing
variable; this bug has now been fixed.

2008-07-19 Version 1.2-1

Hapassoc was originally developed to provide likelihood inference for
prospectively collected (cohort) or cross-sectional data. We have made
some provisions to accommodate case-control data (with future plans
for more) by including an implementation of association and haplotype
frequency estimators arising from the modified prospective score
equations (MPSE) of Spinka et al. (2005). The implementation is due to
Chen (2006). A summary of changes to the package follows.

Documentation
-------------

- Reduced emphasis on likelihood inference because hapassoc now
  implements association estimators for case-control data from an
  unbiased estimating equation approach (Spinka et al. 2005).
- Added documentation of two new arguments to hapassoc:
  * design can either be "cohort" (default) for cohort or
    cross-sectional data or "cc" for case-control data
  * disease.prob specifies the marginal probability of disease for
    the case-control approach. See the Details section of the
    hapassoc documentation for more.
- Revised the "Details" section of the hapassoc help file. Added the
  following:

  "When the study design is case-control, i.e. genotypes and
  non-genetic attributes have been sampled retrospectively given
  disease status, naive application of prospective maximum likelihood
  methods can yield biased inference (Spinka et al., 2005, Chen,
  2006). Therefore, when \code{design="cc"}, the algorithm solves the
  modified prospective score equations or MPSE (Spinka et al. 2005)
  for regression and haplotype frequency parameters. The
  implementation in \pkg{hapassoc} is due to Chen (2006). In general,
  the MPSE approach requires that the marginal probability of
  disease, P(D=1), be known. An exception is when the disease is
  rare; hence, when \code{disease.prob=NULL} (the default) a rare
  disease is assumed. The variance-covariance matrix of the
  regression parameter and haplotype frequency estimators is
  approximated as described in Chen (2006). Limited simulations
  indicate that the resulting standard errors for regression
  parameters perform well, but not the standard errors for haplotype
  frequencies, which should be ignored. For case-control data, we
  hope to implement the variance-covariance estimator of Spinka et
  al. (2005) in a future version of \pkg{hapassoc}."

Source files
------------

In DESCRIPTION:

- Noted Zhijian Chen's contribution

In R/hapassoc.R:

- Added error check to make sure family=binomial() if design="cc"
- Changed code that handles formulae of the form "y ~ ." Previously,
  when hapassoc was passed such a formula it dropped the baseline
  haplotype from the design matrix, which had the effect of using the
  baseline haplotype as the baseline in all calls to glm.
  However, the MPSE code needs all haplotypes to be present in the
  data frame. The modified code paste()s together the appropriate
  formula; for example, if there are 2 SNPs, h00 is the baseline
  haplotype, and there are no non-genetic covariates, it pastes
  together the formula y ~ h01 + h10 + h11. None of the columns in
  the design matrix are dropped.
- Where there used to be just a single while loop to do the EM
  algorithm there is now an if-else. If design="cc", do the MPSE
  code, else do the regular hapassoc code. In the "if", there are a
  lot of preliminary calculations that need to be done (about 75
  lines of code) before the while loop.
- The output item "dispersionML" has been renamed dispersion. This
  required some changes in the log-likelihood code too.
- The output item "loglik" is set to NA when design="cc"; the MPSE
  are estimating equations, so a log-likelihood does not apply.
- Added utility functions r.Omega() and get.diplofreq() to the
  hapassoc.R source file. These are used in the MPSE code.

In src/tapply_sum.c:

- New file that contains the function tapply_sum written by S. Blay
  to speed up the MPSE code.

Future work
-----------

- The Spinka et al. (2005) variance calculation for case-control data
  has not yet been implemented. Currently, an approximation shown by
  Chen (2006) to have reasonably good properties is used.

References
----------

Chen, Z. (2006). Approximate likelihood inference for haplotype risks
in case-control studies of a rare disease. Master's thesis,
Statistics and Actuarial Science, Simon Fraser University, available
at \url{http://www.stat.sfu.ca/people/alumni/Theses/Chen-2006.pdf}.

Spinka, C., Carroll, R. J. & Chatterjee, N. (2005). Analysis of
case-control studies of genetic and environmental factors with
missing genetic information and haplotype-phase ambiguity. Genetic
Epidemiology, \bold{29}, 108-127.

2006-07-20 Version 1.1

* Fixed bug to enable formula = dependent ~ 1 in hapassoc() and
  summary.hapassoc()
* hapassoc() now also returns the function call
* summary.hapassoc() now also returns the hapassoc function call, the
  number of subjects used in the analysis, the name of the family and
  the log-likelihood. The returned object is printed nicely.

2006-04-28 Version 1.0-1

* Added the following second citation:
  Burkett K, Graham J and McNeney B (2006). hapassoc: Software for
  Likelihood Inference of Trait Associations with SNP Haplotypes and
  Other Attributes. Journal of Statistical Software, 16(2), 1-19

2006-04-04 Version 1.0

* hapassoc(): the "baseline" argument is documented to have default
  equal to the most common haplotype, but the code to implement this
  default was lost and needed to be replaced.
* hapassoc(): added a "verbose" flag. Default is verbose=FALSE. If
  TRUE, users see the iteration number and value of the convergence
  criterion at each iteration of the EM algorithm.
* pre.hapassoc(): added a "verbose" flag. Default is verbose=TRUE. If
  TRUE, users see a list of the SNP genotypes used to form haplotypes
  and a list of the other "non-haplotype" variables.
* Package vignette "hapassoc" added. After loading the package, type
  vignette("hapassoc") to view.

2006-03-22

* Overall addition of the log-likelihood functions
* hapassoc(): function now returns log-likelihood and model
* logLik.hapassoc(): New function to extract the log-likelihood from
  a hapassoc object
* anova.hapassoc(): New function to perform likelihood ratio test on
  two hapassoc objects.

2006-02-02

Minor changes:

* EMvar(): fixed a bug occurring when all haplotype phases are known.
* RecodeHaplos(): fixed a bug where a single column of non-haplotype
  data in a non-allelic data set was losing its name.
* hapassoc(): Changed the "..." argument of hapassoc to "start".
  Previously the only intended use of "..."
  was to allow the user to pass in "start" for starting values to the
  glm function, rather than to allow the user to pass in other
  optional arguments to glm. We have now made this more explicit by
  making this argument more specific.

2005-07-13 Version 0.7-1

* handleMissings(), pre.hapassoc(): instead of casting to a
  data.frame, use indexing argument drop=FALSE.

2005-06-30 Version 0.7

* hapassoc(): use initial weights in the glm to get initial parameter
  estimates.
* hapassoc(), pre.hapassoc(): replaced weights calculation with a C
  function. This speeds up computing time for large data sets.

2005-05-31 Version 0.6-2

* handleMissings(): fixed a bug for SNPs with a rare allele that
  never appears in a homozygous state.

2005-05-09 Version 0.6-1

* Added this ChangeLog file
* Added inst/CITATION file

2005-04-06 Version 0.6

* RecodeHaplos(): Allow input SNP data as alphabetic alleles (e.g.
  A,G,C,T), and for genotypes to be input either in a single
  two-character column ("genotypic format"), or as a pair of columns
  (the original "allelic format" from earlier versions of hapassoc)
* pre.hapassoc(): Added allelic argument to indicate whether SNP data
  are input in genotypic format or allelic format
* RecodeHaplos(): Check for the number of alleles at each locus. If
  the check finds loci with >2 alleles, stop execution and print an
  error message that tells the user that only diallelic loci are
  allowed.
* RecodeHaplos(): Convert all missing data in "" format to NA.
* hapassoc(): The convergence criterion for the EM algorithm has been
  tightened. We now require both absolute and relative changes in the
  parameter estimates from one iteration to the next to be below the
  user-specified tolerance.
* summary.hapassoc(): Added a check for converged FALSE, and now
  print a warning (used to just give a cryptic error).
* hapassoc(): Changed the name of the variable 'gamma' to 'freq' and
  the name of the variable 'initGamma' to 'initFreq'
* pre.hapassoc(): Changed the name of the returned variable
  'initGamma' to 'initFreq'

pre.hapassoc man page

* pre.hapassoc(): Added a new example of how to use pre.hapassoc with
  SNPs in the new genotypic format
* Added documentation to describe how single-locus genotypes may now
  be specified as a single two-character column ("genotypic format")
  in the input data frame
* Added a Note to alert users to ignore the possible warnings related
  to row.names being duplicated when there are missing genotypes on
  some of the loci for an individual
* Updated the reference to Burkett et al. to give journal volume and
  page numbers

hapassoc man page

* Added a Note to alert users to the warning they'll see when fitting
  logistic regression models (non-integer #successes...)
* Added more comments to the examples to make the coding of columns
  in haploDM more obvious, and added an example of a
  non-multiplicative logistic regression model
* Updated the reference to Burkett et al.

summary.hapassoc man page

* Updated the reference to Burkett et al.

2004-11-03 Version 0.5-1

* handleMissings(): assignment to nonSNPdat changed to accommodate
  rbind of data frames which contain factors.
* hapassoc(): Assigned response<-regr$y. We used to use
  model.response to extract the response variable, but this led to
  problems in calculating the residuals in the pYgivenX function. It
  is better to fit the model with the glm function (see code) and
  then extract the response from the fitted model object.

2004-09-29 Version 0.5

* Changed file name from PreEM.R to PreHap.R
* Changed function name from EM to hapassoc
* Changed function name from PreEM to pre.hapassoc
* Changed function name from summary.EM to summary.hapassoc