dc.identifier.uri: http://hdl.handle.net/11401/76538
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: Genotype misclassification errors are known to reduce the power to detect genetic association, but the size of this effect in next-generation sequencing (NGS) is not known. The non-centrality parameter (NCP), and hence the power, of the association test allowing for errors was derived for a specified error model at a base pair. This NCP was compared to the NCP for the usual chi-square test. The asymptotic power was compared to simulated power for specific settings of the true genotype and phenotype frequencies in the case and control populations, genotype misclassification rates, and total sample size. An R script was provided for calculating the NCP. Next, the effect of misclassification error in case-control genetic association studies using NGS data was modeled. The Likelihood Ratio Test Allowing for Error using NGS data (LRTNGS) was derived. The genotype frequencies and misclassification rates were estimated from the observed base pair reads using the expectation-maximization (EM) algorithm. This statistic allows for both non-differential and differential misclassification. The distribution of LRTNGS was studied by simulation for both null and alternative settings. The effects of genotype misclassification rates on the sample size needed to maintain constant asymptotic Type I and Type II error rates were studied. For at-risk minor allele frequencies less than 0.01, large sample sizes were required for the asymptotic distribution to be a good approximation. Increasing the sequencing coverage increased the estimated power and the adequacy of simulated power.
dcterms.available: 2017-09-20T16:50:35Z
dcterms.contributor: Finch, Stephen J. [en_US]
dcterms.contributor: Mendell, Nancy [en_US]
dcterms.contributor: Zhu, Wei [en_US]
dcterms.contributor: Gordon, Derek. [en_US]
dcterms.creator: Zhang, Ruiqi
dcterms.dateAccepted: 2017-09-20T16:50:35Z
dcterms.dateSubmitted: 2017-09-20T16:50:35Z
dcterms.description: Department of Applied Mathematics and Statistics. [en_US]
dcterms.extent: 110 pg. [en_US]
dcterms.format: Application/PDF [en_US]
dcterms.format: Monograph
dcterms.identifier: http://hdl.handle.net/11401/76538
dcterms.issued: 2014-12-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2017-09-20T16:50:35Z (GMT). No. of bitstreams: 1 Zhang_grad.sunysb_0771E_11872.pdf: 1538666 bytes, checksum: ab4d68704e8b22833c0ec56d3582aac4 (MD5) Previous issue date: 1 [en]
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: Statistics
dcterms.title: Modeling the effect of sequencing error
dcterms.type: Dissertation
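
The abstract refers to the non-centrality parameter (NCP) of the usual chi-square test and to how genotype misclassification changes it. The sketch below is not the dissertation's R script and is not taken from it. It is a minimal Python illustration under stated assumptions: three genotype classes, a non-differential misclassification matrix `theta` (rows are true genotypes, columns are observed genotypes), and the standard NCP of the chi-square test of homogeneity for two groups, N r (1 - r) Σ_j (p_j - q_j)² / (r p_j + (1 - r) q_j). The function names, frequencies, and error rates are all hypothetical.

```python
def observed_freqs(true_freqs, theta):
    """Genotype frequencies after misclassification.

    theta[i][j] is the assumed probability that true genotype i
    is recorded as genotype j (each row sums to 1).
    """
    n_obs = len(theta[0])
    return [
        sum(true_freqs[i] * theta[i][j] for i in range(len(true_freqs)))
        for j in range(n_obs)
    ]


def chi_square_ncp(case_freqs, control_freqs, n_total, case_fraction=0.5):
    """NCP of the chi-square test of homogeneity for two groups."""
    r = case_fraction
    total = 0.0
    for p, q in zip(case_freqs, control_freqs):
        pooled = r * p + (1.0 - r) * q  # pooled frequency of this genotype
        if pooled > 0.0:
            total += (p - q) ** 2 / pooled
    return n_total * r * (1.0 - r) * total


# Hypothetical true genotype frequencies in cases and controls.
cases = [0.5, 0.4, 0.1]
controls = [0.6, 0.3, 0.1]

# Hypothetical non-differential error model: 5% of each genotype is
# misread as a neighbouring genotype.
theta = [
    [0.95, 0.05, 0.00],
    [0.025, 0.95, 0.025],
    [0.00, 0.05, 0.95],
]

ncp_true = chi_square_ncp(cases, controls, n_total=1000)
ncp_error = chi_square_ncp(
    observed_freqs(cases, theta),
    observed_freqs(controls, theta),
    n_total=1000,
)
print(f"NCP without errors: {ncp_true:.3f}")
print(f"NCP with errors:    {ncp_error:.3f}")
```

Under these assumptions the NCP computed from the misclassified frequencies cannot exceed the error-free NCP, which is the power loss the abstract describes. Power itself would follow from a non-central chi-square distribution with 2 degrees of freedom, which this sketch leaves out.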