
dc.identifier.uri: http://hdl.handle.net/1951/55943
dc.identifier.uri: http://hdl.handle.net/11401/71557
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: Single-nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome. Haplotypes, which combine multiple SNPs into super-alleles, have been widely used in modern genetic analysis, especially in human disease association studies. The Expectation Maximization (EM) algorithm is commonly used for haplotype phasing and frequency estimation, and Hardy-Weinberg (HW) equilibrium is a key assumption built into the EM algorithm. Several studies have explored the accuracy of EM-based haplotype frequency estimation when the HW equilibrium assumption is violated. The general consensus is that sampling error plays a larger role in haplotype estimation than the estimation error due to HW deviation, and that the accuracy of haplotype frequency estimation tends to improve with increasing homozygosity in the sample. However, these studies mainly concentrated on the impact of SNP-level HW deviation; a theoretical foundation for the impact of haplotype-level HW deviation on haplotype frequency estimation has not been established. In this dissertation, we derived the theoretical relationship among three haplotype mean squared errors: between population and sample frequencies (MSEPS), between true sample and sample-estimated frequencies (MSESE), and between population and sample-estimated frequencies (MSEPE). The theoretical relationship between SNP-level and haplotype-level HW deviations was also established. Our simulations show that violation of HW equilibrium at the haplotype level can produce larger haplotype estimation error than sampling error, and that the accuracy of haplotype frequency estimation does not always improve with increasing homozygosity.
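For readers unfamiliar with the standard procedure, the following is a minimal Python sketch of EM haplotype frequency estimation for two biallelic SNPs under the HW equilibrium assumption (the classic gene-counting EM, not code from the dissertation). Genotypes are coded, as an assumption for illustration, as pairs (g1, g2) counting the minor allele at each SNP; only the double heterozygote (1, 1) is phase-ambiguous. The HWE assumption enters through the diplotype weights p_i^2 and 2 p_i p_j. The helper `mse` is hypothetical and averages squared differences over haplotype categories; the dissertation's exact definitions of MSEPS, MSESE, and MSEPE may differ.

```python
# Classic two-SNP EM (gene counting) under Hardy-Weinberg equilibrium.
# Haplotype indices: 0 = AB, 1 = Ab, 2 = aB, 3 = ab, stored as
# (allele at SNP1, allele at SNP2) with 1 marking the minor allele.
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def resolve(g1, g2):
    """Unordered haplotype pairs (i <= j) consistent with genotype (g1, g2)."""
    return [(i, j) for i in range(4) for j in range(i, 4)
            if HAPS[i][0] + HAPS[j][0] == g1 and HAPS[i][1] + HAPS[j][1] == g2]


def em_haplotype_freqs(genotype_counts, n_iter=500, tol=1e-10):
    """Estimate haplotype frequencies from {(g1, g2): count} assuming HWE."""
    p = [0.25] * 4
    for _ in range(n_iter):
        counts = [0.0] * 4
        # E-step: split each genotype class over its compatible diplotypes
        for (g1, g2), n in genotype_counts.items():
            pairs = resolve(g1, g2)
            # HWE diplotype probabilities: p_i^2 if i == j, else 2 * p_i * p_j
            w = [p[i] * p[j] * (1 if i == j else 2) for i, j in pairs]
            total = sum(w)
            if total == 0:  # degenerate guard: split evenly
                w, total = [1.0] * len(pairs), float(len(pairs))
            for (i, j), wij in zip(pairs, w):
                share = n * wij / total
                counts[i] += share  # a homozygote (i == j) adds two copies
                counts[j] += share
        # M-step: haplotype frequencies are expected counts divided by 2N
        n_hap = sum(counts)
        new_p = [c / n_hap for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


def mse(f, g):
    """Hypothetical helper: mean squared difference between frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(f, g)) / len(f)
```

For example, `em_haplotype_freqs({(0, 0): 10, (2, 2): 10, (1, 1): 10})` assigns the double heterozygotes almost entirely to the AB/ab phase and converges to frequencies very close to [0.5, 0, 0, 0.5]. In a simulation where the phased sample is known, `mse(estimate, true_sample_freqs)` would give an MSESE-style comparison.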
To incorporate possible haplotype-level HW deviations into the haplotype frequency estimation process, we propose a Hardy-Weinberg Deviation-Expectation/Conditional Maximization (HWD-ECM) method that estimates HW deviation parameters and haplotype frequencies simultaneously. For the two-SNP case, each HWD-ECM iteration consists of three steps: (1) an expectation step that estimates genotype frequencies allowing for HW deviation parameters; (2) a conditional maximization step that estimates the HW deviation parameters using constraints on the SNP-level or haplotype-level HW deviation parameters; and (3) a conditional maximization step for the haplotype frequencies. Simulation results show that the HWD-ECM method performs significantly better than the EM-based approach in haplotype estimation when the HW equilibrium (HWE) assumption is violated. An algorithm extending HWD-ECM to multiple SNPs is also discussed.
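The idea of estimating a deviation parameter and haplotype frequencies together can be illustrated with a deliberately simplified model. The sketch below is an assumption and not the dissertation's HWD-ECM parameterization: it swaps in a single inbreeding-style coefficient F at the haplotype level, under which diplotype i/i has probability F p_i + (1 - F) p_i^2 and i/j (i ≠ j) has probability 2(1 - F) p_i p_j. Treating "both copies are one identical-by-descent (IBD) draw" as a latent flag gives closed-form updates, and each iteration runs an E-step, a conditional maximization for F, and a conditional maximization for the haplotype frequencies. All function names are hypothetical.

```python
# Toy haplotype-level HW-deviation model (an assumption, not HWD-ECM itself):
# with probability F an individual carries two IBD copies of one haplotype.
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # AB, Ab, aB, ab


def resolve(g1, g2):
    """Unordered haplotype pairs (i <= j) consistent with genotype (g1, g2)."""
    return [(i, j) for i in range(4) for j in range(i, 4)
            if HAPS[i][0] + HAPS[j][0] == g1 and HAPS[i][1] + HAPS[j][1] == g2]


def diplotype_prob(i, j, p, F):
    """Diplotype probability under the toy inbreeding-style model."""
    if i == j:
        return F * p[i] + (1 - F) * p[i] ** 2
    return 2 * (1 - F) * p[i] * p[j]


def ecm_hwd_toy(genotype_counts, F=0.1, n_iter=5000, tol=1e-12):
    """Jointly estimate haplotype frequencies p and F; F is the starting value."""
    p = [0.25] * 4
    n_ind = sum(genotype_counts.values())
    for _ in range(n_iter):
        draws = [0.0] * 4  # expected independent haplotype draws per type
        ibd = 0.0          # expected number of IBD (single-draw) individuals
        # E-step: posterior over compatible diplotypes and the latent IBD flag
        for (g1, g2), n in genotype_counts.items():
            pairs = resolve(g1, g2)
            w = [diplotype_prob(i, j, p, F) for i, j in pairs]
            total = sum(w)
            if total == 0:  # degenerate guard: split evenly
                w, total = [1.0] * len(pairs), float(len(pairs))
            for (i, j), wij in zip(pairs, w):
                share = n * wij / total
                if i == j:
                    z = F * p[i] / wij if wij > 0 else 0.0  # P(IBD | i/i)
                    ibd += share * z
                    draws[i] += share * (2 - z)  # one draw if IBD, else two
                else:
                    draws[i] += share
                    draws[j] += share
        # CM-step 1: maximize over F with p fixed (expected IBD fraction)
        new_F = ibd / n_ind
        # CM-step 2: maximize over p with F fixed (expected draw proportions)
        total_draws = sum(draws)
        new_p = [d / total_draws for d in draws]
        delta = max(abs(new_F - F), *(abs(a - b) for a, b in zip(new_p, p)))
        p, F = new_p, new_F
        if delta < tol:
            break
    return p, F
```

Under this toy model the complete-data likelihood factorizes in F and p, so the two conditional steps coincide with a single joint M-step; the split only mirrors the three-step structure described above. F also cancels from the double-heterozygote phase weights, so the deviation acts through the homozygous classes. The model cannot represent heterozygote excess (F < 0), and starting F at exactly 0 would leave it stuck there, because the IBD posterior is then always zero.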
dcterms.available: 2012-05-17T12:19:44Z
dcterms.available: 2015-04-24T14:47:54Z
dcterms.contributor: John J. Chen [en_US]
dcterms.contributor: Nancy R. Mendell [en_US]
dcterms.contributor: Wei Zhu [en_US]
dcterms.contributor: Barbara Nemesure [en_US]
dcterms.creator: Ahn, Hyeong Jun
dcterms.dateAccepted: 2012-05-17T12:19:44Z
dcterms.dateAccepted: 2015-04-24T14:47:54Z
dcterms.dateSubmitted: 2012-05-17T12:19:44Z
dcterms.dateSubmitted: 2015-04-24T14:47:54Z
dcterms.description: Department of Applied Mathematics and Statistics [en_US]
dcterms.format: Monograph
dcterms.format: Application/PDF [en_US]
dcterms.identifier: Ahn_grad.sunysb_0771E_10595.pdf [en_US]
dcterms.identifier: http://hdl.handle.net/1951/55943
dcterms.identifier: http://hdl.handle.net/11401/71557
dcterms.issued: 2011-08-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2012-05-17T12:19:44Z (GMT). No. of bitstreams: 1 Ahn_grad.sunysb_0771E_10595.pdf: 2152686 bytes, checksum: 8b194976c8caf9f8b320b3603a196433 (MD5) Previous issue date: 1 [en]
dcterms.provenance: Made available in DSpace on 2015-04-24T14:47:54Z (GMT). No. of bitstreams: 3 Ahn_grad.sunysb_0771E_10595.pdf.jpg: 1894 bytes, checksum: a6009c46e6ec8251b348085684cba80d (MD5) Ahn_grad.sunysb_0771E_10595.pdf: 2152686 bytes, checksum: 8b194976c8caf9f8b320b3603a196433 (MD5) Ahn_grad.sunysb_0771E_10595.pdf.txt: 125043 bytes, checksum: 5049ed4da5650498d628843eb3fb7dbe (MD5) Previous issue date: 1 [en]
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: Statistics -- Biostatistics
dcterms.subject: Expectation/Conditional Maximization (ECM) algorithm, Expectation Maximization (EM) algorithm, haplotype frequency estimation, Hardy-Weinberg Deviation-Expectation/Conditional Maximization (HWD-ECM) algorithm, Hardy-Weinberg (HW) deviation, Single-nucleotide polymorphism (SNP)
dcterms.title: Hardy-Weinberg Deviation and EM-based Haplotype Frequency Estimation
dcterms.type: Dissertation

