
dc.identifier.uri: http://hdl.handle.net/11401/76029
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: Estimating the probability that an individual has a nucleotide different from the reference nucleotide at a base pair position is important in next generation sequencing (NGS) research. I present a method for modeling the frequency of single nucleotide polymorphism variants in the exome capture sequencing data of an individual. A mixture distribution was used to model the proportion of alternative alleles at a specified base pair position, assuming a biallelic single nucleotide polymorphism model. I measured the proportion of alternative alleles for positions in chromosome 1 exome sequencing data from two trios taken from the Pilot 3 data in the 1000 Genomes Project. The measurements were based on the counts of reference and alternative alleles calculated by the SAMtools software. The mixture model studied here had two point distributions and five continuous distributions. I applied the expectation-maximization algorithm to obtain the maximum likelihood estimates of the mixture model parameters for each individual. The fitted mixture model described the properties of the distribution of the alternative allele proportions well. The estimates of the mixing proportions were used to estimate the genotype frequencies in the data. Each individual had different estimates of the model parameters, but the estimates of the genotype fractions of the six individuals were similar. The estimated fractions of the members within each trio were similar to each other. I next combined the two approaches of clustering and mixture modeling to genotype the exomic base pair positions of an individual using next generation sequencing data. The alternative allele proportion at a position was used to compute the Bayesian posterior probability of a single nucleotide polymorphism at that position. I developed a software package named "SNVclust" to generate the alternative allele proportions and genotypes of an individual.
This software was used to make a call set of single nucleotide polymorphism positions and genotypes for each of the three members of a trio from the 1000 Genomes Project. The results from this software were compared with the released single nucleotide polymorphisms in the 1000 Genomes Project and with the results from two other programs. I found that a minimum average coverage greater than 43 should be required to use SNVclust for whole exome sequencing data.
dcterms.available: 2017-09-18T23:49:50Z
dcterms.contributor: Finch, Stephen J. [en_US]
dcterms.contributor: Wu, Song [en_US]
dcterms.contributor: Yoon, Seungtai. [en_US]
dcterms.contributor: Mendell, Nancy R. [en_US]
dcterms.creator: Lihm, Jayon
dcterms.dateAccepted: 2017-09-18T23:49:50Z
dcterms.dateSubmitted: 2017-09-18T23:49:50Z
dcterms.description: Department of Applied Mathematics and Statistics. [en_US]
dcterms.extent: 86 pg. [en_US]
dcterms.format: Application/PDF [en_US]
dcterms.format: Monograph
dcterms.identifier: http://hdl.handle.net/11401/76029
dcterms.issued: 2013-12-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2017-09-18T23:49:50Z (GMT). No. of bitstreams: 1 Lihm_grad.sunysb_0771E_11604.pdf: 6294449 bytes, checksum: ea60aff6311c13eb812d4e8f7336a68e (MD5) Previous issue date: 1 [en]
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: clustering, genotyping, mixture modeling, next generation sequencing, single nucleotide polymorphism
dcterms.subject: Statistics
dcterms.title: Mixture Modeling of Next Generation Sequencing Data and its Applications to Genotyping and Estimating Genotype Frequencies
dcterms.type: Dissertation

