
dc.identifier.uri | http://hdl.handle.net/1951/59607
dc.identifier.uri | http://hdl.handle.net/11401/71191
dc.description.sponsorship | This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. | en_US
dc.format | Monograph
dc.format.medium | Electronic Resource | en_US
dc.language.iso | en_US
dc.publisher | The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type | Dissertation
dcterms.abstract | The goal of genome-wide association studies (GWAS) is to investigate the relationships between disease phenotypes and genotypes, which are usually characterized by a large number of single nucleotide polymorphisms (SNPs). Current GWAS are often underpowered to identify SNPs with small to moderate effect sizes. Two major approaches are often adopted to overcome this difficulty: (1) meta-analysis, which increases the sample size, and (2) SNP pre-selection by dimension reduction. Dimension reduction for SNP data has been difficult because SNPs are categorical, which makes most association measures, such as the Pearson correlation or the Euclidean distance, inappropriate. In this thesis, we propose a novel (partial) canonical correlation association measure for categorical data that can be incorporated into major dimension reduction approaches, including cluster analysis (CA) and partial correlation network analysis (PCNA), for the analysis of GWAS data. Its performance is examined and compared with that of other existing association measures. Network analysis methods such as PCNA and the Bayesian network serve not only as dimension reduction approaches but also as data-driven pathway discovery tools. A key objective in modern genetic studies is to discover the regulatory causal relationships between genetic mutations, measured by SNPs, and the resulting functional changes, often gauged by gene expression levels. Because the former are categorical and the latter are continuous numerical data, we face the problem of mixed data types. Our partial canonical correlation measure for categorical data can be readily extended to PCNA with mixed variables. This new approach is illustrated using a real data example from a study on inflammatory bowel diseases conducted at Stony Brook University Medical Center and Washington University in St. Louis. The approach is also compared with Bayesian network analysis for mixed data, and guidelines are provided on the pros and cons of each method.
dcterms.available | 2013-05-22T17:34:18Z
dcterms.available | 2015-04-24T14:46:24Z
dcterms.contributor | Wu, Song | en_US
dcterms.contributor | Zhu, Wei; Ahn, Hongshik | en_US
dcterms.contributor | Li, Ellen | en_US
dcterms.creator | Chen, Hongyan
dcterms.dateAccepted | 2013-05-22T17:34:18Z
dcterms.dateAccepted | 2015-04-24T14:46:24Z
dcterms.dateSubmitted | 2013-05-22T17:34:18Z
dcterms.dateSubmitted | 2015-04-24T14:46:24Z
dcterms.description | Department of Applied Mathematics and Statistics | en_US
dcterms.extent | 102 pg. | en_US
dcterms.format | Application/PDF | en_US
dcterms.format | Monograph
dcterms.identifier | http://hdl.handle.net/1951/59607
dcterms.identifier | Chen_grad.sunysb_0771E_10753 | en_US
dcterms.identifier | http://hdl.handle.net/11401/71191
dcterms.issued | 2011-12-01
dcterms.language | en_US
dcterms.provenance | Made available in DSpace on 2013-05-22T17:34:18Z (GMT). No. of bitstreams: 1 Chen_grad.sunysb_0771E_10753.pdf: 802641 bytes, checksum: d73da7cee46e3e72634fa0cb5e519832 (MD5) Previous issue date: 1 | en
dcterms.provenance | Made available in DSpace on 2015-04-24T14:46:24Z (GMT). No. of bitstreams: 3 Chen_grad.sunysb_0771E_10777.pdf.jpg: 3187 bytes, checksum: 17066d4449db985bb006951a280eccb1 (MD5) Chen_grad.sunysb_0771E_10777.pdf.txt: 168428 bytes, checksum: 32f2e7439052f5f449abc706298c3700 (MD5) Chen_grad.sunysb_0771E_10777.pdf: 2064423 bytes, checksum: d0f83eff2583d8011444d2e0508f1a3b (MD5) Previous issue date: 1 | en
dcterms.publisher | The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject | Canonical correlation, Clustering analysis, Network analysis, Pearson residuals, SNP
dcterms.subject | Statistics--Biostatistics
dcterms.title | Clustering and Network Analysis with Single Nucleotide Polymorphism (SNP)
dcterms.type | Dissertation
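
The record's subject terms pair canonical correlation with Pearson residuals. The Python sketch below is a hedged illustration of that general idea only. It is not the dissertation's own measure, and its partial and mixed-data extensions are not reproduced here. The sketch computes the standard first canonical correlation between two categorical SNP variables, as in correspondence analysis, in three steps. First, it builds the genotype contingency table. Next, it converts the counts to Pearson residuals (O - E)/sqrt(E) and scales them by 1/sqrt(n). Finally, it takes the largest singular value of the result. The 0/1/2 genotype coding and the helper names `contingency_table` and `canonical_correlation` are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

GENOTYPES = (0, 1, 2)  # assumed coding: number of minor alleles


def contingency_table(x, y, levels=GENOTYPES):
    """Hypothetical helper: cross-tabulate two genotype vectors."""
    index = {level: k for k, level in enumerate(levels)}
    table = np.zeros((len(levels), len(levels)))
    for a, b in zip(x, y):
        table[index[a], index[b]] += 1
    return table


def canonical_correlation(table):
    """Hypothetical helper: first canonical correlation of a two-way table.

    The Pearson residuals (O - E) / sqrt(E), scaled by 1 / sqrt(n), form a
    matrix whose singular values are the canonical correlations between the
    row and column variables; their squares sum to chi-square / n.
    """
    table = np.asarray(table, dtype=float)
    # Drop empty genotype categories so every expected count is positive.
    table = table[table.sum(axis=1) > 0, :]
    table = table[:, table.sum(axis=0) > 0]
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    residuals = (table - expected) / np.sqrt(expected)
    singular_values = np.linalg.svd(residuals / np.sqrt(n), compute_uv=False)
    return float(singular_values[0])


if __name__ == "__main__":
    # Toy genotypes for two SNPs (illustrative only, not data from the study).
    snp_a = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
    snp_b = [0, 1, 2, 1, 1, 2, 2, 1, 0, 2]
    print(canonical_correlation(contingency_table(snp_a, snp_b)))
```

This measure treats genotypes as unordered categories. It is the maximum correlation over all possible scorings of the two variables' categories, so it is never smaller than the absolute Pearson correlation of the raw 0/1/2 codes.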

