dc.identifier.uri | http://hdl.handle.net/11401/77665 | |
dc.description.sponsorship | This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. | en_US |
dc.format | Monograph | |
dc.format.medium | Electronic Resource | en_US |
dc.language.iso | en_US | |
dc.publisher | The Graduate School, Stony Brook University: Stony Brook, NY. | |
dc.type | Dissertation | |
dcterms.abstract | Genome-wide association studies (GWA studies) are an important tool for identifying disease susceptibility variants for common and complex diseases. Traditional approaches to data analysis in GWA studies suffer from the multiple testing problem and also ignore potential relationships between gene variants. Here we introduce a novel two-stage framework that combines partial correlation network analysis (PCNA) with data mining techniques. By jointly modeling SNPs and focusing on their partial associations, this network-based technique alleviates the multiple testing problem and consequently increases the power to detect biologically relevant variants and their associations. Variable selection was achieved through penalized logistic regression with a sparse-group lasso (SGL) penalty, with SNPs grouped according to either 1) their pairwise canonical correlation or 2) biological information such as gene mapping. Network construction was based on pairwise partial correlation coefficients. Simulation studies indicated that this two-stage approach achieved high accuracy and a low false-positive rate in identifying known individual and two-way association targets, demonstrating that the true direct relationships can be recovered even in high-dimensional settings. We then applied the proposed approach to search for potentially significant SNP-SNP and gene-gene associations with nicotine dependence, using real data from a GWA study conducted by Washington University in St. Louis. The results provide researchers with potentially biologically relevant genetic networks for further investigation. Another contribution of this thesis is the exploration of miRNA-mRNA regulatory sets associated with essential thrombocytosis (ET), achieved by applying a penalized canonical correlation analysis to microarray data sets. The identified variables were successfully validated by leave-one-out cross-validation and a network exploration system. | |
dcterms.available | 2017-09-20T16:53:15Z | |
dcterms.contributor | Wang, Xuefeng | en_US |
dcterms.contributor | Zhu, Wei | en_US |
dcterms.contributor | Bahou, Wadie | en_US |
dcterms.contributor | Wu, Song | en_US |
dcterms.creator | Huang, Erya | |
dcterms.dateAccepted | 2017-09-20T16:53:15Z | |
dcterms.dateSubmitted | 2017-09-20T16:53:15Z | |
dcterms.description | Department of Applied Mathematics and Statistics. | en_US |
dcterms.extent | 138 pg. | en_US |
dcterms.format | Monograph | |
dcterms.format | Application/PDF | en_US |
dcterms.identifier | http://hdl.handle.net/11401/77665 | |
dcterms.issued | 2015-12-01 | |
dcterms.language | en_US | |
dcterms.provenance | Made available in DSpace on 2017-09-20T16:53:15Z (GMT). No. of bitstreams: 1
Huang_grad.sunysb_0771E_12368.pdf: 2689779 bytes, checksum: aa413c60e99433565ad20ab7fb46b2d2 (MD5)
Previous issue date: 1 | en |
dcterms.publisher | The Graduate School, Stony Brook University: Stony Brook, NY. | |
dcterms.subject | Statistics | |
dcterms.title | Statistical Methods for Association Analysis of Biological Data | |
dcterms.type | Dissertation | |