dc.identifier.urihttp://hdl.handle.net/11401/77665
dc.description.sponsorshipThis work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.en_US
dc.formatMonograph
dc.format.mediumElectronic Resourceen_US
dc.language.isoen_US
dc.publisherThe Graduate School, Stony Brook University: Stony Brook, NY.
dc.typeDissertation
dcterms.abstractGenome-wide association studies (GWA studies) are an important tool for identifying disease susceptibility variants for common and complex diseases. Traditional approaches to data analysis in GWA studies suffer from the multiple testing problem and ignore potential relationships between gene variants. We introduced a novel two-stage framework that combines partial correlation network analysis (PCNA) with data mining techniques. This network-based technique, which models SNPs jointly and focuses on their partial associations, alleviated the multiple testing problem and consequently increased the power to detect biologically relevant variants and their associations. Variable selection was achieved through penalized logistic regression with a sparse-group lasso (SGL) penalty, grouping SNPs by either 1) pairwise canonical correlation or 2) biological information such as gene mapping. Network construction was based on pairwise partial correlation coefficients. Simulation studies indicated that this two-stage approach achieved high accuracy and a low false-positive rate in identifying known individual and two-way association targets, which shows that the true direct relationships can be recovered even in high-dimensional settings. We then illustrated the proposed approach in a search for potentially significant SNP-SNP and gene-gene associations with nicotine dependence, using real data from a GWA study conducted by Washington University in St. Louis. The results provide researchers with potentially biologically relevant genetic networks for further investigation. Another contribution of this thesis is the exploration of the miRNA-mRNA regulatory set associated with essential thrombocytosis (ET) by applying a penalized technique to canonical correlation analysis of microarray data sets. The identified variables were validated by leave-one-out cross-validation and a network exploration system.
dcterms.available2017-09-20T16:53:15Z
dcterms.contributorWang, Xuefengen_US
dcterms.contributorZhu, Weien_US
dcterms.contributorBahou, Wadie.en_US
dcterms.contributorWu, Songen_US
dcterms.creatorHuang, Erya
dcterms.dateAccepted2017-09-20T16:53:15Z
dcterms.dateSubmitted2017-09-20T16:53:15Z
dcterms.descriptionDepartment of Applied Mathematics and Statistics.en_US
dcterms.extent138 pg.en_US
dcterms.formatMonograph
dcterms.formatApplication/PDFen_US
dcterms.identifierhttp://hdl.handle.net/11401/77665
dcterms.issued2015-12-01
dcterms.languageen_US
dcterms.provenanceMade available in DSpace on 2017-09-20T16:53:15Z (GMT). No. of bitstreams: 1 Huang_grad.sunysb_0771E_12368.pdf: 2689779 bytes, checksum: aa413c60e99433565ad20ab7fb46b2d2 (MD5) Previous issue date: 1en
dcterms.publisherThe Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subjectStatistics
dcterms.titleStatistical Methods for Association Analysis of Biological Data
dcterms.typeDissertation

