dc.identifier.uri: http://hdl.handle.net/11401/78213
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.type: Dissertation
dcterms.abstract: Next generation sequencing (NGS) technology provides an attractive platform for genomic study. RNA-seq employs NGS technology to sequence and quantify the RNA content of samples and reveal their gene expression profiles. In RNA-seq studies, one important objective is to identify differences in gene expression between two experimental conditions (e.g., control vs. treatment), which is known as differential expression (DE) analysis. Various statistical methods, such as edgeR and DESeq, have been developed to perform two-sample DE analysis. However, in practice, expression data may come in pairs, e.g., pre- vs. post-treatment on the same individual, and new models incorporating this paired structure are in great demand. In this thesis, we propose a new analysis framework that directly takes into account the paired structure of RNA-seq data and performs paired DE analysis. Normalization is a crucial pre-processing step for DE analysis. However, none of the currently available normalization methods is designed for paired RNA-seq data. We investigated existing normalization methods through a series of simulation studies to gain insight into their applicability. Based on these studies, a customized normalization method (pairedNorm) is proposed for paired RNA-seq DE analysis. For the statistical test, we adopt the Poisson model for paired RNA-seq data and propose a conditional likelihood framework, named pairedBN, for parameter estimation and hypothesis testing. Unlike other DE tests, the proposed method does not assume a distribution for the baseline expression level across samples and places no restriction on the proportion of DE genes within a sample. The conditional likelihood framework eliminates nuisance parameters, e.g., the sample-specific true expression levels, and thus greatly improves computational efficiency. Furthermore, a non-parametric test procedure can serve as an ad hoc alternative that allows more flexibility in the data. We conduct an extensive comparison of our method (pairedBN) with the two most popular methods, edgeR and DESeq, through simulation studies. The results show the superiority of pairedBN in FDR control while maintaining good sensitivity. We also apply our method to a paired RNA-seq dataset from TCGA to demonstrate its practical use.
dcterms.available: 2018-03-22T22:39:19Z
dcterms.contributor: Zhu, Wei [en_US]
dcterms.contributor: Wu, Song. [en_US]
dcterms.contributor: Yang, Jie [en_US]
dcterms.contributor: Bahou, Wadie. [en_US]
dcterms.creator: Xu, Jianjin
dcterms.dateAccepted: 2018-03-22T22:39:19Z
dcterms.dateSubmitted: 2018-03-22T22:39:19Z
dcterms.description: Department of Applied Mathematics and Statistics. [en_US]
dcterms.extent: 105 pg. [en_US]
dcterms.format: Monograph
dcterms.format: Application/PDF [en_US]
dcterms.identifier: http://hdl.handle.net/11401/78213
dcterms.issued: 2017-08-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2018-03-22T22:39:19Z (GMT). No. of bitstreams: 1 Xu_grad.sunysb_0771E_13396.pdf: 1331944 bytes, checksum: b8cfa8dee63c49db15631775a3eb24f2 (MD5) Previous issue date: 2017-08-01 [en]
dcterms.subject: differential expression
dcterms.subject: Biostatistics
dcterms.subject: paired data
dcterms.subject: Poisson distribution
dcterms.subject: RNA-seq
dcterms.title: A Conditional Likelihood Based Model for Differential Expression Analysis for Paired RNA-seq Data
dcterms.type: Dissertation
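
The abstract's point that conditioning removes the sample-specific expression levels can be illustrated with a textbook identity. This is not the thesis's pairedBN procedure, only a minimal sketch under stated assumptions. If the pre- and post-treatment counts of one gene in individual i are independent Poisson variables with means lambda_i and rho * lambda_i, then, given the pair total, the pre-treatment count is Binomial(total, 1 / (1 + rho)), which no longer involves lambda_i. The sketch assumes equal library sizes within each pair (no normalization step) and no overdispersion; the function name `conditional_lr_test` and the example counts are hypothetical.

```python
import math


def conditional_lr_test(pre, post):
    """Likelihood-ratio test of rho = 1 for one gene with paired counts.

    Assumption (not taken from the thesis): pre[i] ~ Poisson(lambda_i) and
    post[i] ~ Poisson(rho * lambda_i), with equal library sizes. Conditional
    on each pair total, pre[i] ~ Binomial(total_i, p) with p = 1 / (1 + rho),
    so the baselines lambda_i drop out of the likelihood.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one count per individual")

    # With a common p across pairs, the summed counts are sufficient.
    sum_pre, sum_post = sum(pre), sum(post)
    total = sum_pre + sum_post
    if total == 0:
        return float("nan"), 0.0, 1.0  # no reads: no evidence either way

    p_hat = sum_pre / total  # conditional MLE of p

    def k_log_q(k, q):
        # k * log(q), with the convention 0 * log(0) = 0
        return 0.0 if k == 0 else k * math.log(q)

    # 2 * (log-likelihood at p_hat minus log-likelihood at p = 0.5)
    stat = 2.0 * (k_log_q(sum_pre, p_hat / 0.5)
                  + k_log_q(sum_post, (1.0 - p_hat) / 0.5))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error

    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))

    fold_change = sum_post / sum_pre if sum_pre > 0 else math.inf
    return fold_change, stat, p_value


# Hypothetical counts for one gene in three individuals
print(conditional_lr_test([10, 12, 8], [40, 50, 30]))
```

Because only the pair totals and the pre-treatment counts enter the calculation, no per-individual baseline has to be estimated, which is the kind of saving the abstract attributes to the conditional likelihood framework.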

