dc.identifier.uri: http://hdl.handle.net/11401/77221
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: Next-generation sequencing (NGS) technology has been widely used in biomedical research, particularly in genomics-related studies. One NGS application is high-throughput mRNA sequencing (RNA-seq), which is commonly used to discover alternative splicing events, quantify gene expression levels, and identify differentially expressed genes. Compared with traditional microarrays, RNA-seq is more efficient and economical. Many software tools have been developed for RNA-seq differential expression (DE) analysis, such as edgeR, DESeq, and Cufflinks; however, these methods either ignore mRNA transcript isoforms, rely on predefined isoform structures, or depend on de novo isoform reconstruction from the sequencing data, all of which lead to less accurate inference. In this thesis, we developed and implemented a novel splicing-graph-based negative binomial (SGNB) model for gene differential expression analysis in RNA-seq data. The key idea of the model is to shift expression comparisons from the unobservable transcript level to the observable read-type level, based on a fundamental result of linear algebra. A likelihood ratio test is used to identify DE genes. Computationally, we employed the expectation-maximization (EM) and Newton-Raphson algorithms for parameter estimation. The main advantage of our model is that it accounts for isoforms without requiring predefined isoform structures, and it is therefore expected to be more robust and powerful. At the same time, our method does not require a de novo reconstruction step, which saves time and avoids errors in reconstructing isoforms. We performed extensive simulations to compare our new method with edgeR, one of the most popular packages. Under the various scenarios examined, the results showed that our new model achieves better power while correctly controlling the false discovery rate.
We also applied our method to a real data set to demonstrate its applicability in practice.
dcterms.available: 2017-09-20T16:52:13Z
dcterms.contributor: Yang, Jie [en_US]
dcterms.contributor: Wu, Song [en_US]
dcterms.contributor: Zhu, Wei [en_US]
dcterms.contributor: Galambos, Nora. [en_US]
dcterms.creator: Liu, Yang
dcterms.dateAccepted: 2017-09-20T16:52:13Z
dcterms.dateSubmitted: 2017-09-20T16:52:13Z
dcterms.description: Department of Applied Mathematics and Statistics [en_US]
dcterms.extent: 90 pg. [en_US]
dcterms.format: Application/PDF [en_US]
dcterms.format: Monograph
dcterms.identifier: http://hdl.handle.net/11401/77221
dcterms.issued: 2016-12-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2017-09-20T16:52:13Z (GMT). No. of bitstreams: 1 Liu_grad.sunysb_0771E_12980.pdf: 3550259 bytes, checksum: ba4a6b6f990fa92c75dc4f6c27f9e7fd (MD5) Previous issue date: 1 [en]
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: Statistics
dcterms.title: An Isoform-free Model for Differential Expression Analysis in RNA-seq Data
dcterms.type: Dissertation
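The abstract describes a negative binomial (NB) model whose DE genes are found with a likelihood ratio test and whose parameters are fit with EM and Newton-Raphson. The sketch below is NOT the SGNB model: it is a minimal, generic gene-level two-group NB likelihood ratio test, in the spirit of count-based tools such as edgeR, meant only to illustrate the Newton-Raphson and likelihood-ratio pieces. It assumes a known common dispersion phi (variance mu + phi*mu^2), known per-sample size factors, and a log link mu_i = s_i * exp(beta). The splicing graph, read-type counts, and EM step are not modeled. The function names (nb_logpmf, fit_log_mean, nb_lrt) and the toy counts are hypothetical.

```python
import math
from typing import Sequence


def nb_logpmf(y: int, mu: float, phi: float) -> float:
    """Log-probability of count y under NB with mean mu and variance mu + phi*mu**2."""
    r = 1.0 / phi  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def fit_log_mean(counts: Sequence[int], sizes: Sequence[float], phi: float,
                 tol: float = 1e-10, max_iter: int = 100) -> float:
    """Newton-Raphson MLE of beta in mu_i = sizes[i] * exp(beta), with phi held fixed."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("this sketch assumes at least one nonzero count per group")
    beta = math.log(total / sum(sizes))  # Poisson MLE as the starting value
    for _ in range(max_iter):
        mus = [s * math.exp(beta) for s in sizes]
        # Score and observed information of the NB log-likelihood in beta (log link).
        score = sum((y - m) / (1 + phi * m) for y, m in zip(counts, mus))
        info = sum(m * (1 + phi * y) / (1 + phi * m) ** 2 for y, m in zip(counts, mus))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def group_loglik(counts: Sequence[int], sizes: Sequence[float],
                 beta: float, phi: float) -> float:
    """Total NB log-likelihood of one group at a given beta."""
    return sum(nb_logpmf(y, s * math.exp(beta), phi) for y, s in zip(counts, sizes))


def nb_lrt(counts_a, sizes_a, counts_b, sizes_b, phi):
    """Return (LR statistic, p-value) for H0: both groups share one mean parameter."""
    all_counts = list(counts_a) + list(counts_b)
    all_sizes = list(sizes_a) + list(sizes_b)
    # Null model: one beta for all samples.
    beta0 = fit_log_mean(all_counts, all_sizes, phi)
    ll0 = group_loglik(all_counts, all_sizes, beta0, phi)
    # Alternative model: a separate beta for each group.
    beta_a = fit_log_mean(counts_a, sizes_a, phi)
    beta_b = fit_log_mean(counts_b, sizes_b, phi)
    ll1 = (group_loglik(counts_a, sizes_a, beta_a, phi)
           + group_loglik(counts_b, sizes_b, beta_b, phi))
    stat = max(0.0, 2.0 * (ll1 - ll0))  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


if __name__ == "__main__":
    # Made-up toy counts for one gene: four samples per condition, equal library sizes.
    stat, p = nb_lrt([10, 12, 9, 11], [1, 1, 1, 1],
                     [30, 28, 35, 31], [1, 1, 1, 1], phi=0.05)
    print(f"LR = {stat:.2f}, p = {p:.3g}")
```

Because the NB log-likelihood is concave in beta under the log link, Newton-Raphson started from the Poisson estimate usually converges in a few steps; a real analysis would also have to estimate phi from the data rather than fixing it.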