dc.identifier.uri: http://hdl.handle.net/11401/76393
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: The recent development of ultra-high-throughput sequencing of the transcriptome (mRNA-Seq) provides a means of profiling RNA splicing events at unprecedented depth. On the other hand, the ultra-high coverage and complexity of mRNA-Seq data also create major challenges for computational analysis. My Ph.D. work focuses on developing algorithms to detect, quantify and characterize alternative splicing (AS) from mRNA-Seq data. These algorithms include: (1) OLego, a fast and sensitive splice mapping program for mRNA-Seq data. The most important features of OLego include strategic and efficient searches with very small seeds (12-14 nt) and a built-in regression model to score exon junctions. In addition, OLego does not require any external mapper and is implemented in C++ with full support for multithreading. As a consequence, OLego has improved sensitivity in junction and exon discovery while maintaining high accuracy and speed. (2) In-house scripts to identify AS events from alignment results of mRNA-Seq data. Instead of constructing full transcript structures, this approach identifies exons and AS events directly from the junction reads, achieving lower complexity and higher sensitivity for splicing events. (3) SpliceTrap, a method to quantify exon inclusion ratios from paired-end mRNA-Seq data using a Bayesian model. The algorithm addresses the splicing problem by examining local splicing events instead of whole transcripts, which enables quantification of exon inclusion ratios without knowing the complete transcript structure. It also utilizes prior information, including the fragment size distribution and inclusion ratio models from highly covered AS events. All of the programs above are splicing-centric tools and can be used to study AS events with high resolution and sensitivity.
We have applied this pipeline to many real datasets, including the BodyMap 2.0 data, in which we identified 120,110 cassette exons in the human genome, including 82,528 novel cassette exon events. Strikingly, we identified over 2,000 cassette micro-exons smaller than 27 nt, 105 of which have a length of 6 nt. Because of the minimal information that can possibly be encoded in this set of exons, they serve as an excellent model for studying their functional significance and the mechanism of AS regulation.
dcterms.available: 2017-09-20T16:50:09Z
dcterms.contributor: Xing, Haipeng [en_US]
dcterms.contributor: Zhang, Michael Q [en_US]
dcterms.contributor: Zhu, Wei [en_US]
dcterms.contributor: Krainer, Adrian. [en_US]
dcterms.creator: Wu, Jie
dcterms.dateAccepted: 2017-09-20T16:50:09Z
dcterms.dateSubmitted: 2017-09-20T16:50:09Z
dcterms.description: Department of Applied Mathematics and Statistics. [en_US]
dcterms.extent: 137 pg. [en_US]
dcterms.format: Application/PDF [en_US]
dcterms.format: Monograph
dcterms.identifier: http://hdl.handle.net/11401/76393
dcterms.issued: 2015-08-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2017-09-20T16:50:09Z (GMT). No. of bitstreams: 1 Wu_grad.sunysb_0771E_11546.pdf: 2794288 bytes, checksum: 1f00de9d03788e1daa850d04df3ba5ab (MD5) Previous issue date: 2013 [en]
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: Alternative splicing, Next generation sequencing, RNA-Seq
dcterms.subject: Bioinformatics
dcterms.title: Novel Computational Methodology for Detecting and Quantifying Alternative Splicing from RNA-Seq data
dcterms.type: Dissertation
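
Note on the abstract: the "exon inclusion ratio" of a cassette exon can be illustrated with a simple junction-read estimator. The sketch below is not SpliceTrap's Bayesian model, which the abstract says also uses paired-end fragment size information and prior inclusion ratio models; it is only a naive point estimate, and the function name, parameter names, and read counts are hypothetical.

```python
def naive_inclusion_ratio(upstream_junction_reads,
                          downstream_junction_reads,
                          skipping_junction_reads):
    """Naive exon inclusion ratio from junction read counts (illustrative only).

    An included cassette exon is supported by two junctions (upstream exon to
    cassette exon, cassette exon to downstream exon), while the skipping
    isoform is supported by one junction, so the inclusion counts are averaged
    before being compared with the skipping count.
    Returns None when no junction reads cover the event.
    """
    inclusion = (upstream_junction_reads + downstream_junction_reads) / 2.0
    total = inclusion + skipping_junction_reads
    if total == 0:
        return None  # no evidence either way; the ratio is undefined
    return inclusion / total


# Hypothetical counts: 30 and 50 reads on the two inclusion junctions,
# 10 reads on the skipping junction.
ratio = naive_inclusion_ratio(30, 50, 10)
print(ratio)
```

A point estimate like this is unreliable at low coverage, which may be one motivation for the prior-informed Bayesian approach the abstract describes.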

