
dc.identifier.uri: http://hdl.handle.net/11401/77826
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: One of the most common goals of hierarchical clustering is to find those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of the distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity. We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct branches in hierarchical clustering trees. For each branch of the tree, a measure of distinctness, or tightness, is defined as a rational function of the heights of both the branch and its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets: one synthetic and four others, each from a different area of biology. Each of the five datasets has a well-defined partition of the data into classes. In all test cases, TBEST performs on par with or better than existing techniques. For one dataset, features are selected using Cores Of Recurrent Events (CORE), a method developed with my participation in the course of this work; an R language implementation of CORE is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/CORE/index.html. Based on our benchmark analysis, TBEST is a tool of choice for detecting significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of TBEST is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/TBEST/index.html.
dcterms.available: 2017-09-26T17:07:20Z
dcterms.contributor: Krasnitz, Alexander [en_US]
dcterms.contributor: Zhu, Wei [en_US]
dcterms.contributor: Finch, Stephen [en_US]
dcterms.contributor: Yoon, Seungtai [en_US]
dcterms.creator: Sun, Guoli
dcterms.dateAccepted: 2017-09-26T17:07:20Z
dcterms.dateSubmitted: 2017-09-26T17:07:20Z
dcterms.description: Department of Applied Mathematics and Statistics. [en_US]
dcterms.extent: 91 pg. [en_US]
dcterms.format: Monograph
dcterms.format: Application/PDF [en_US]
dcterms.identifier: http://hdl.handle.net/11401/77826
dcterms.identifier: Sun_grad.sunysb_0771E_12159.pdf [en_US]
dcterms.issued: 2014-05-01
dcterms.language: en_US
dcterms.provenance: Submitted by Jason Torre (fjason.torre@stonybrook.edu) on 2017-09-26T17:07:20Z No. of bitstreams: 1 Sun_grad.sunysb_0771E_12159.pdf: 3450079 bytes, checksum: 472dd387d963b4ebb6595bd9158cf819 (MD5) [en]
dcterms.provenance: Made available in DSpace on 2017-09-26T17:07:20Z (GMT). No. of bitstreams: 1 Sun_grad.sunysb_0771E_12159.pdf: 3450079 bytes, checksum: 472dd387d963b4ebb6595bd9158cf819 (MD5) Previous issue date: 2014-05-01 [en]
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: Statistics
dcterms.subject: Clustering, Hierarchical, Randomizations, TBEST
dcterms.title: Significant distinct branches of hierarchical trees: A framework for statistical analysis and applications to biological data
dcterms.type: Dissertation