
dc.identifier.uri: http://hdl.handle.net/11401/77577
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: In this dissertation, we propose a new classification ensemble method named Canonical Forest. This ensemble method uses canonical linear discriminant analysis (CLDA) and bootstrap resampling to create more accurate and diverse classifiers in an ensemble. Although CLDA is commonly used for dimension reduction, here it serves as a linear transformation tool rather than a dimension reduction tool. Since CLDA finds a transformed space in which the classes are more widely separated, classifiers built in this space tend to be more accurate than those built in the original space. To further diversify the classifiers in the ensemble, CLDA is applied only to mutually exclusive subsets of the feature space for each bootstrap sample. To compare the performance of Canonical Forest with other widely used ensemble methods, including Bagging, AdaBoost, SAMME, Random Forest, and Rotation Forest, we tested them on 29 real and artificial data sets. In addition to classification accuracy, we also investigated the diversity and the bias-variance decomposition of each ensemble method. Because Canonical Forest cannot be applied to high-dimensional data directly, we propose another version of Canonical Forest, called High-Dimensional Canonical Forest (HDCF), that is specifically designed for high-dimensional data. By incorporating the Random Subspace algorithm into Canonical Forest, we can apply Canonical Forest to high-dimensional data without first performing feature selection or feature reduction. We compared the performance of HDCF with several current popular high-dimensional classification algorithms, including SVM, CERP, and Random Forest, using the gene imprinting, estrogen, and leukemia data sets.
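The ensemble construction described in the abstract (a bootstrap sample per member, mutually exclusive feature blocks, and CLDA applied to each block as a transformation before training a base classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the base learner (a decision tree), the number of feature partitions, and the use of scikit-learn's LinearDiscriminantAnalysis are assumptions, and sklearn's transform keeps at most n_classes - 1 canonical components per block, which only approximates the full CLDA transformation described above.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier


class CanonicalForestSketch:
    """Toy ensemble: per-bootstrap CLDA on mutually exclusive feature blocks."""

    def __init__(self, n_estimators=50, n_partitions=3, random_state=0):
        self.n_estimators = n_estimators
        self.n_partitions = n_partitions
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        n, p = X.shape
        self.classes_ = np.unique(y)
        self.members_ = []
        for _ in range(self.n_estimators):
            # Draw a bootstrap sample of the training data.
            idx = self.rng.integers(0, n, size=n)
            Xb, yb = X[idx], y[idx]
            # Split the features into mutually exclusive blocks.
            blocks = np.array_split(self.rng.permutation(p), self.n_partitions)
            # Fit canonical LDA on each block and project the block onto its
            # discriminant directions (at most n_classes - 1 per block).
            ldas = [LinearDiscriminantAnalysis().fit(Xb[:, b], yb) for b in blocks]
            Zb = np.hstack([lda.transform(Xb[:, b]) for lda, b in zip(ldas, blocks)])
            # Train the base classifier in the transformed space.
            tree = DecisionTreeClassifier(random_state=0).fit(Zb, yb)
            self.members_.append((blocks, ldas, tree))
        return self

    def predict(self, X):
        votes = np.zeros((X.shape[0], len(self.classes_)))
        col = {c: i for i, c in enumerate(self.classes_)}
        for blocks, ldas, tree in self.members_:
            Z = np.hstack([lda.transform(X[:, b]) for lda, b in zip(ldas, blocks)])
            for row, pred in enumerate(tree.predict(Z)):
                votes[row, col[pred]] += 1  # majority vote across members
        return self.classes_[votes.argmax(axis=1)]
```

In this sketch, the feature partitioning is what injects additional diversity beyond bootstrapping: each member sees a different random grouping of features, so its CLDA projection, and hence its decision boundary, differs from the others.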
dcterms.available: 2017-09-20T16:52:56Z
dcterms.contributor: Ahn, Hongshik [en_US]
dcterms.contributor: Zhu, Wei [en_US]
dcterms.contributor: Wu, Song [en_US]
dcterms.contributor: Zhou, Yiyi. [en_US]
dcterms.creator: Chen, Yu-Chuan
dcterms.dateAccepted: 2017-09-20T16:52:56Z
dcterms.dateSubmitted: 2017-09-20T16:52:56Z
dcterms.description: Department of Applied Mathematics and Statistics. [en_US]
dcterms.extent: 122 pg. [en_US]
dcterms.format: Application/PDF [en_US]
dcterms.format: Monograph
dcterms.identifier: http://hdl.handle.net/11401/77577
dcterms.issued: 2014-12-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2017-09-20T16:52:56Z (GMT). No. of bitstreams: 1 Chen_grad.sunysb_0771E_11708.pdf: 3615276 bytes, checksum: 64ad1fb127ffa875be9feac3149a0cf2 (MD5) Previous issue date: 1 [en]
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: Canonical linear discriminant analysis, Classification, Ensemble, Linear discriminant analysis, Rotation Forest
dcterms.subject: Statistics
dcterms.title: Canonical Forest
dcterms.type: Dissertation

