
dc.identifier.uri: http://hdl.handle.net/11401/77477
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
dc.format: Monograph
dc.format.medium: Electronic Resource
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: Classification algorithms that optimize overall accuracy or class-distribution purity often have difficulty classifying class-imbalanced data: most cases in the testing set are assigned to the majority class. In imbalanced-data classification, however, one usually cares more about the accuracy in identifying the minority class (e.g., diseased samples), that is, the sensitivity, than about the overall accuracy, so low sensitivity is highly undesirable. The receiver operating characteristic (ROC) is a two-dimensional graph plotting sensitivity versus specificity, i.e., the accuracy in identifying the majority class (e.g., normal samples). A curve is formed by varying the decision threshold, and the area under the ROC curve (AUC) is employed as an accuracy measure to evaluate classification performance. Random Forest, a modern ensemble classifier, is gaining increasing attention in the community because of its good classification capability. Each base learner is a decision tree built on a bootstrap sample of the data, with each node split on a randomly selected feature subset. As a result, each base learner is relatively "independent" of the others, which improves the ensemble's overall classification accuracy. In this dissertation, we combine ROC analysis and Random Forest to establish the proposed ROC Random Forest algorithm. The algorithm has two goals: (1) improving the AUC value, and (2) producing a balanced classification result. Verification was carried out using 18 public data sets from the UCI repository, and the results show that the ROC Random Forest not only improves classification accuracy in terms of a higher AUC value but also delivers a more balanced classification result compared to other Random Forest settings. One drawback of the ROC Random Forest lies in its difficulty in processing categorical predictors. Given the importance of categorical predictors in many classification problems, we further combine the ROC Random Forest with optimal node-splitting algorithms other than ROC for categorical predictors. The resulting Hybrid ROC Random Forest is further evaluated on 8 UCI data sets.
dcterms.available: 2017-09-20T16:52:46Z
dcterms.contributor: Wu, Song
dcterms.contributor: Zhu, Wei
dcterms.contributor: Gao, Yi
dcterms.contributor: Li, Ellen
dcterms.creator: Song, Bowen
dcterms.dateAccepted: 2017-09-20T16:52:46Z
dcterms.dateSubmitted: 2017-09-20T16:52:46Z
dcterms.description: Department of Applied Mathematics and Statistics.
dcterms.extent: 139 pg.
dcterms.format: Monograph
dcterms.format: Application/PDF
dcterms.identifier: http://hdl.handle.net/11401/77477
dcterms.issued: 2015-05-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2017-09-20T16:52:46Z (GMT). No. of bitstreams: 1 Song_grad.sunysb_0771E_12222.pdf: 2581679 bytes, checksum: 9bc61738410d361cce431a75030e339e (MD5) Previous issue date: 2015
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: classification, random forest, ROC analysis, supervised learning
dcterms.subject: Statistics
dcterms.title: ROC Random Forest and Its Application
dcterms.type: Dissertation
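
Illustration of the evaluation measures named in the abstract: the short Python sketch below computes AUC, sensitivity, and specificity for an ordinary Random Forest on a simulated imbalanced data set. It uses scikit-learn and simulated data, both of which are assumptions of this illustration rather than anything referenced by the dissertation, and it shows only the baseline evaluation criteria; it does not implement the proposed ROC Random Forest or its ROC-based node splitting.

    # Minimal sketch (assumed setup, not the dissertation's method):
    # evaluate a standard Random Forest on imbalanced data with the
    # AUC / sensitivity / specificity measures described in the abstract.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, confusion_matrix
    from sklearn.model_selection import train_test_split

    # Simulated imbalanced data: roughly 10% minority ("diseased") class.
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Ordinary Random Forest: each tree is grown on a bootstrap sample,
    # with each node split chosen from a random feature subset.
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

    scores = rf.predict_proba(X_te)[:, 1]    # minority-class probabilities
    preds = (scores >= 0.5).astype(int)      # default decision threshold

    tn, fp, fn, tp = confusion_matrix(y_te, preds).ravel()
    sensitivity = tp / (tp + fn)             # accuracy on the minority class
    specificity = tn / (tn + fp)             # accuracy on the majority class
    auc = roc_auc_score(y_te, scores)        # area under the ROC curve

    print(f"AUC = {auc:.3f}, sensitivity = {sensitivity:.3f}, "
          f"specificity = {specificity:.3f}")

On data this imbalanced, the default-threshold forest typically shows the gap the abstract describes: specificity is high while sensitivity lags, even when the AUC is reasonable, which is the imbalance the ROC Random Forest is designed to address.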

