
dc.identifier.uri: http://hdl.handle.net/11401/77477
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
dc.format: Monograph
dc.format.medium: Electronic Resource
dc.language.iso: en_US
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type: Dissertation
dcterms.abstract: Classification algorithms that optimize overall accuracy or class-distribution purity often have difficulty classifying class-imbalanced data: most cases in the testing set are assigned to the majority class. In imbalanced-data classification, however, one usually cares more about the accuracy in identifying the minority class (e.g., diseased samples), that is, the sensitivity, than about the overall accuracy, so low sensitivity is highly undesirable. The receiver operating characteristic (ROC) is a two-dimensional graph plotting sensitivity versus specificity, i.e., the accuracy in identifying the majority class (e.g., normal samples). A curve is formed by varying the decision threshold, and the area under the ROC curve (AUC) is employed as an accuracy measure to evaluate classification performance. Random Forest, a modern ensemble classifier, is gaining increasing attention in the community because of its good classification capability. Each base learner is a decision tree built on a bootstrap sample of the data, with each node split on a randomly selected feature subset. As a result, each base learner is relatively "independent" of the others, which improves the ensemble's overall classification accuracy. In this dissertation, we combine ROC analysis and Random Forest to establish the proposed ROC Random Forest algorithm. The algorithm has two goals: (1) improving the AUC value, and (2) producing a balanced classification result. Verification was carried out using 18 public data sets from the UCI repository, and the results show that the ROC Random Forest not only improves classification accuracy in terms of a higher AUC value but also delivers a more balanced classification result compared to other Random Forest settings. One drawback of the ROC Random Forest lies in its difficulty in processing categorical predictors. Given the importance of categorical predictors in many classification problems, we further combine the ROC Random Forest with optimal node-splitting algorithms other than ROC for categorical predictors. The resulting Hybrid ROC Random Forest is further evaluated on 8 UCI data sets.
dcterms.available: 2017-09-20T16:52:46Z
dcterms.contributor: Wu, Song
dcterms.contributor: Zhu, Wei
dcterms.contributor: Gao, Yi
dcterms.contributor: Li, Ellen
dcterms.creator: Song, Bowen
dcterms.dateAccepted: 2017-09-20T16:52:46Z
dcterms.dateSubmitted: 2017-09-20T16:52:46Z
dcterms.description: Department of Applied Mathematics and Statistics.
dcterms.extent: 139 pg.
dcterms.format: Monograph
dcterms.format: Application/PDF
dcterms.identifier: http://hdl.handle.net/11401/77477
dcterms.issued: 2015-05-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2017-09-20T16:52:46Z (GMT). No. of bitstreams: 1 Song_grad.sunysb_0771E_12222.pdf: 2581679 bytes, checksum: 9bc61738410d361cce431a75030e339e (MD5) Previous issue date: 2015
dcterms.publisher: The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject: classification, random forest, ROC analysis, supervised learning
dcterms.subject: Statistics
dcterms.title: ROC Random Forest and Its Application
dcterms.type: Dissertation
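
Illustration of the evaluation measures named in the abstract: the short Python sketch below computes AUC, sensitivity, and specificity for an ordinary Random Forest on a simulated imbalanced data set. It uses scikit-learn and simulated data, both of which are assumptions of this illustration rather than anything referenced by the dissertation, and it shows only the baseline evaluation criteria; it does not implement the proposed ROC Random Forest or its ROC-based node splitting.

    # Minimal sketch (assumed setup, not the dissertation's method):
    # evaluate a standard Random Forest on imbalanced data with the
    # AUC / sensitivity / specificity measures described in the abstract.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, confusion_matrix
    from sklearn.model_selection import train_test_split

    # Simulated imbalanced data: roughly 10% minority ("diseased") class.
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Ordinary Random Forest: each tree is grown on a bootstrap sample,
    # with each node split chosen from a random feature subset.
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

    scores = rf.predict_proba(X_te)[:, 1]    # minority-class probabilities
    preds = (scores >= 0.5).astype(int)      # default decision threshold

    tn, fp, fn, tp = confusion_matrix(y_te, preds).ravel()
    sensitivity = tp / (tp + fn)             # accuracy on the minority class
    specificity = tn / (tn + fp)             # accuracy on the majority class
    auc = roc_auc_score(y_te, scores)        # area under the ROC curve

    print(f"AUC = {auc:.3f}, sensitivity = {sensitivity:.3f}, "
          f"specificity = {specificity:.3f}")

On data this imbalanced, the default-threshold forest typically shows the gap the abstract describes: specificity is high while sensitivity lags, even when the AUC is reasonable, which is the imbalance the ROC Random Forest is designed to address.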

