dc.identifier.uri	http://hdl.handle.net/11401/77177
dc.description.sponsorship	This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.	en_US
dc.format	Monograph
dc.format.medium	Electronic Resource	en_US
dc.language.iso	en_US
dc.publisher	The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type	Dissertation
dcterms.abstract	The purpose of this study is to develop a statistical model to predict the risk of developing disease. To enrich our general understanding of schizophrenia, several clustering techniques are used as a preliminary study. Schizophrenia is a heterogeneous disease with great variability in symptoms, cognition, biology and course of illness. Some of this variability may be explained by latent subgroups that differ in etiology and key features. Individuals with paternal age related schizophrenia (PARS) may represent such a subgroup, as evidence suggests a distinct symptom profile. Using K-means and hierarchical clustering on a large sample of schizophrenia patients, this study examines the demographic and clinical characteristics and the distinctiveness of latent PARS subgroups. Despite the wide use of K-means clustering, several issues remain about how best to implement it. One of the main problems in K-means clustering is how to determine the number of clusters in a data set. We propose a method for choosing the optimal number of clusters. The performance of the proposed method is compared with that of existing methods in simulation experiments. This study also evaluates the performance of several classification models on the same schizophrenia data set. Four predictive classification models, Random Forest (RF), Support Vector Machines (SVM), Linear Discriminant Analysis and AdaBoost, are trained and their performances are compared. These models are then used to predict which patients might be at greater risk of developing schizophrenia. For RF and SVM, an adjusted decision threshold is used for a fair comparison. One of the most critical factors in medical diagnosis is the individual’s condition with respect to a given disease, which varies from one patient to another. It is difficult to make an appropriate medical decision about a treatment that works for every patient.
This study focuses on developing a statistical method to classify the data into two groups: those who are at risk of a potential disease and those who are not. The successful completion of this study will lead to dramatic improvement in medical diagnosis, which will help the development of decision support systems and personalized treatments that focus on specific patient needs.
dcterms.available	2017-09-20T16:52:09Z
dcterms.contributor	Finch, Stephen	en_US
dcterms.contributor	Ahn, Hongshik	en_US
dcterms.contributor	Xing, Haipeng	en_US
dcterms.contributor	Hong, Sangjin	en_US
dcterms.creator	Lee, Hyejoo
dcterms.dateAccepted	2017-09-20T16:52:09Z
dcterms.dateSubmitted	2017-09-20T16:52:09Z
dcterms.description	Department of Applied Mathematics and Statistics	en_US
dcterms.extent	135 pg.	en_US
dcterms.format	Application/PDF	en_US
dcterms.format	Monograph
dcterms.identifier	http://hdl.handle.net/11401/77177
dcterms.issued	2016-12-01
dcterms.language	en_US
dcterms.provenance	Made available in DSpace on 2017-09-20T16:52:09Z (GMT). No. of bitstreams: 1 Lee_grad.sunysb_0771E_12804.pdf: 910298 bytes, checksum: d5f8a74a3bf3f9ed24422ab0fb33f3ff (MD5) Previous issue date: 1	en
dcterms.publisher	The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject	Statistics
dcterms.title	Clustering and Classification Methods for Prediction of the risk for Developing Disease
dcterms.type	Dissertation

Files in this item

Name: Lee_grad.sunysb_0771E_12804.pdf
Size: 888.9Kb
Format: application/pdf

This item appears in the following Collection(s)

Stony Brook Theses and Dissertations Collection [4009]