dc.identifier.uri | http://hdl.handle.net/11401/77177 | |
dc.description.sponsorship | This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. | en_US |
dc.format | Monograph | |
dc.format.medium | Electronic Resource | en_US |
dc.language.iso | en_US | |
dc.publisher | The Graduate School, Stony Brook University: Stony Brook, NY. | |
dc.type | Dissertation | |
dcterms.abstract | The purpose of this study is to develop a statistical model to predict the risk of developing disease. To enrich our general understanding of schizophrenia, several clustering techniques are used as a preliminary study. Schizophrenia is a heterogeneous disease with great variability in symptoms, cognition, biology and course of illness. Some of this variability may be explained by latent subgroups that differ in etiology and key features. Individuals with paternal age related schizophrenia (PARS) may represent such a subgroup, as evidence suggests a distinct symptom profile. Using K-means and hierarchical clustering on a large sample of schizophrenia patients, this study examines the demographic and clinical characteristics and the distinctiveness of latent PARS subgroups. Despite the wide use of K-means clustering, several issues remain about how best to implement it. One of the main problems in K-means clustering is how to determine the number of clusters in a data set. We propose a method for choosing the optimal number of clusters. The performance of the proposed method is compared with that of existing methods in simulation experiments. This study also evaluates the performance of several classification models on the same schizophrenia data set. Four predictive classification models, Random Forest (RF), Support Vector Machines (SVM), Linear Discriminant Analysis and AdaBoost, are trained and their performances are compared. These models are then used to identify patients who may be at higher risk of developing schizophrenia. For RF and SVM, an adjusted decision threshold is used for a fair comparison. One of the most critical factors in medical diagnosis is an individual’s condition with respect to a given disease, which varies from one person to another. It is difficult to make an appropriate medical decision about a treatment that works for every patient. 
This study focuses on developing a statistical method to classify the data into two groups: those who are at risk of a potential disease and those who are not. The successful completion of this study will lead to a dramatic improvement in medical diagnosis, which will help the development of decision support systems and personalized treatments that focus on specific patient needs. | |
dcterms.available | 2017-09-20T16:52:09Z | |
dcterms.contributor | Finch, Stephen | en_US |
dcterms.contributor | Ahn, Hongshik | en_US |
dcterms.contributor | Xing, Haipeng | en_US |
dcterms.contributor | Hong, Sangjin | en_US |
dcterms.creator | Lee, Hyejoo | |
dcterms.dateAccepted | 2017-09-20T16:52:09Z | |
dcterms.dateSubmitted | 2017-09-20T16:52:09Z | |
dcterms.description | Department of Applied Mathematics and Statistics | en_US |
dcterms.extent | 135 pg. | en_US |
dcterms.format | Application/PDF | en_US |
dcterms.format | Monograph | |
dcterms.identifier | http://hdl.handle.net/11401/77177 | |
dcterms.issued | 2016-12-01 | |
dcterms.language | en_US | |
dcterms.provenance | Made available in DSpace on 2017-09-20T16:52:09Z (GMT). No. of bitstreams: 1
Lee_grad.sunysb_0771E_12804.pdf: 910298 bytes, checksum: d5f8a74a3bf3f9ed24422ab0fb33f3ff (MD5)
Previous issue date: 1 | en |
dcterms.publisher | The Graduate School, Stony Brook University: Stony Brook, NY. | |
dcterms.subject | Statistics | |
dcterms.title | Clustering and Classification Methods for Prediction of the risk for Developing Disease | |
dcterms.type | Dissertation | |