Show simple item record

dc.identifier.urihttp://hdl.handle.net/11401/77177
dc.description.sponsorshipThis work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.en_US
dc.formatMonograph
dc.format.mediumElectronic Resourceen_US
dc.language.isoen_US
dc.publisherThe Graduate School, Stony Brook University: Stony Brook, NY.
dc.typeDissertation
dcterms.abstractThe purpose of this study is to develop a statistical model to predict the risk for developing disease. In order to enrich our general understanding of schizophrenia disorder, several clustering techniques are used as a preliminary study. Schizophrenia is a heterogeneous decease with great variability in symptoms, cognition, biology and course of illness. Some of this variability may be explained by latent subgroups that differ in etiology and key features. Individuals with paternal age related schizophrenia (PARS) may represent such a subgroup as evidence suggests a distinct symptom profile. Using K-means and hierarchical clustering on a large sample of schizophrenia patients, this study examines demographic, clinical and the distinctiveness of latent PARS subgroups. Despite the wide use of K-means clustering, there remain several issues about how best to implement it. One of the main problems in K-means clustering is how to determine the number of clusters in a data set. We propose to develop a method for choosing the optimal number of clusters. The performance of the proposed method is compared to other existing methods by simulation experiments. In this study, the performance of several classification models with the same schizophrenia data set is evaluated. Four predictive classification models including Random Forest (RF), Support Vector Machines (SVM), Linear Discriminant Analysis and Adaboost are trained and their performances are compared. These models are then used to predict a patient who might have more risk of developing schizophrenia. For RF and SVM, adjusted decision threshold is used for a fair comparison. One of the most critical factors in medical diagnosis is individual’s condition to a given disease which varies from one to another. It is difficult to make appropriate medical decision about treatment that works on every patient. This study focuses on to develop a statistical method to classify the data into these two groups: ones that have a risk at potential disease and others who don’t. The successful completion of this study will lead to dramatic improvement in the medical diagnosis which will help the development of decision support system and personalized treatments that focus on specific patient needs.
dcterms.abstractThe purpose of this study is to develop a statistical model to predict the risk for developing disease. In order to enrich our general understanding of schizophrenia disorder, several clustering techniques are used as a preliminary study. Schizophrenia is a heterogeneous decease with great variability in symptoms, cognition, biology and course of illness. Some of this variability may be explained by latent subgroups that differ in etiology and key features. Individuals with paternal age related schizophrenia (PARS) may represent such a subgroup as evidence suggests a distinct symptom profile. Using K-means and hierarchical clustering on a large sample of schizophrenia patients, this study examines demographic, clinical and the distinctiveness of latent PARS subgroups. Despite the wide use of K-means clustering, there remain several issues about how best to implement it. One of the main problems in K-means clustering is how to determine the number of clusters in a data set. We propose to develop a method for choosing the optimal number of clusters. The performance of the proposed method is compared to other existing methods by simulation experiments. In this study, the performance of several classification models with the same schizophrenia data set is evaluated. Four predictive classification models including Random Forest (RF), Support Vector Machines (SVM), Linear Discriminant Analysis and Adaboost are trained and their performances are compared. These models are then used to predict a patient who might have more risk of developing schizophrenia. For RF and SVM, adjusted decision threshold is used for a fair comparison. One of the most critical factors in medical diagnosis is individual’s condition to a given disease which varies from one to another. It is difficult to make appropriate medical decision about treatment that works on every patient. This study focuses on to develop a statistical method to classify the data into these two groups: ones that have a risk at potential disease and others who don’t. The successful completion of this study will lead to dramatic improvement in the medical diagnosis which will help the development of decision support system and personalized treatments that focus on specific patient needs.
dcterms.available2017-09-20T16:52:09Z
dcterms.contributorFinch, Stephenen_US
dcterms.contributorAhn, Hongshiken_US
dcterms.contributorXing, Haipengen_US
dcterms.contributorHong, Sangjin.en_US
dcterms.creatorLee, Hyejoo
dcterms.dateAccepted2017-09-20T16:52:09Z
dcterms.dateSubmitted2017-09-20T16:52:09Z
dcterms.descriptionDepartment of Applied Mathematics and Statisticsen_US
dcterms.extent135 pg.en_US
dcterms.formatApplication/PDFen_US
dcterms.formatMonograph
dcterms.identifierhttp://hdl.handle.net/11401/77177
dcterms.issued2016-12-01
dcterms.languageen_US
dcterms.provenanceMade available in DSpace on 2017-09-20T16:52:09Z (GMT). No. of bitstreams: 1 Lee_grad.sunysb_0771E_12804.pdf: 910298 bytes, checksum: d5f8a74a3bf3f9ed24422ab0fb33f3ff (MD5) Previous issue date: 1en
dcterms.publisherThe Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subjectStatistics
dcterms.titleClustering and Classification Methods for Prediction of the risk for Developing Disease
dcterms.typeDissertation


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record