dc.identifier.uri: http://hdl.handle.net/11401/78239
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.language.iso: en_US
dc.type: Dissertation
dcterms.abstract: Anomaly detection is an important problem that has been studied across a broad spectrum of research areas because of its diverse applications. Many anomaly detection algorithms exist; some are domain-specific and others are more generic. Despite considerable advances in this research area, no single anomaly detector is known to work well across different datasets. In fact, designing a single method that is effective across a wide range of domains is a challenging task. Moreover, real-world data consists of multiple, diverse input modalities. Each modality has very different properties, which makes it difficult to ignore their differences. This calls for multimodal learning approaches that fuse the various modalities into a single combined representation. Ensemble techniques for classification and clustering have long proven effective, yet anomaly ensembles have barely been studied. In this dissertation, we address this gap and design new ensemble approaches for anomaly mining. Specifically, we design (i) an ensemble approach, SELECT, which employs novel techniques to systematically select the results from multiple anomaly detectors, together with consensus approaches to combine them, and (ii) a sequential ensemble approach, CARE, which employs a two-phase aggregation of the intermediate results of base detectors in each iteration to reach the final outcome while reducing both bias and variance. Both approaches are fully unsupervised, as ground truth is scarce in real-world data. We use SELECT for event detection in temporal graphs and both ensemble approaches for outlier detection in multidimensional point data (non-graph). We further improve CARE and develop iCARE, a faster isolation-based ensemble approach for massive datasets.
Although diverse learning approaches for anomaly mining have been studied for decades, multimodal learning approaches for anomaly mining have been researched only more recently. Within this line of recent work, a useful application of multimodal learning is opinion spam detection in online review data. We design a new holistic approach called SpEagle that utilizes clues from all metadata (text, timestamp, rating) as well as relational data (the review network), and harnesses them collectively under a unified framework to spot suspicious users and reviews. Moreover, this method can seamlessly integrate semi-supervision by incorporating labels, which improves performance. Furthermore, we extend the SpEagle framework with active inference. We design a method called Expected UnCertainty Reach (EUCR), which at each step picks a node that has high uncertainty, lies in a dense region, and is close to other uncertain nodes. We evaluate our ensemble and multimodal learning approaches on large-scale real-world datasets, where they outperform existing baselines and state-of-the-art anomaly mining approaches.
dcterms.available: 2018-06-21T13:38:40Z
dcterms.contributor: Akoglu, Leman [en_US]
dcterms.contributor: Ramakrishnan, I. V. [en_US]
dcterms.contributor: Nikiforakis, Nikolaos [en_US]
dcterms.contributor: Fodor, Paul [en_US]
dcterms.contributor: Chandola, Varun [en_US]
dcterms.creator: Rayana, Shebuti
dcterms.dateAccepted: 2018-06-21T13:38:40Z
dcterms.dateSubmitted: 2018-06-21T13:38:40Z
dcterms.description: Department of Computer Science [en_US]
dcterms.extent: 167 pg. [en_US]
dcterms.format: Application/PDF [en_US]
dcterms.format: Monograph
dcterms.identifier: http://hdl.handle.net/11401/78239
dcterms.issued: 2017-12-01
dcterms.language: en_US
dcterms.provenance: Made available in DSpace on 2018-06-21T13:38:40Z (GMT). No. of bitstreams: 1 Rayana_grad.sunysb_0771E_13526.pdf: 8788748 bytes, checksum: 350fbe07a9aa5764f66bcd40c4b3f605 (MD5) Previous issue date: 12 [en]
dcterms.subject: Computer science
dcterms.subject: anomaly mining
dcterms.subject: ensemble learning
dcterms.subject: multidimensional
dcterms.subject: multimodal learning
dcterms.subject: time series
dcterms.subject: unsupervised learning
dcterms.title: Ensemble and Multimodal Learning for Anomaly Mining: Algorithms and Applications
dcterms.type: Dissertation

