dc.identifier.uri	http://hdl.handle.net/1951/55456
dc.identifier.uri	http://hdl.handle.net/11401/70926
dc.description.sponsorship	This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.	en_US
dc.format	Monograph
dc.format.medium	Electronic Resource	en_US
dc.language.iso	en_US
dc.publisher	The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type	Thesis
dcterms.abstract	Wikipedia describes itself as "The free encyclopedia that anyone can edit". Along with the helpful volunteers who contribute by improving the articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Wikipedia editors fight vandalism both manually and with automated bots that use regular expressions and other simple rules to recognize malicious edits [Carter]. Researchers have also proposed machine learning algorithms for vandalism detection [Smets et al., 2008; Potthast et al., 2008a], but these algorithms are still in their infancy and have much room for improvement. This paper presents an approach to fighting vandalism using natural language processing and machine learning techniques. Along with basic features of the edit, such as edit distance, edit type, and counts of abnormal patterns and slang words, we use features related to information about the editor, the past revision history of the article, the change in sentiment of the article, and a PCFG sentence parser score. We achieve an area under the ROC curve (AUC) of 0.94 and an F1 score of 0.53 using LogitBoost in a 10-fold cross-validation setting on a training set [Potthast, 2010] of 32444 human-annotated edits. We also analyze the performance of our features by building separate classifiers for inserts or changes, deletes, and template edits in both balanced and unbalanced corpus settings.
dcterms.available	2012-05-15T18:03:57Z
dcterms.available	2015-04-24T14:45:08Z
dcterms.contributor	Johnson, Rob	en_US
dcterms.contributor	Choi, Yejin	en_US
dcterms.contributor	Skiena, Steve	en_US
dcterms.creator	Harpalani, Manoj
dcterms.dateAccepted	2012-05-15T18:03:57Z
dcterms.dateAccepted	2015-04-24T14:45:08Z
dcterms.dateSubmitted	2012-05-15T18:03:57Z
dcterms.dateSubmitted	2015-04-24T14:45:08Z
dcterms.description	Department of Computer Science	en_US
dcterms.format	Monograph
dcterms.format	Application/PDF	en_US
dcterms.identifier	Harpalani_grad.sunysb_0771M_10348.pdf	en_US
dcterms.identifier	http://hdl.handle.net/1951/55456
dcterms.identifier	http://hdl.handle.net/11401/70926
dcterms.issued	2010-12-01
dcterms.language	en_US
dcterms.provenance	Made available in DSpace on 2012-05-15T18:03:57Z (GMT). No. of bitstreams: 1 Harpalani_grad.sunysb_0771M_10348.pdf: 600806 bytes, checksum: 8a3670cb1c3594ad2cf32edebc57fe7f (MD5) Previous issue date: 1	en
dcterms.provenance	Made available in DSpace on 2015-04-24T14:45:08Z (GMT). No. of bitstreams: 6 Harpalani_grad.sunysb_0771M_10348.pdf.jpg: 1894 bytes, checksum: a6009c46e6ec8251b348085684cba80d (MD5) thesis-preso.pdf.jpg: 1625 bytes, checksum: d03feb4d69682b81d65b8ff8dfa51b53 (MD5) Harpalani_grad.sunysb_0771M_10348.pdf.txt: 59010 bytes, checksum: 4a6bc300c1663cc2caf610f6bcc5acc6 (MD5) thesis-preso.pdf.txt: 6931 bytes, checksum: 8c05f40a0919a62e0d54427efd82ff6a (MD5) Harpalani_grad.sunysb_0771M_10348.pdf: 600806 bytes, checksum: 8a3670cb1c3594ad2cf32edebc57fe7f (MD5) thesis-preso.pdf: 1864614 bytes, checksum: f66e270c1fd3cf60cb45b412aacf17c2 (MD5) Previous issue date: 1	en
dcterms.publisher	The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject	Machine Learning, Natural Language Processing, Vandalism Detection, Wikipedia, Wikipedia Vandalism Detection, Wiki Vandalysis
dcterms.subject	Computer Science
dcterms.title	Wiki Vandalysis. Wikipedia Vandalism Analysis
dcterms.type	Thesis

Files in this item

Name:: Harpalani_grad.sunysb_0771M_10 ...
Size:: 586.7Kb
Format:: application/pdf

Name:: thesis-preso.pdf
Size:: 1.778Mb
Format:: application/pdf
Description:: Presentation

This item appears in the following Collection(s)

Stony Brook Theses and Dissertations Collection [4009]
