dc.identifier.urihttp://hdl.handle.net/11401/77274
dc.description.sponsorshipThis work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.en_US
dc.formatMonograph
dc.format.mediumElectronic Resourceen_US
dc.language.isoen_US
dc.publisherThe Graduate School, Stony Brook University: Stony Brook, NY.
dc.typeDissertation
dcterms.abstractWord Connection Networks are graphs recording linguistic connections, both semantic and syntactic, between single words. Small, specific Word Connection Networks are frequently used in daily communication: we search for counterparts of words in another language when translating, and we group words by their sentiment when expressing feelings. Word Connection Networks are usually consistent with each other, which makes it an interesting and challenging idea to construct integrated language resources with both inter-language and intra-language connections to handle natural language processing tasks in a multilingual environment. We propose to collect large-scale word-level linguistic resources from the web that reflect qualitatively different types of connections between words across major languages and integrate them into Word Connection Networks. Our data sources include translations from online machine translation systems, transliterations of entities across major languages, semantic relationships between words from human annotations, distributed word representations that capture both semantic and syntactic features from raw text, and quantified sentiment polarities from sentiment analysis research and applications. These resources cover different aspects of language features and contribute to the completeness of Word Connection Networks; thus we have strong and versatile knowledge bases to handle generalized natural language processing tasks. Additionally, we study numbers, which appear frequently but are usually ignored in language tasks, to explore their word-level features. The core contributions of this thesis are deeper knowledge mining in Word Connection Networks and extensions to generate valuable resources for various natural language processing tasks.
Implementing Word Connection Networks allows us to quantify the expressive power of connections from different sources in a specific task. We make every connection in Word Connection Networks traceable and implement a propagation method for information transitivity inside the graph, which allows us to discover a high-confidence model of semantic or syntactic connections that does not currently exist. We prove that inter-language connections preserve good word-level features from more detailed intra-language connections. We completed several natural language processing tasks using connections in Word Connection Networks, and we generated new resources, including high-frequency sentiment lexicons for 136 major languages and transliterations of 69 languages, by applying graph algorithms to Word Connection Networks.
dcterms.available2017-09-20T16:52:19Z
dcterms.contributorBalasubramanian, Niranjanen_US
dcterms.contributorSkiena, Stevenen_US
dcterms.contributorSchwartz, Andrewen_US
dcterms.contributorYun, Jiwonen_US
dcterms.creatorChen, Yanqing
dcterms.dateAccepted2017-09-20T16:52:19Z
dcterms.dateSubmitted2017-09-20T16:52:19Z
dcterms.descriptionDepartment of Computer Science.en_US
dcterms.extent153 pg.en_US
dcterms.formatApplication/PDFen_US
dcterms.formatMonograph
dcterms.identifierhttp://hdl.handle.net/11401/77274
dcterms.issued2015-12-01
dcterms.languageen_US
dcterms.provenanceMade available in DSpace on 2017-09-20T16:52:19Z (GMT). No. of bitstreams: 1 Chen_grad.sunysb_0771E_12361.pdf: 7134557 bytes, checksum: d65ecaa975fb352790086cc2488d8567 (MD5) Previous issue date: 1en
dcterms.publisherThe Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subjectComputer science
dcterms.subjectgraph analysis, multilingual, natural language processing, sentiment analysis, transliteration, word level connections
dcterms.titleNatural Language Processing using Word Connection Networks
dcterms.typeDissertation

