dc.identifier.uri | http://hdl.handle.net/11401/77274 | |
dc.description.sponsorship | This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. | en_US |
dc.format | Monograph | |
dc.format.medium | Electronic Resource | en_US |
dc.language.iso | en_US | |
dc.publisher | The Graduate School, Stony Brook University: Stony Brook, NY. | |
dc.type | Dissertation | |
dcterms.abstract | Word Connection Networks are graphs recording linguistic connections, both semantic and syntactic, between single words. Small, task-specific Word Connection Networks are used frequently in daily communication: we search for counterparts of words in another language when translating, and we group words by their sentiment when expressing feelings. Word Connection Networks are usually consistent with each other, which makes it an interesting and challenging idea to construct integrated language resources with both inter-language and intra-language connections to handle natural language processing tasks in a multilingual environment. We propose to collect large-scale word-level linguistic resources from the web that reflect qualitatively different types of connections between words across major languages and integrate them into Word Connection Networks. Our data sources include translations from online machine translation systems, transliterations of entities across major languages, semantic relationships between words from human annotations, distributed word representations that capture both semantic and syntactic features from raw text, and quantified sentiment polarities from sentiment analysis research and applications. These resources cover different aspects of language and contribute to the completeness of Word Connection Networks, giving us strong and versatile knowledge bases for general natural language processing tasks. Additionally, we study numbers, which appear frequently but are usually ignored in language tasks, to explore their word-level features. The core contributions of this thesis are deeper knowledge mining in Word Connection Networks and extensions that generate valuable resources for various natural language processing tasks. 
Implementing Word Connection Networks allows us to quantify the expressive power of connections from different sources in a specific task. We make every connection in Word Connection Networks traceable and implement a propagation method for information transitivity inside the graph, which allows us to discover a high-confidence model of semantic or syntactic connections that does not currently exist. We prove that inter-language connections preserve good word-level features from more detailed intra-language connections. We successfully completed several natural language processing tasks using connections in Word Connection Networks, and we generated new resources, including high-frequency sentiment lexicons for 136 major languages and transliterations for 69 languages, by applying graph algorithms to Word Connection Networks. | |
dcterms.available | 2017-09-20T16:52:19Z | |
dcterms.contributor | Balasubramanian, Niranjan | en_US |
dcterms.contributor | Skiena, Steven | en_US |
dcterms.contributor | Schwartz, Andrew | en_US |
dcterms.contributor | Yun, Jiwon | en_US |
dcterms.creator | Chen, Yanqing | |
dcterms.dateAccepted | 2017-09-20T16:52:19Z | |
dcterms.dateSubmitted | 2017-09-20T16:52:19Z | |
dcterms.description | Department of Computer Science. | en_US |
dcterms.extent | 153 pg. | en_US |
dcterms.format | Application/PDF | en_US |
dcterms.format | Monograph | |
dcterms.identifier | http://hdl.handle.net/11401/77274 | |
dcterms.issued | 2015-12-01 | |
dcterms.language | en_US | |
dcterms.provenance | Made available in DSpace on 2017-09-20T16:52:19Z (GMT). No. of bitstreams: 1
Chen_grad.sunysb_0771E_12361.pdf: 7134557 bytes, checksum: d65ecaa975fb352790086cc2488d8567 (MD5)
Previous issue date: 1 | en |
dcterms.publisher | The Graduate School, Stony Brook University: Stony Brook, NY. | |
dcterms.subject | Computer science | |
dcterms.subject | graph analysis, multilingual, natural language processing, sentiment analysis, transliteration, word level connections | |
dcterms.title | Natural Language Processing using Word Connection Networks | |
dcterms.type | Dissertation | |