dc.identifier.uri	http://hdl.handle.net/11401/77293
dc.description.sponsorship	This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.	en_US
dc.format	Monograph
dc.format.medium	Electronic Resource	en_US
dc.language.iso	en_US
dc.publisher	The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type	Dissertation
dcterms.abstract	We study the task of image description generation, which has applications in image search, web accessibility, story illustration, and more. Rather than concentrating on precise but robotic descriptions, we aim to generate captions that are human-like yet still relevant to the image content. Human-written text is nontrivial in structure and vocabulary: a purely bottom-up approach, relying only on the vocabulary of vision detections, would struggle to produce a description such as "A cute squirrel having a feast under a tree". To generate descriptions that approach human-like complexity and richness, we exploit the vast amount of human-written text available on the Internet, using a dataset of images paired with captions written by users of the web-site Flickr. Based on various aspects of the target image, we collect a set of matching images. From the human-written captions of these images, we extract candidate phrases associated with the matching aspects. We then selectively glue the extracted phrases together into plausible descriptions, guided by linguistic patterns and parse-tree structure. We tackle this non-trivial task by modeling it as an Integer Linear Programming problem and introducing a novel tree-driven phrase composition framework. As an optional preprocessing step for generation, we introduce the task of image caption generalization, whose aim is to remove extraneous information from image captions written by Flickr users. Evaluation results show that, when generalized captions are used as the source of candidate phrases, we generate descriptions of better relevance while retaining the expressiveness and linguistic sophistication of the output.
dcterms.available	2017-09-20T16:52:22Z
dcterms.contributor	Choi, Yejin	en_US
dcterms.contributor	Fodor, Paul	en_US
dcterms.contributor	Mooney, Raymond	en_US
dcterms.contributor	Borodin, Yevgen	en_US
dcterms.creator	Kuznetsova, Polina
dcterms.dateAccepted	2017-09-20T16:52:22Z
dcterms.dateSubmitted	2017-09-20T16:52:22Z
dcterms.description	Department of Computer Science.	en_US
dcterms.extent	146 pg.	en_US
dcterms.format	Monograph
dcterms.format	Application/PDF	en_US
dcterms.identifier	http://hdl.handle.net/11401/77293
dcterms.issued	2015-08-01
dcterms.language	en_US
dcterms.provenance	Made available in DSpace on 2017-09-20T16:52:22Z (GMT). No. of bitstreams: 1 Kuznetsova_grad.sunysb_0771E_12042.pdf: 79835138 bytes, checksum: ce218c6e155d7dd59b4af11b69e42573 (MD5) Previous issue date: 2014	en
dcterms.publisher	The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject	image descriptions, natural language generation, natural language processing
dcterms.subject	Computer science
dcterms.title	Composing Image Descriptions in Natural Language
dcterms.type	Dissertation
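
The abstract above models phrase composition as an Integer Linear Programming problem. As a minimal, hypothetical sketch of that idea, not the dissertation's actual model (which also uses parse-tree structure and phrase-compatibility constraints), the following Python snippet picks one candidate phrase per syntactic slot by maximizing relevance scores with the PuLP library; the slot names, phrases, and scores are all invented for illustration.

```python
import pulp

# Hypothetical candidate phrases harvested from captions of matching images,
# grouped by the syntactic slot they could fill, each with a relevance score.
candidates = {
    "np": [("a cute squirrel", 0.9), ("a small rodent", 0.6)],
    "vp": [("having a feast", 0.8), ("eating a nut", 0.7)],
    "pp": [("under a tree", 0.85), ("on a branch", 0.5)],
}

prob = pulp.LpProblem("caption_composition", pulp.LpMaximize)

# One binary indicator per candidate phrase: 1 if it is used in the caption.
x = {
    (slot, i): pulp.LpVariable(f"x_{slot}_{i}", cat="Binary")
    for slot, cands in candidates.items()
    for i in range(len(cands))
}

# Objective: total relevance of the chosen phrases.
prob += pulp.lpSum(
    score * x[slot, i]
    for slot, cands in candidates.items()
    for i, (_, score) in enumerate(cands)
)

# Exactly one phrase per slot, so the output forms a single sentence.
for slot, cands in candidates.items():
    prob += pulp.lpSum(x[slot, i] for i in range(len(cands))) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))

# Read off the selected phrases in slot order (dicts preserve insertion order).
caption = " ".join(
    phrase
    for slot, cands in candidates.items()
    for i, (phrase, _) in enumerate(cands)
    if x[slot, i].value() == 1
)
print(caption)
```

With the toy scores above, the solver picks the highest-scoring phrase in each slot and prints "a cute squirrel having a feast under a tree"; a fuller model along the abstract's lines would add pairwise terms scoring how well adjacent phrases fit together, with constraints driven by the phrases' parse trees.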

