Show simple item record

dc.identifier.uri: http://hdl.handle.net/11401/78358
dc.description.sponsorship: This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree [en_US]
dc.format: Monograph
dc.format.medium: Electronic Resource [en_US]
dc.format.mimetype: Application/PDF [en_US]
dc.language.iso: en_US
dc.type: Dissertation
dcterms.abstract: As we approach the post-Moore's-law era, the burden of achieving good performance is shifting from hardware architects to software engineers. Programmers can no longer expect a doubling of performance with each hardware generation. The trend has moved from faster CPUs to increased parallelism through multiple cores and specialized instructions (e.g., SIMD intrinsics). Taking advantage of the parallelism inherent in modern hardware, however, is a non-trivial task. The programmer must either spend significant time optimizing code for a parallel architecture or rely on automatic methods of code generation (e.g., compiler optimizations). This dissertation addresses each of these approaches and presents novel methods for easing the burden of optimization on the programmer. In the first part of this dissertation, we explore three case studies presenting successful optimizations for different architectures. In the first two case studies, optimizations are presented for GPGPU (General-Purpose GPU) accelerators using the CUDA programming language. The first case study presents an optimized version of the FDK back-projection algorithm, which is commonly used in CT (Computed Tomography) reconstruction. The second case study details how we map an inherently sequential clustering algorithm to a parallel architecture. In the third case study, we detail our optimizations for the LQCD (Lattice Quantum Chromodynamics) physics simulation algorithm, describing how we achieve good performance by combining AVX (Advanced Vector Extensions) intrinsics, OpenMP, and MPI. In the second part of this dissertation, we describe an iterative optimization framework based on the ant colony optimization algorithm. Optimization decisions are formulated as a directed acyclic graph, and the ant colony optimization algorithm is used to identify a path through this graph that corresponds to an optimal GPU implementation. This framework is then extended by tightly integrating it with a compiler based on the polyhedral model, allowing it to identify optimal skewing and permutation transformations as well as loop and GPU kernel fusion. Several optimizations not incorporated into previous GPU auto-tuners are also evaluated, including texture memory and parallel reduction. The framework also extends the traditional ant colony optimization algorithm to include performance metrics, as well as a regression tree analysis that segments the search space into regions with promising performance. Results show significant speed-up over a GPU code generator based on the polyhedral model alone. In the third part of this dissertation, we present a visualization interface that allows users to quickly explore a number of optimizations. This interface is built on top of R-Stream, an optimizing compiler based on the polyhedral model. It visualizes performance through heuristics and run-time performance data, allowing users to identify optimization opportunities and apply common loop transformations in a visually intuitive way. User studies show that users were able to identify optimizations that outperformed R-Stream alone. The interface also greatly improved users' understanding of the transformations made by R-Stream and the reasons behind them.
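The search described in the abstract — ants walking a directed acyclic graph of optimization decisions, with pheromone reinforcing edges on the best-performing path — can be illustrated with a minimal sketch. The graph, node names, and cost function below are purely hypothetical stand-ins; the dissertation's actual framework searches over GPU code transformations and scores paths by measured run time, not a lookup table.

```python
import random

random.seed(0)  # deterministic for illustration

# Hypothetical DAG of optimization decisions: each node lists the choices
# reachable from it. A start-to-end path represents one candidate implementation.
GRAPH = {
    "start": ["tile", "no_tile"],
    "tile": ["unroll", "no_unroll"],
    "no_tile": ["unroll", "no_unroll"],
    "unroll": ["end"],
    "no_unroll": ["end"],
}

# Hypothetical cost per decision (lower is better); the real framework
# would instead measure the run time of the generated GPU code.
COST = {"tile": 1.0, "no_tile": 3.0, "unroll": 0.5, "no_unroll": 1.5, "end": 0.0}

# One pheromone value per edge, initially uniform.
pheromone = {(u, v): 1.0 for u, succs in GRAPH.items() for v in succs}

def walk():
    """One ant builds a path, picking edges with pheromone-weighted probability."""
    node, path = "start", ["start"]
    while node != "end":
        succs = GRAPH[node]
        weights = [pheromone[(node, v)] for v in succs]
        node = random.choices(succs, weights=weights)[0]
        path.append(node)
    return path

def path_cost(path):
    return sum(COST[n] for n in path[1:])

best_path, best_cost = None, float("inf")
for _ in range(50):            # colony iterations
    for p in (walk() for _ in range(10)):  # 10 ants per iteration
        c = path_cost(p)
        if c < best_cost:
            best_path, best_cost = p, c
    # Evaporate all pheromone, then reinforce edges on the best path so far.
    for e in pheromone:
        pheromone[e] *= 0.9
    for edge in zip(best_path, best_path[1:]):
        pheromone[edge] += 1.0 / (1.0 + best_cost)

print(best_path, best_cost)
```

In this toy space the search converges on the tile-then-unroll path; the dissertation's extensions (polyhedral integration, performance metrics, regression-tree segmentation of the search space) build on this same path-search skeleton.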
dcterms.available: 2018-07-09T14:14:21Z
dcterms.contributor: Mueller, Klaus. [en_US]
dcterms.contributor: Chapman, Barbara [en_US]
dcterms.contributor: Liu, Annie [en_US]
dcterms.contributor: Meister, Benoit. [en_US]
dcterms.creator: Papenhausen, Eric
dcterms.dateAccepted: 2018-07-09T14:14:21Z
dcterms.dateSubmitted: 2018-07-09T14:14:21Z
dcterms.description: Department of Computer Science. [en_US]
dcterms.extent: 214 pg. [en_US]
dcterms.format: Monograph
dcterms.identifier: http://hdl.handle.net/11401/78358
dcterms.identifier: Papenhausen_grad.sunysb_0771E_13395.pdf [en_US]
dcterms.issued: 2017-08-01
dcterms.language: en_US
dcterms.provenance: Submitted by Jason Torre (fjason.torre@stonybrook.edu) on 2018-07-09T14:14:21Z. No. of bitstreams: 1. Papenhausen_grad.sunysb_0771E_13395.pdf: 7109738 bytes, checksum: 755cf975e90f67cd899a47b057433b5e (MD5) [en]
dcterms.provenance: Made available in DSpace on 2018-07-09T14:14:21Z (GMT). No. of bitstreams: 1. Papenhausen_grad.sunysb_0771E_13395.pdf: 7109738 bytes, checksum: 755cf975e90f67cd899a47b057433b5e (MD5). Previous issue date: 2017-08-01 [en]
dcterms.subject: Computer science
dcterms.subject: GPU
dcterms.subject: Iterative Optimization
dcterms.subject: Performance Optimization
dcterms.subject: Polyhedral Model
dcterms.title: Manual, Automatic, and Semi-Automatic Performance Optimization
dcterms.type: Dissertation


