Automated Deployment and Distributed Execution of Scientific Software in the Cloud using DevOps and Hadoop MapReduce

Michael Göttsche


Researchers in various disciplines who employ quantitative modelling techniques face an ever-growing need for computational resources to achieve acceptable execution times for their scientific software. This demand is often met by distributed processing on mid- to large-sized research clusters or on costly specialized hardware in the form of supercomputers. However, for many users these resources are either unavailable due to financial constraints, or their use requires in-depth experience with parallel programming models. Acquiring this knowledge and provisioning an autonomous computation infrastructure are both often too time-intensive to be an option. This thesis presents an approach that requires the researcher neither to have the financial resources to purchase hardware, nor to concern herself with administrative tasks, nor to implement a complex parallel programming model in the software. Instead, we propose a Cloud-based system that transparently provisions computing resources from Infrastructure-as-a-Service providers with the goal of reducing overall runtime. Among other benefits, using such providers offers a pay-per-use model. Our solution consists of two components based on DevOps tools: (1) a tool for the provisioning and deployment of a Hadoop cluster on Cloud computing resources, and (2) a tool for the automated deployment and distributed execution of scientific software with minimal effort on the user’s side. We perform two case studies to compare the performance of our solution with the non-distributed execution of the software as well as with a native MapReduce implementation. The results show that our approach outperforms both alternatives with negligible setup effort by the user and is thus a viable choice for the scenario outlined above.
While we focus on scientific software in this thesis, our approach is by no means inherently limited to this category but is applicable to a wide variety of domains.
Document Type: Master's Theses
Göttingen, Germany
Institute of Computer Science, University of Göttingen

2011 © Software Engineering For Distributed Systems Group