Using MapReduce for High Energy Physics Data Analysis

Fabian Korte, Helmut Neukirchen, Thomas Rings, Jens Grabowski

Abstract

At CERN's Large Hadron Collider (LHC), High Energy Physics (HEP) experiments record 15 PB of raw data per year. Because it was considered inconvenient to store, access and process this data using traditional hardware and software tools, the data is reduced to 10–200 TB per year. This paper investigates the applicability of the MapReduce paradigm for analyzing HEP data. In a case study, a sample HEP analysis that uses the HEP analysis framework ROOT has been re-implemented using the MapReduce implementation Apache Hadoop. In addition, a Hadoop input format has been developed that takes the storage locality of the ROOT file format into account. This approach was evaluated in a cloud computing environment and compared to data analysis with the Parallel ROOT Facility (PROOF).
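The abstract does not describe the sample analysis itself, so the following is only a minimal sketch, assuming a histogram-filling analysis, of how such an analysis maps onto the MapReduce phases. The event layout (a single `energy` field), the selection cut, `BIN_WIDTH` and all function names are hypothetical and do not come from the paper or from the Hadoop or ROOT APIs. Each map call emits a (bin, 1) pair per selected event, the shuffle groups pairs by bin, and each reduce call sums one bin.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event record: one reconstructed quantity per event.
# Real ROOT events are far richer than this.
Event = Dict[str, float]

BIN_WIDTH = 10.0  # hypothetical histogram bin width


def map_event(event: Event) -> Iterator[Tuple[int, int]]:
    """Map phase: emit (histogram bin, 1) for each event passing the cut."""
    energy = event["energy"]
    if energy > 0.0:  # hypothetical selection cut
        yield int(energy // BIN_WIDTH), 1


def shuffle(pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Shuffle phase: group intermediate values by key."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bin(key: int, values: List[int]) -> Tuple[int, int]:
    """Reduce phase: sum the counts collected for one histogram bin."""
    return key, sum(values)


def run_job(splits: List[List[Event]]) -> Dict[int, int]:
    """Map every event of every input split, then shuffle and reduce."""
    intermediate = [
        pair for split in splits for event in split for pair in map_event(event)
    ]
    grouped = shuffle(intermediate)
    return dict(reduce_bin(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    example_splits = [
        [{"energy": 12.0}, {"energy": 15.5}, {"energy": -1.0}],
        [{"energy": 27.3}, {"energy": 18.9}],
    ]
    print(run_job(example_splits))  # {1: 3, 2: 1}
```

In a real Hadoop job the splits would be produced by an input format; the one described in the abstract additionally considers where the ROOT file data is stored so that map tasks can run close to their input, which this in-memory sketch does not model.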
Keywords: MapReduce, Hadoop, Input format, ROOT, PROOF, High Energy Physics, Cloud computing
Document Type: Articles in Conference Proceedings
Booktitle: Proceedings of the 2013 International Symposium on MapReduce and Big Data Infrastructure
Language: English
Series: MR.BDI 2013
Address: Sydney, Australia
Publisher: IEEE
Pages: 1271–1278
Month: 12
Year: 2013
DOI: 10.1109/CSE.2013.189