Model to Estimate the Size of a Hadoop Cluster - HCEm
José Benedito de Souza Brito, Aletéia Patrícia F. Araújo
This paper describes a model for estimating the size of a cluster running the Hadoop framework that is needed to process large datasets within a given timeframe. Its main contributions are: (i) it defines a light optimization layer for MapReduce jobs, (ii) it presents a model to estimate the cluster size for the Hadoop framework, and (iii) it reports tests performed in a real environment, Amazon Elastic MapReduce.