A possibility for using R and Hadoop together

R blog | July 9, 2013 | 1 Comment

As mentioned in the previous article, one way to deal with some Big Data problems is to integrate R into the Hadoop ecosystem. This requires a bridge between the two environments: R must be able to handle data that are stored in the Hadoop Distributed File System (HDFS). To process the distributed data, all algorithms must follow the MapReduce model, which makes it possible to handle the data and to parallelize the jobs. Another requirement is a single, unified analysis procedure, so there must be a connection between in-memory data and data stored in HDFS.
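
To make the MapReduce model a little more concrete, here is a minimal single-process sketch of a word count. It is written in plain Python purely for illustration and uses neither R nor Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical and not part of any library. In a real cluster the framework would run the map step on each data block in parallel and perform the grouping itself.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (key, value) pair per word in a line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    # Each record is mapped independently of the others,
    # which is what makes the map step easy to parallelize.
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one chunk of a distributed file (an assumption).
    chunks = ["R and Hadoop", "Hadoop and HDFS", "R and MapReduce"]
    print(map_reduce(chunks))
```

The point of the sketch is the shape of the computation: any algorithm that can be split into an independent per-record step and a per-key aggregation step fits this model.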


A Big Data introduction

R blog | June 5, 2013 | No Comments

Since R keeps its data in the computer's RAM, it can handle only rather small data sets. There are some packages that make it possible to process larger volumes, but the best solution is to connect R with a Big Data environment. This post introduces some Big Data concepts that are fundamental to understanding how R can work in such an environment. Later posts will explain in detail how R can be connected with Hadoop.