As mentioned in the previous article, a possibility for dealing with some Big Data problems is to integrate R within the Hadoop ecosystem. Therefore, a bridge between the two environments is necessary: R must be capable of handling data that are stored through the Hadoop Distributed File System (HDFS). In order to process the distributed data, all the algorithms must follow the MapReduce model, which makes it possible to handle the data and to parallelize the jobs. Another requirement is a unique analysis procedure, so there must be a connection between the in-memory environment and HDFS.
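As a hedged preview of what such a bridge looks like, the sketch below assumes the `rmr2` package from the RHadoop project (one of the options that later posts discuss in detail). It runs a word-count-style MapReduce job; the `"local"` backend mimics Hadoop in memory, so the same code could run unchanged on a real cluster.

```r
# Assumption: the 'rmr2' package is installed (not part of base R).
library(rmr2)
rmr.options(backend = "local")   # simulate Hadoop locally, no cluster needed

input <- to.dfs(c("a", "b", "a", "c", "a"))   # write toy data to the (local) DFS

counts <- mapreduce(
  input  = input,
  map    = function(k, v) keyval(v, 1),        # emit a (word, 1) pair per word
  reduce = function(k, v) keyval(k, sum(v))    # sum the counts for each word
)

from.dfs(counts)   # read the result back into R memory
```

Note how the map and reduce steps are plain R functions: the package takes care of distributing them over the data stored in HDFS.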
Since R uses the computer's RAM, it can handle only rather small sets of data. Nevertheless, some packages allow larger volumes to be treated, and the best solution is to connect R with a Big Data environment. This post introduces some Big Data concepts that are fundamental to understanding how R can work in this environment. Afterwards, some other posts will explain in detail how R can be connected with Hadoop.
Nowadays, routine operations on files, such as renaming or copying, are performed with a few mouse clicks. Sometimes, however, it is useful to perform these operations in batch. Linux users typically do so through the shell; Windows users can use the shell too, but there are also many utilities that simplify these operations.
Why should someone use R to copy or rename a file (or a lot of files)?
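Because base R already ships the needed file-handling functions. A minimal sketch, using only base R and a throwaway temporary directory: batch-rename every `.txt` file by adding a `backup_` prefix.

```r
# Create a toy directory with two files, so the example is self-contained
tmp <- tempfile()
dir.create(tmp)
file.create(file.path(tmp, c("a.txt", "b.txt")))

# Find all .txt files and build the new names
old_names <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
new_names <- file.path(tmp, paste0("backup_", basename(old_names)))

# Rename them in one vectorized call
file.rename(old_names, new_names)

list.files(tmp)  # "backup_a.txt" "backup_b.txt"
```

The same pattern works with `file.copy()` for copying, and the regular expression passed to `pattern` makes it easy to select exactly the files to process.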
"Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and version control with RStudio.
This book will help you learn and understand the RStudio features needed to effectively perform statistical analysis and reporting, code editing, and R development.
This is the third article of the Maps in R series. After having shown how to draw a map without placing data on it and how to plot point data on a map, this installment presents the creation of a choropleth map.
A choropleth map is a thematic map featuring regions colored or shaded according to the value assumed by the variable of interest in that particular region.
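To make the idea concrete before diving in, here is a minimal sketch assuming the `maps` package; the variable being mapped is random and stands in for real data. Each region (here, each US state polygon) is shaded according to the class its value falls into.

```r
# Assumption: the 'maps' package is installed (not part of base R).
library(maps)

set.seed(1)
region_names <- map("state", plot = FALSE)$names  # one name per polygon
values <- runif(length(region_names))             # stand-in variable of interest

# Cut the variable into 4 classes and assign a shade of grey to each class
classes <- cut(values, breaks = 4)
shades  <- grey(seq(0.9, 0.3, length.out = 4))[as.integer(classes)]

# Draw the map, filling each polygon with the shade of its class
map("state", fill = TRUE, col = shades)
title("A simple choropleth map")
```

In a real choropleth the `values` vector would be matched to the region names by a join against your dataset rather than generated at random.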
This post is a brief follow-up to a question that appeared some time ago on the “The R Project for Statistical Computing” LinkedIn group, which I’m reporting here:
How can I draw a map of MODERN Europe?
Hi, I'm trying to draw a map of modern Europe but I've found only maps of twenty years ago, with Yugoslavia and Czechoslovakia still united!!!
Does anyone know where I can get a more recent map to be employed with packages such as 'sp' or 'maps'?
Thank you very much!
Two different solutions to the above question will be provided here, using two different R packages.
This brief tutorial illustrates how to combine S4 object-oriented capabilities with function closures in order to develop classes with built-in methods. Thanks to Hadley Wickham for the great contribution of material and tutorials made available on the web, and to Bill Venables and Stefano Iacus for their kind reviews.
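A minimal sketch of the idea (the `Counter` class below is a hypothetical example, not taken from the tutorial): an S4 class whose slots hold function closures, so each object carries its own built-in methods together with private state.

```r
library(methods)  # S4 machinery; loaded by default in interactive sessions

# Slots of type "function" will store the closures acting as methods
setClass("Counter", slots = c(increment = "function", value = "function"))

makeCounter <- function() {
  count <- 0  # private state, captured by the closures below
  new("Counter",
      increment = function() count <<- count + 1,  # updates the enclosed state
      value     = function() count)                # reads the enclosed state
}

ctr <- makeCounter()
ctr@increment()
ctr@increment()
ctr@value()  # 2
```

Each call to `makeCounter()` creates a fresh enclosing environment, so every `Counter` object has its own independent `count` that is invisible from outside except through its built-in methods.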
It's very convenient to manage data with R: you can import your dataset, find many packages that meet your needs, and then plot your results.
However, it can be very bothersome to retrieve data from online databases. You need to use the specific API, perhaps writing your scripts in a new programming language; then you have to convert your data to a table format and finally import it into R.
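When a database happens to expose a plain CSV endpoint, base R can collapse all those steps into one `read.csv()` call. A minimal sketch (the URL is a placeholder, and the download is simulated here with an inline string so the example is self-contained):

```r
# With a real endpoint this single line would download and parse the data:
# dataset <- read.csv("https://example.com/dataset.csv")   # hypothetical URL

# Simulated response, standing in for the downloaded CSV body
csv_text <- "year,value\n2010,1.5\n2011,2.3"
dataset  <- read.csv(text = csv_text)

dataset$value  # 1.5 2.3
```

For databases that only answer through a richer API, a dedicated R client package usually hides the format conversion in the same way.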
This article provides a brief background on power and sample size analysis. Then, power and sample size are computed for the Z test.
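As a preview, the standard normal-approximation formulas for a two-sided one-sample Z test can be written in a few lines of base R (a sketch, not the article's own code; `delta` is the true mean shift, `sigma` the known standard deviation, and the small probability of rejecting on the wrong tail is ignored, as usual):

```r
# Power of the two-sided Z test for a given sample size n
z_power <- function(n, delta, sigma, alpha = 0.05) {
  pnorm(sqrt(n) * delta / sigma - qnorm(1 - alpha / 2))
}

# Smallest n achieving the requested power
z_sample_size <- function(power, delta, sigma, alpha = 0.05) {
  ceiling(((qnorm(1 - alpha / 2) + qnorm(power)) * sigma / delta)^2)
}

z_sample_size(power = 0.80, delta = 0.5, sigma = 1)  # 32
z_power(n = 32, delta = 0.5, sigma = 1)              # ~0.807, just above 0.80
```

The two functions are inverses of each other up to the rounding introduced by `ceiling()`, which is why the power computed at the returned sample size slightly exceeds the target.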