Data analysis deals with different kinds of data.
For instance, supermarket sales can be described by
- a transactional table, with customer ID, item ID, date of purchase
- an item table, with the item ID and its price
- a customer table, with the customer ID and demographic details (age, gender)
In this example, the data are tables with different structures.
To have a structured analysis framework, we can define how to treat each kind of data. Each kind has its own requirements and can be visualized in its own way.
Examples of requirements are positive prices in the item table and unique customer IDs in the customer table. The visualization of the transactional table can be a chart with the total daily number of purchases. For customers, it can be a histogram showing the age distribution.
There are also some relationships between tables. For instance, each customer ID in the transactional table should also appear in the customer table.
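The requirements and relationships described above can be sketched in a few lines of base R. This is only an illustration: the toy data and the column names (item_id, price, customer_id, date) are assumptions, not taken from a real dataset.

```r
# Hypothetical toy tables; all column names are assumptions for illustration
items <- data.frame(item_id = c("A", "B", "C"),
                    price = c(1.50, 0.99, 12.00))
customers <- data.frame(customer_id = c(1, 2, 3),
                        age = c(34, 51, 27),
                        gender = c("F", "M", "F"))
transactions <- data.frame(customer_id = c(1, 1, 3, 2),
                           item_id = c("A", "C", "B", "A"),
                           date = as.Date(c("2013-11-01", "2013-11-01",
                                            "2013-11-02", "2013-11-03")))

# Requirements on single tables
prices_ok <- all(items$price > 0)
ids_unique <- anyDuplicated(customers$customer_id) == 0

# Relationship between tables: every buyer must be a known customer
customers_known <- all(transactions$customer_id %in% customers$customer_id)

# Total daily number of purchases
daily_purchases <- table(transactions$date)
```

From here, barplot(daily_purchases) and hist(customers$age) would give the two charts mentioned above.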
A good solution is to use OOP (Object Oriented Programming). R has its own approach to OOP, designed to ease data exploration.
There are some generic functions that can be applied to objects of any class, like "plot". It generates a different chart depending on the type of data. Applied to a numeric vector, it displays the values in a simple chart. Applied to a data.frame with at least 3 columns, it instead generates a scatterplot for every pair of columns, like the "pairs" function. Just type plot(iris) to see this example.
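A minimal sketch of this dispatch mechanism follows. It relies only on base R and the built-in iris dataset; the vector x is an arbitrary example.

```r
# plot() is an S3 generic: the method that runs depends on the class
# of its first argument
x <- c(2, 4, 3, 7)
class(x)     # "numeric": no dedicated method, so the default one is used
class(iris)  # "data.frame": handled by the data.frame method

# Look up the method registered for data frames
df_method <- getS3method("plot", "data.frame")

plot(x)      # simple chart of the values against their index
plot(iris)   # scatterplot matrix, similar to pairs(iris)
```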
OOP lets us define a class for each type of table. These classes all inherit from data.table (or data.frame) and may be S3 or S4 classes (see the documentation). Each class may have conditions to check when the data are loaded. In addition, it's possible to redefine some methods, like "plot". In the example, there are 3 objects: "transactions", "items", "customers".
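As a rough sketch of the idea, here is an S3 class for the item table that inherits from data.frame. The constructor name new_items, the column names and the choice of a histogram are all assumptions made for illustration.

```r
# Hypothetical constructor for an "items" table: it checks the conditions
# when the data are loaded and then attaches the class
new_items <- function(df) {
  stopifnot(is.data.frame(df),
            all(c("item_id", "price") %in% names(df)))
  if (any(df$price <= 0)) {
    stop("all prices must be positive")
  }
  # Prepend the new class so that data.frame behaviour is inherited
  class(df) <- c("items", class(df))
  df
}

# Redefine plot() for the class: a histogram of the prices
plot.items <- function(x, ...) {
  hist(x$price, main = "Item prices", xlab = "Price", ...)
}

items <- new_items(data.frame(item_id = c("A", "B", "C"),
                              price = c(1.50, 0.99, 12.00)))
```

Calling plot(items) now draws the histogram, while functions such as nrow(items) or summary(items) still work as for any data.frame.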
It's also possible to put different tables together. S4 classes let us create an object, similar to a list, containing different data. In the example, the object has 3 slots, containing the 3 tables. It's possible to define how to check conditions between tables and how to visualize all of them together.
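A possible shape for such an object is sketched below with an S4 class. The class name Supermarket, the slot names and the validity rule are assumptions chosen to match the example, not a definitive design.

```r
library(methods)  # already attached in a standard R session

# Hypothetical S4 class with one slot per table
setClass("Supermarket",
         slots = c(transactions = "data.frame",
                   items = "data.frame",
                   customers = "data.frame"),
         validity = function(object) {
           # Condition between tables: every buyer must be a known customer
           known <- object@transactions$customer_id %in%
             object@customers$customer_id
           if (all(known)) {
             TRUE
           } else {
             "some transactions refer to unknown customers"
           }
         })

# The validity function runs when the object is created
shop <- new("Supermarket",
            transactions = data.frame(customer_id = c(1, 1, 3),
                                      item_id = c("A", "C", "B")),
            items = data.frame(item_id = c("A", "B", "C"),
                               price = c(1.50, 0.99, 12.00)),
            customers = data.frame(customer_id = c(1, 2, 3),
                                   age = c(34, 51, 27)))
```

Each table is then reachable through its slot, for example shop@customers, and creating an object whose transactions mention an unknown customer stops with an error.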
OOP brings several improvements. Code is more structured and easier to develop, understand, and share.
My next article will describe how to define the classes.
Our friend Stefan has been participating in MilanoR since the beginning, and was one of the people who started using R intensively after the "Introduction to R" Quantide course. He is from Belgrade (Serbia) and takes part in the activities of the Belgrade R community, so he would like to share with us an interesting R event/conference that will take place in Belgrade in June.
This is the last post of 2013.
I wish you all a Merry Christmas.
I discovered Plotly some days ago, and I was fascinated by it.
What is Plotly?
Plotly is a service for creating and sharing data visualizations. It also offers statistical analysis tools, a robust API, the ability to graph custom functions and a built-in Python shell. Among its APIs there is one for R: Plotly interactive visualizations can be created directly from R.
This week, the post is an interview with Max Marchi. Max is the author, with Jim Albert, of the book "Analyzing Baseball Data with R".
Hi, Max. Welcome back to MilanoR. Last time you wrote a series of articles for us about maps with R. Now you're here as the author of a book. How was this idea born?
Some time ago CRC Press sent a call for proposals to several mailing lists. They were accepting suggestions for books (for their R Series) on three main themes, one of which was “Applications of R to specific disciplines”. The examples they suggested were biology, epidemiology, genetics, engineering, finance, and the social sciences. But I thought “Why not baseball?”
Thanks to R-bloggers, I discovered that googleVis 0.4.7 with RStudio integration is available on CRAN.
This is great news, but it wasn't this that caught my eye. At the end of the post, a beautiful map shows a terrible event of the last few days: the track of the devastating typhoon Haiyan, which hit Southeast Asia in November.
How to modify axis labels is a FAQ for (almost) all R users.
This short post tries to give a simple but exhaustive reply to this question.
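As a small illustration with base graphics only, the sketch below shows two common ways to control axis labels. The data and the label texts are arbitrary examples.

```r
x <- 1:10
y <- x^2

# Set the axis titles directly in the plotting call
plot(x, y, xlab = "Input value", ylab = "Squared value")

# Or suppress the default x axis and draw custom tick labels
plot(x, y, xaxt = "n", xlab = "Input value", ylab = "Squared value")
axis(side = 1, at = c(2, 5, 8), labels = c("low", "medium", "high"))
```

The first call changes only the axis titles, while the second replaces the tick labels themselves.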
R can be connected with Hadoop through the rmr2 package. The core of this package is the mapreduce() function, which lets us write custom MapReduce algorithms. The aim of this article is to show how it works and to provide an example.
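To give an idea of the logic before involving Hadoop, here is a word count written in plain R that mimics the map, shuffle and reduce phases. It does not call rmr2 at all: the function names map_fn and reduce_fn and the input lines are assumptions for illustration. With rmr2, similar map and reduce functions would be passed to mapreduce() and would return their pairs through keyval().

```r
# Plain-R sketch of the MapReduce pattern (no Hadoop, no rmr2)
input_lines <- c("r and hadoop", "hadoop and mapreduce", "r")

# Map: emit one (key, value) pair per word, with value 1
map_fn <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  data.frame(key = words, value = 1, stringsAsFactors = FALSE)
}
key_values <- do.call(rbind, lapply(input_lines, map_fn))

# Shuffle: group the values by key
grouped <- split(key_values$value, key_values$key)

# Reduce: combine the values of each key into a single count
reduce_fn <- function(values) sum(values)
counts <- sapply(grouped, reduce_fn)
```

The result is a named vector with one count per distinct word.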