Blog

Sales Dashboard in R with qplot and ggplot2 - Part 1

In a previous post on my personal blog about creating Pivot Tables in R with melt and cast we covered a simple way to generate sales reports and summary tables from a data set consisting of orders. It is often said that a picture is worth 1000 words, so in this series of posts we will focus on how to create visual representations and summaries of the same data.

Our graphical library of choice for the job will be ggplot2 (what else?), even though we are mostly going to use it in its simplest form, which is through qplot. I have written other posts on ggplot2 which you may also want to read.

1. Getting started

If you haven't done it yet, please complete steps 1, 2 and 3 in my previous post Pivot Tables in R with melt and cast. The file with the data can be obtained from the link at the bottom of that post. Once completed, you should have your data set loaded in R and ready for the next steps.

2. Checking the data

Before plotting any data frame with ggplot2, it is a good idea to check the data structure and make sure all variables have the correct type. ggplot2 is a very smart library and will attempt to plot your data even if they are not in the expected format. This may or may not produce a warning message, and the results may end up far from what we expect. Better to check in advance and save ourselves the pain of lengthy troubleshooting afterwards.

It has been pointed out that str is one of the most useful functions in R and this is surely true! Let's take a look at the structure of our data set.
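As a minimal sketch, the check might look like this. The data frame below is a hypothetical stand-in for the orders data set loaded in the previous post: the column names come from this post, but the values are invented. stringsAsFactors = TRUE is set explicitly because R 4.0 and later no longer convert strings to factors by default.

```r
# Hypothetical stand-in for the orders data set; all values are invented.
data <- data.frame(
  Order.Date   = c("15/01/2012", "03/04/2012", "21/07/2012", "09/11/2012"),
  Order.Amount = c(120.5, 340.0, 89.9, 410.25),
  Country      = c("USA", "UK", "USA", "UK"),
  Salesperson  = c("Smith", "Jones", "Smith", "Brown"),
  stringsAsFactors = TRUE  # mimic the older default that yields factors
)

# Print the type and the first values of every column.
str(data)
```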

Using str does indeed highlight a problem with our data set: Order.Date is currently regarded by R as a factor instead of a Date. If we are thinking of grouping our sales data by quarter, for example, it would be useful to convert it to the Date class so we can use date manipulation functions such as quarter() to extract the quarter of the year. This is an easy fix.
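A sketch of the conversion, assuming the dates are stored as day/month/year strings separated by slashes. The separator and the sample values are assumptions, so adjust the format string to your own file.

```r
# Hypothetical sample: Order.Date stored as a factor of dd/mm/yyyy strings.
data <- data.frame(Order.Date = factor(c("15/01/2012", "03/04/2012")))

# Go through character first, so the labels (not the integer codes) are parsed.
data$Order.Date <- as.Date(as.character(data$Order.Date), format = "%d/%m/%Y")

class(data$Order.Date)
```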

Note that the format string used in as.Date has to match the format of the date in Order.Date. In this case %d represents the day as a number (01-31), %m the month as a number (01-12) and %Y (capital Y) the year with century, in four digits.

After the conversion, our data set structure looks like this.
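To illustrate with a small hypothetical sample (invented values, not the real orders file), str now reports Order.Date as a Date, and quarters can be extracted. Base R's quarters() stands in here for quarter(), which comes from an add-on package such as lubridate.

```r
# Hypothetical two-row sample with Order.Date already converted to Date.
data <- data.frame(
  Order.Date   = as.Date(c("15/01/2012", "03/04/2012"), format = "%d/%m/%Y"),
  Order.Amount = c(120.5, 340.0)
)

str(data)

# Quarter of the year for each order, using base R only.
q <- quarters(data$Order.Date)
print(q)
```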

We are now ready to create our sales dashboard.

3. A simple scatter plot of orders

Visualizing data in a simple and immediate format should always be the first step of a good visual data analysis. This allows us to spot anomalies (for example outliers) and to get an overview of the content of the data set before aggregating and manipulating it further.

Let's start with a plot of all Order.Amount in a temporal sequence, which means by Order.Date.
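The call might look like the following sketch. The four orders are invented for illustration. Note also that ggplot2 3.4.0 and later mark qplot() as deprecated, so expect a deprecation warning on recent versions.

```r
library(ggplot2)  # needs to be loaded once per R session

# Hypothetical orders; replace with your own data set.
data <- data.frame(
  Order.Date   = as.Date(c("2012-01-15", "2012-04-03", "2012-07-21", "2012-11-09")),
  Order.Amount = c(120.5, 340.0, 89.9, 410.25)
)

# With both x and y given and nothing else, qplot draws a scatter plot.
p <- qplot(x = Order.Date, y = Order.Amount, data = data)
print(p)
```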

[Figure Rplot01: scatter plot of Order.Amount against Order.Date]

Note a few things here. First, we need to load the ggplot2 library before we can use qplot. This only needs to be done once in the same R session. Second, qplot is invoked with 3 arguments:

  • x is the variable we want to plot on the horizontal axis
  • y is the variable we want to plot on the vertical axis
  • data is the name of the data set the variables belong to, which allows us to specify them just by variable name (such as Order.Date or Order.Amount) instead of in the full form (which would be data$Order.Date or data$Order.Amount)

Third, if we do not specify any further parameter, qplot uses its defaults for all the rest. Which default is used also depends on whether only y is specified or both x and y. When both x and y are specified, the default is to produce a scatter plot of y values versus x values. Another default is to use the variable names as labels for the axes and to apply the standard theme. Enough technicalities, let's get back to data visualization.

Let's say we are interested in showing which country the orders came from. Let's color code the points in the scatter plot according to the value of the Country variable in the data set, which is either USA or UK. With qplot this is as easy as adding an extra argument to the function call.
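A sketch of that call, again on invented orders (the values are assumptions, only the column names come from this post):

```r
library(ggplot2)

# Hypothetical orders from the two countries.
data <- data.frame(
  Order.Date   = as.Date(c("2012-01-15", "2012-04-03", "2012-07-21", "2012-11-09")),
  Order.Amount = c(120.5, 340.0, 89.9, 410.25),
  Country      = c("USA", "UK", "USA", "UK")
)

# The extra color argument maps Country to the color of each point.
p <- qplot(x = Order.Date, y = Order.Amount, data = data, color = Country)
print(p)
```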

Note that the color parameter can also be used with its British spelling of colour. Here is the resulting chart.

[Figure Rplot02: the same scatter plot with points colored by Country]

Once more, qplot has applied some defaults. First, a standard high-contrast color scheme to distinguish between the orders coming from the two different countries. Second, a legend on the right of the chart specifying how to read each color. The title of the legend is, by default, the name of the variable used to color code the points. Sweet!

Let's try to color code the points according to the salesperson who took the order. Another easy one with qplot. Just change the color parameter to use the Salesperson variable.
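The only change from the previous call is the variable passed to color. The salesperson names below are made up for the sketch:

```r
library(ggplot2)

# Hypothetical orders, each taken by a different (invented) salesperson.
data <- data.frame(
  Order.Date   = as.Date("2012-01-01") + c(10, 45, 80, 120, 200, 260),
  Order.Amount = c(120.5, 340.0, 89.9, 410.25, 230.0, 75.5),
  Salesperson  = c("Smith", "Jones", "Brown", "Lee", "Patel", "Garcia")
)

# One color per salesperson: quickly hard to read as the number of levels grows.
p <- qplot(x = Order.Date, y = Order.Amount, data = data, color = Salesperson)
print(p)
```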

[Figure Rplot03: the same scatter plot with points colored by Salesperson]

qplot has done a nice job of accommodating our request and color coding the points by Salesperson. However, there are too many colors and the chart is not really meaningful. Time to switch to a different view!

In Part 2 we will cover Bar Charts and how to make the best use of them. Till next time!

* This article originally appeared in Sales Dashboard in R with qplot and ggplot2 - Part 1

Posted in R | Tagged , , , | Leave a comment

How to open an SPSS file into R

R is a powerful system for statistical analysis and data visualization. However, it is not exactly user-friendly for data storage, so for some time yet your data will probably be archived using Excel, SPSS or similar programs.

How do you open in R a file stored in the SPSS (.sav) format? Some packages, such as foreign, can perform this operation. The foreign package is already included in the standard R distribution, and you just need to load it with the function library().

Once the package is loaded, you can open your file if you know where it is located. The simplest way to locate a file (yes, I know, you can set the working directory, but I have abrupt manners) is to call file.choose().

The system will open a file selection window, where you can look for your file in the folder where you saved it earlier. R then returns the path to the file.

Now you can read the SPSS file with read.spss() from foreign, specifying the path to the file (yes, you guessed it, you need to copy and paste the path).
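A minimal sketch of this step. The path below is purely hypothetical, so replace it with the string returned by file.choose(). The file.exists() guard is only there so the snippet runs harmlessly when the file is missing.

```r
library(foreign)  # provides read.spss()

# Hypothetical path: paste here the string returned by file.choose().
path <- "C:/path/to/mydata.sav"

# Read the file only if it is really there.
if (file.exists(path)) {
  dataset <- read.spss(path, to.data.frame = TRUE)
}
```

On Windows, write the path with forward slashes or doubled backslashes, because a single backslash starts an escape sequence in an R string.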

Do you want to avoid the copy and paste? You can assign the result of file.choose() to an object named db (short for database).

As before, you obtain the path to the file, but this time R does not show it because you assigned it to the object db. The object db now contains a character string identifying the path that R has to follow to find the file. With this approach you need to run file.choose() in every session, whereas if you write the path out you can reuse it every time. Ready to go?

The function read.spss() reads the data set in .sav format. Be careful, however, to set the argument to.data.frame to TRUE, which tells the function to arrange the data in a data frame (i.e. the class of R object used for data tables).
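Putting the two steps together might look like this. The code is wrapped in interactive() because file.choose() opens a dialog and cannot run in a script executed in batch mode.

```r
library(foreign)

if (interactive()) {
  # Pick the .sav file in the dialog; db holds its path as a character string.
  db <- file.choose()

  # Read it into a data frame rather than a list.
  dataset <- read.spss(db, to.data.frame = TRUE)
}
```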

Another very simple way to open an SPSS file in R is to save it in a format that R handles very well: the .dat format (tab-delimited). So you save your SPSS file as .dat and proceed as before, locating the file with file.choose() and assigning the resulting string to an object.

The function to read the file is now read.table(). Pay attention to missing data: if there are missing values, you should tell R what their code is (e.g. 999) by specifying a value for the argument na.strings.

Do you have your file in .dat format?

The argument header = TRUE specifies that the first row of the file contains the variable names, so these values are not to be interpreted as data.
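Here is a self-contained sketch. Instead of a real SPSS export it writes a tiny tab-delimited file to a temporary location. The variable names and values are invented, and 999 is assumed to be the missing-value code.

```r
# Create a small tab-delimited file standing in for the .dat export.
db <- tempfile(fileext = ".dat")
writeLines(c("id\tage\tscore",
             "1\t34\t12",
             "2\t999\t15",
             "3\t28\t999"), db)

# header = TRUE: the first row holds variable names; 999 is read as NA.
dataset <- read.table(db, header = TRUE, sep = "\t", na.strings = "999")
dataset
```

With your own file, db would come from file.choose() instead of tempfile().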

In a hurry? You can combine all the operations in a single line by passing file.choose() directly as the first argument of read.spss() or, for a .dat file, of read.table().
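A sketch of the one-line versions, again guarded by interactive() because a file dialog is involved. The sep = "\t" argument is an assumption that matches a tab-delimited file.

```r
library(foreign)

if (interactive()) {
  # SPSS file: choose it in the dialog and read it in one step.
  dataset <- read.spss(file.choose(), to.data.frame = TRUE)

  # Alternative for a tab-delimited .dat file:
  # dataset <- read.table(file.choose(), header = TRUE, sep = "\t")
}
```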

Once you have imported a file, it is a good idea to verify that it was read correctly.

To check the size of your database, use the dim() function. You will obtain two numbers: the first is the number of cases (the rows of your database), the second is the number of variables (the columns of your database).
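For example, using R's built-in iris data as a stand-in for an imported file:

```r
# iris ships with R; it is used here only in place of your imported data.
dataset <- iris

dim(dataset)   # number of rows (cases), then number of columns (variables)
nrow(dataset)  # rows only
ncol(dataset)  # columns only
```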

It can also be useful to preview the data. To inspect the first six rows of the dataset, use the head() function.

To inspect the last six rows of the dataset, use the tail() function.

To inspect the structure of the dataset, use the str() function.
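The three checks can be sketched together, once more with the built-in iris data standing in for your own dataset:

```r
dataset <- iris  # stand-in for the imported data

first_rows <- head(dataset)  # first six rows by default
last_rows  <- tail(dataset)  # last six rows by default

print(first_rows)
print(last_rows)

# Type and first values of each variable.
str(dataset)
```

Both head() and tail() accept an n argument if you want a different number of rows.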

Do you want to view the entire data table? If the data table is large, it is advisable to use the function View(), or fix(), which also allows you to manually edit the cell content.

This post was originally written in Italian by Davide Massidda and Antonello Preti and published on the InsulaR blog.

How do you open a Microsoft Excel file in R? Please see the post Read Excel files from R.


R AND OOP - defining new classes

My previous article showed an example in which data analysis requires a structured framework built with R and OOP. This article describes in more detail how to build that framework.

Using OOP means creating new data structures and defining their methods, which are functions performing a specific task on the object. Defining a new data structure requires creating a new class, and this article shows how to create one through R's S4 classes.
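As a rough illustration of the idea (not the framework from the article itself), an S4 class and a method could be defined as follows. The class name Item, its slots and the generic describe are all hypothetical.

```r
library(methods)  # S4 machinery; attached by default in most sessions

# Hypothetical class: an item with an ID and a price.
setClass("Item", representation(id = "character", price = "numeric"))

# A generic function and its method for Item objects.
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "Item", function(object) {
  paste0("Item ", object@id, " costs ", object@price)
})

item <- new("Item", id = "A1", price = 2.5)
describe(item)
```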


R framework with Object-Oriented Programming

Data analysis deals with different kinds of data.
For instance we can have supermarket sales with
- a transactional table, with customer ID, item ID, date of purchase
- an item table, with the item ID and its price
- a customer table, with the customer ID and the customer's demographic details (age, gender)
In this example data are tables with different structures.


DailyMeteo.org - 2014 Conference

Our friend Stefan has been participating in MilanoR since the beginning, and was one of the people who started using R intensively after the "Introduction to R" Quantide course. He is from Belgrade (Serbia) and takes part in the activities of the Belgrade R community, and he would like to share with us an interesting R conference that will take place in Belgrade in June.


Merry Christmas

Dear R enthusiasts,
this is the last post of 2013.

I wish you all a Merry Christmas.


My first... web application with Shiny

For some time I had been thinking about developing a web application with R and Shiny.

I have now built my first application with Shiny. You can find it at http://spark.rstudio.com/nsturaro/pyramid0/


My first... plot (.ly): beautiful plots with Plotly

This article can also be read in Italian

Dear R enthusiasts,
I discovered Plotly a few days ago, and I was fascinated by it.

What is Plotly?
Plotly is a service for creating and sharing data visualizations that also offers statistical analysis tools plus a robust API, the ability to graph custom functions and a built-in Python shell. Among its APIs there is one for R: Plotly interactive visualizations can be created directly from R.


Analyzing baseball data with R

This week, the post is an interview with Max Marchi. Max is the author, with Jim Albert, of the book "Analyzing baseball data with R".

Hi, Max. Welcome back to MilanoR. Last time you wrote a series of articles for us about maps with R. Now you're here as the author of a book. How was this idea born?
Some time ago CRC Press sent a call for proposals to several mailing lists. They were accepting suggestions for books (for their R Series) on three main themes, one of which was “Applications of R to specific disciplines”. The examples they suggested were biology, epidemiology, genetics, engineering, finance, and the social sciences. But I thought “Why not baseball”?


Thirteen lines of code

Thanks to R-bloggers, I discovered that googleVis 0.4.7 with RStudio integration is available on CRAN.

This is great news, but that was not what caught my eye. At the end of the post a beautiful map shows a terrible event of the last few days: the track of the devastating typhoon Haiyan, which hit Southeast Asia in November.
