In Part 3 of this series we will explore some more variations to our Sales Dashboard in R and introduce new ways of visualizing sales related data with qplot and ggplot2. If you haven't done it yet, it is recommended to read Part 1 and Part 2 first.

## 1. Dodging (with care!)

The last bar chart we created in Part 2 could be further improved to allow a year by year comparison of the orders each sales person brought in. Visually we could show the orders from each year side-by-side for each sales person. Once again, this is fairly easy to do with qplot. It only takes one additional parameter.

In ggplot2 jargon, switching from stacked bars (the default) to side-by-side bars is called dodging. This is obtained with the parameter position=dodge in the call to qplot. This looks great, except that is WRONG! Not easy to recognize, but once we dodge the bars qplot stops stacking them within each year and reverts to simply overlapping them. There is indeed a limitation in the current implementation of ggplot2 where it is not possible to stack according to one variable and dodge according to another one at the same time.

In order to check that within each year we indeed have a number of overlapped bars instead of stacked ones, let's redraw the previous chart by adding an alpha parameter. The alpha parameter makes the bar semi-transparent and when they overlap the color adds up until it becomes solid. The value of alpha says how many overlapping level there should be until the color becomes solid.

With alpha=I(1/5) we tell qplot that the color should become solid when 5 levels are stacked. Here is the resulting chart. You will not that within each year there are multiple bars overlapping. So only the orders with the maximum value within each year/sales person combination are the only visible one in the original chart above. To check it, let's summarize the data to calculate the maximum Order.Amount for each year / sales person combination. For the purpose, we can use melt and cast to create a Pivot Table as explained in a previous post or we can use an alternative method based on the aggregate() function. Let's follow the latter route to practice with something new.

This are indeed the values that are plotted in the WRONG chart above.

## 2. Dodging the right way

What we need to do in order to obtain the correct chart, where summing up the bars for each year's orders for the same sales person leads to the correct totals, is to summarize the data as needed in a new data frame. We can use aggregate() or melt and cast for the purpose. Let's stick to using aggregate().

aggregate() works by using the specified aggregation function (sum in this case) to aggregate Order.Amount by Salesperson and Year. The ~ symbol in the formula can be read as "by", while data specifies the source data frame for the variables. The "+" between the two aggregation factors indicates we want to use both. In this case, it will sum all order amounts for each sales person and year combination, which is exactly what we want.

These are the right totals to chart. For an easier handling, let's modify the name of the second column in the data frame we just obtained.

Ok, we are ready to plot our correct dodged bar chart using the new data frame we just created. Note that beside setting data=data.sum and changing fill=Order.Year, we have also set reverse=FALSE to better match the orders of the years in the legend (guide) with the left to right order in the chart.

Now that we have a correct chart, let's move on with an additional improvement.

## 3. Sorting by total Order Amount

We have obtained indeed a nice and easy to read chart, but as it is right now it doesn't make it easy to evaluate who our top sales people are. We could revert to the stacked format or we could think of sorting the Sales Person axis, which currently is arranged in alphabetical order, by total Order Amount instead.

In order to understand how to do it, let's look once more at the structure of our data (this was covered extensively in Part 1).

We can note that Salesperson is a Factor. In particular, it is an unordered Factor.  We can test this with a call to is.ordered().

When dealing with unordered factors that are character based, by default qplot (and ggplot2 in general)  will revert to the standard ordering which is the alphabetical one. This is why our sales people are listed in alphabetical order in the bar chart.

We can change this be reordering the Salesperson factor according to the sum of the orders for each sales person. This can be easily achieved with the reorder() function or by creating a new factor with factor(). Let's use the latter method.

First, we need to calculate the total order amount for each sales person. This can be done with a new call to aggregate().

Next we need to sort Total.Order by Order.Amount (which is the total order amount). For the task we can use the order function within the index of Total.Order. Here is how to achieve it.

We have specified decreasing=TRUE because we want to order our sales person from the highest to the lowest total order.

The last step is to use this sequence to order the Salesperson factor. For that, we basically re-create the factor with the new ordering sequence.

With this done, let's test the levels now.

Ok, we are ready to plot our data again using the new ordering of the Salesperson factor. The command is the same as for the last chart we created above, however the output will now be sorted by decreasing total amount of order per sales person. So Peacock has indeed been our best sales person over the 3 years and she deserves the first position in our chart!

This concludes Part 3. In Part 4 we will cover other ways to slice and dice our sales data. Till next time!

* This article originally appeared in Sales Dashboard in R with qplot and ggplot2 – Part 3

Share: