Building Interactive Graphs with ggplot2 and Shiny

Some time ago, I was contacted by the team at Packt Publishing. They had just published the Building Interactive Graphs with ggplot2 and Shiny online course, and they asked for my (humble) opinion.

I am honored by their request, and I will briefly review the course here. I'll publish a more in-depth review at the beginning of September, when Italian R users come back from vacation. In this post, I describe the course. In the follow-up post, I will highlight what was new for me and share an example of what I learned from this useful course.

I discovered the online course a few days before Packt's team contacted me. It sounded interesting because I was working on a project involving ggplot2 and Shiny. Moreover, I find an online course more effective and useful than a printed book. This is hardly surprising, since I work for a company that provides online (and on-site) R courses.

As the author says on his website:

The course consists of short videos (around 2 or 3 minutes) that explain one concept at the time. Each video comes with the relevant code, and pointers to go further in your own time.

As for the target audience of this video course, I agree with Arthur's review:

I highly recommend it to a very wide audience, from students beginning data science or statistics to mature data analysts or even seasoned enterprise business intelligence professionals.

The course is about 90 minutes long, so you can watch it during your wife's favorite TV series (Italian TV networks usually broadcast two episodes at a time) or your husband's unmissable soccer match.

The course consists of 40 short videos, grouped into eight sections. You can find the course outline on the official website of the video course. If you have never bought a course from Packt Publishing before, note that you can download the whole course as a single zip file. Once you have downloaded and uncompressed it, open the index.html page in your favorite browser. A pleasant (offline) website will direct you to the video you are interested in. You can watch the course from start to finish, but its structure also lets you watch the topics you need right away and postpone the others. Alternatively, you can watch each video online, even on an internet-connected TV. The third link allows you to download the code. You'll also get the presentations, but I did not find them very useful.

As you can see from my posts, my English is far from perfect. :-)
Even so, I found the author's British English easy to understand for a non-native speaker.

The first five sections focus on ggplot2, starting with installation and moving on to several advanced topics, such as faceting, big data and plot customization. Together they take up the first hour.

Sections 6 and 7 show Shiny's capabilities. Unlike the first sections, each of which covers a well-defined subject, you can think of these two as a single section about Shiny, made up of ten short videos.

Finally, the last section shows how to put everything together.

If you already know both ggplot2 and Shiny, this course will not improve your skills significantly. You may still find something new, especially in the ggplot2 part, and the course works as a valuable review whose structure lets you jump to the videos that interest you. If you are new to R, or new to ggplot2 and/or Shiny, you should buy this online course now. You will be productive in a short while.

Posted in R

Presentations and video of the 5th meeting

The 5th MilanoR meeting was a great success.

The talk presentations are available at the links below. Please leave a comment!

Posted in 5th MilanoR meeting, Home

5th MilanoR meeting postponed to June 4

Dear R users and enthusiasts,
the MilanoR staff announces that the MilanoR meeting scheduled for Friday, May 30 has been postponed due to a strike.

The 5th MilanoR meeting will be held on Wednesday, June 4, at 6 pm.

The meeting will take place at:
Fiori Oscuri Bistrot & Bar (www.fiorioscuri.it)
Via Fiori Oscuri, 3 - Milano (Zona Brera)

Agenda

  • Singular Spectrum Analysis Applications with rssa package
    by Maurizio Sanarico, Chief Data Scientist at SDG consulting
  • Business Data Visualization (and some fun too!) with ggplot2
    Video conference by Marco Ghislanzoni, Marketing Program Manager at Royal DSM
  • Replica, an open source distributed system for the R environment
    by Davide Dal Farra, codref*

Sponsors

Revolution Analytics    Quantide

How to attend?

MilanoR is a free event, open to all R users and enthusiasts and to anyone who wishes to learn more about R. Places are limited, so if you would like to attend the MilanoR meeting, please register using the form on this page. The meeting will be free of charge, and our sponsors will provide the open bar session.

(If you're reading this post from a news feed, e.g. R-bloggers, please visit the original post on the MilanoR website to see the form and sign up for the event.)

Posted in 5th MilanoR meeting

5th MilanoR meeting postponed to June 4

Dear R users and enthusiasts,
the MilanoR staff announces that the meeting scheduled for Friday, May 30 has been postponed due to a public transport strike.

The MilanoR meeting will be held on Wednesday, June 4, at 6 pm, at Fiori Oscuri Bistrot & Bar (www.fiorioscuri.it), Via Fiori Oscuri 3, Milan, in the Brera area.

Agenda

  • Singular Spectrum Analysis Applications with the rssa package
    by Maurizio Sanarico, Chief Data Scientist at SDG consulting
  • Business Data Visualization (and some fun too) with ggplot2
    Video conference by Marco Ghislanzoni, Marketing Program Manager at Royal DSM
  • Replica, an open source distributed system for the R environment
    by Davide Dal Farra, codref*

Sponsors

Revolution Analytics    Quantide

How to attend

MilanoR is a free event, open to all R users and enthusiasts and to anyone who wants to learn more about R. The sponsors will provide the open bar and the buffet.

Since places are limited, please use the registration form to sign up for the meeting free of charge. For organizational reasons, those who had already registered for the May 30 meeting are asked to register again.

Posted in 5th MilanoR meeting

May 30: 5th MilanoR meeting

Due to a strike, the MilanoR meeting has been postponed to June 4.

Please visit the updated page.

Posted in 5th MilanoR meeting

May 30: 5th MilanoR meeting

Due to a strike, the MilanoR meeting scheduled for May 30 has been postponed to June 4.

Go to the updated meeting page

Posted in 5th MilanoR meeting

Sales Dashboard in R with qplot and ggplot2 – Part 3

In Part 3 of this series we will explore some more variations on our Sales Dashboard in R and introduce new ways of visualizing sales-related data with qplot and ggplot2. If you haven't done so yet, we recommend reading Part 1 and Part 2 first.

1. Dodging (with care!)

The last bar chart we created in Part 2 could be further improved to allow a year by year comparison of the orders each sales person brought in. Visually we could show the orders from each year side-by-side for each sales person. Once again, this is fairly easy to do with qplot. It only takes one additional parameter.

In ggplot2 jargon, switching from stacked bars (the default) to side-by-side bars is called dodging. This is obtained with the parameter position="dodge" in the call to qplot.

Rplot13

This looks great, except that it is WRONG! It is not easy to spot, but once we dodge the bars, qplot stops stacking them within each year and simply overlaps them instead. There is indeed a limitation in the current implementation of ggplot2: it is not possible to stack according to one variable and dodge according to another at the same time.

To check that within each year we indeed have a number of overlapping bars instead of stacked ones, let's redraw the previous chart with an additional alpha parameter. The alpha parameter makes the bars semi-transparent, so that where they overlap the color builds up until it becomes solid. The value of alpha determines how many overlapping layers it takes for the color to become solid.

With alpha=I(1/5) we tell qplot that the color should become solid when 5 layers overlap. Here is the resulting chart.
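As a minimal sketch of this transparency check, assume a toy data frame called sales (the data frame name, the sales person Smith and all amounts are invented for illustration). The sketch is written with ggplot() and geom_col() instead of the qplot call described above, since qplot is deprecated in recent ggplot2 releases.

```r
library(ggplot2)

# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Peacock", "Smith", "Smith"),
  Order.Date   = as.Date(c("2003-07-10", "2003-09-02", "2004-03-05",
                           "2003-08-15", "2004-01-20")),
  Order.Amount = c(100, 250, 150, 80, 300)
)

# Dodge by year and draw every single order as a semi-transparent bar.
# Orders that share a sales person and a year belong to the same fill group.
p <- ggplot(sales, aes(x = Salesperson, y = Order.Amount,
                       fill = format(Order.Date, "%Y"))) +
  geom_col(position = "dodge", alpha = 1/5)
```

Calling print(p) draws the chart; wherever several bars sit on top of each other the fill should look darker.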

Rplot17

You will note that within each year there are multiple overlapping bars. As a result, only the order with the maximum value within each year / sales person combination is visible in the original chart above. To check this, let's summarize the data to calculate the maximum Order.Amount for each year / sales person combination. For this purpose, we can use melt and cast to create a Pivot Table as explained in a previous post, or we can use an alternative method based on the aggregate() function. Let's follow the latter route to practice something new.

These are indeed the values plotted in the WRONG chart above.
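A possible way to compute those maxima with aggregate() is sketched here on a toy data frame. The names sales and data.max, the sales person Smith and all amounts are assumptions made for the example, and the year is extracted with format().

```r
# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Peacock", "Smith", "Smith"),
  Order.Date   = as.Date(c("2003-07-10", "2003-09-02", "2004-03-05",
                           "2003-08-15", "2004-01-20")),
  Order.Amount = c(100, 250, 150, 80, 300)
)

# Largest single order for each sales person / year combination.
data.max <- aggregate(Order.Amount ~ Salesperson + format(Order.Date, "%Y"),
                      data = sales, FUN = max)

# The second column is named after the format() expression; give it a short name.
names(data.max)[2] <- "Order.Year"
print(data.max)
```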

2. Dodging the right way

To obtain the correct chart, where the bars for each year's orders for the same sales person add up to the correct totals, we need to summarize the data in a new data frame. We can use aggregate() or melt and cast for this purpose. Let's stick with aggregate().

aggregate() works by using the specified aggregation function (sum in this case) to aggregate Order.Amount by Salesperson and Year. The ~ symbol in the formula can be read as "by", while data specifies the source data frame for the variables. The "+" between the two aggregation factors indicates we want to use both. In this case, it will sum all order amounts for each sales person and year combination, which is exactly what we want.

These are the right totals to chart. For easier handling, let's rename the second column of the data frame we just obtained.
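The two steps, summing and renaming, might look like the following sketch. The toy data frame sales and its values are invented; data.sum and Order.Year are the names used later in this post, and the year is assumed to come from format(Order.Date, "%Y").

```r
# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Peacock", "Smith", "Smith"),
  Order.Date   = as.Date(c("2003-07-10", "2003-09-02", "2004-03-05",
                           "2003-08-15", "2004-01-20")),
  Order.Amount = c(100, 250, 150, 80, 300)
)

# Sum Order.Amount "by" Salesperson and year; "+" combines the two factors.
data.sum <- aggregate(Order.Amount ~ Salesperson + format(Order.Date, "%Y"),
                      data = sales, FUN = sum)

# Rename the second column, which otherwise carries the format() expression.
names(data.sum)[2] <- "Order.Year"
print(data.sum)
```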

Ok, we are ready to plot our correct dodged bar chart using the new data frame we just created.

Rplot18

Note that besides setting data=data.sum and changing fill=Order.Year, we have also set reverse=FALSE so that the order of the years in the legend (guide) matches their left-to-right order in the chart.
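A sketch of a correctly dodged chart, assuming data.sum already holds one row per sales person and year (the rows below are invented). It uses ggplot() with geom_col() instead of qplot, so the exact call differs from the one described above.

```r
library(ggplot2)

# Pre-aggregated toy totals: one row per sales person / year (invented values).
data.sum <- data.frame(
  Salesperson  = c("Peacock", "Smith", "Peacock", "Smith"),
  Order.Year   = c("2003", "2003", "2004", "2004"),
  Order.Amount = c(350, 80, 150, 300)
)

# One bar per row, placed side by side within each sales person.
p <- ggplot(data.sum, aes(x = Salesperson, y = Order.Amount, fill = Order.Year)) +
  geom_col(position = "dodge") +
  guides(fill = guide_legend(reverse = FALSE))
```

Because every sales person / year pair now appears only once, no bar can hide behind another one. print(p) draws the chart.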

Now that we have a correct chart, let's move on with an additional improvement.

3. Sorting by total Order Amount

We have indeed obtained a nice, easy-to-read chart, but as it stands it does not make it easy to see who our top sales people are. We could revert to the stacked format, or we could sort the Sales Person axis, which is currently arranged in alphabetical order, by total Order Amount instead.

In order to understand how to do it, let's look once more at the structure of our data (this was covered extensively in Part 1).

We can note that Salesperson is a Factor. In particular, it is an unordered Factor.  We can test this with a call to is.ordered().
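For instance, on a small invented factor (the names Smith and Jones are not from the data set):

```r
# A plain factor built from character values is unordered.
salesperson <- factor(c("Peacock", "Smith", "Jones"))

is.ordered(salesperson)   # FALSE
levels(salesperson)       # "Jones" "Peacock" "Smith"
```

The levels come out alphabetically even though Peacock was listed first.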

When dealing with unordered factors that are character based, qplot (and ggplot2 in general) falls back by default on the standard ordering of the factor levels, which is alphabetical. This is why our sales people are listed in alphabetical order in the bar chart.

We can change this by reordering the Salesperson factor according to the sum of the orders for each sales person. This can be easily achieved with the reorder() function or by creating a new factor with factor(). Let's use the latter method.

First, we need to calculate the total order amount for each sales person. This can be done with a new call to aggregate().

Next we need to sort Total.Order by Order.Amount (which is the total order amount). For the task we can use the order function within the index of Total.Order. Here is how to achieve it.

We have specified decreasing=TRUE because we want to order our sales people from the highest to the lowest total order amount.
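The totalling and sorting steps can be sketched as follows. The toy data frame sales and its values are invented; Total.Order is the name used in the text.

```r
# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = factor(c("Peacock", "Peacock", "Smith", "Jones", "Jones")),
  Order.Amount = c(100, 250, 80, 300, 20)
)

# Total order amount per sales person.
Total.Order <- aggregate(Order.Amount ~ Salesperson, data = sales, FUN = sum)

# order() returns the row indices that sort the totals from highest to lowest;
# using them as the row index rearranges the data frame.
Total.Order <- Total.Order[order(Total.Order$Order.Amount, decreasing = TRUE), ]
print(Total.Order)
```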

The last step is to use this sequence to order the Salesperson factor. For that, we basically re-create the factor with the new ordering sequence.

With this done, let's test the levels now.
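Put together, re-creating the factor and checking its levels might look like this sketch on invented toy data. The last lines show the reorder() alternative mentioned earlier, which should give the same level order.

```r
# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = factor(c("Peacock", "Peacock", "Smith", "Jones", "Jones")),
  Order.Amount = c(100, 250, 80, 300, 20)
)

# Totals per sales person, sorted from highest to lowest.
Total.Order <- aggregate(Order.Amount ~ Salesperson, data = sales, FUN = sum)
Total.Order <- Total.Order[order(Total.Order$Order.Amount, decreasing = TRUE), ]

# Re-create the factor so that its levels follow the sorted totals.
sales$Salesperson <- factor(sales$Salesperson,
                            levels = as.character(Total.Order$Salesperson))
levels(sales$Salesperson)   # "Peacock" "Jones" "Smith"

# Alternative: reorder() sorts levels by ascending score, so negate the amounts.
by.total <- reorder(sales$Salesperson, -sales$Order.Amount, FUN = sum)
levels(by.total)            # "Peacock" "Jones" "Smith"
```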

OK, we are ready to plot our data again using the new ordering of the Salesperson factor. The command is the same as for the last chart we created above; however, the output will now be sorted by decreasing total order amount per sales person.

Rplot19

So Peacock has indeed been our best sales person over the 3 years and she deserves the first position in our chart!

This concludes Part 3. In Part 4 we will cover other ways to slice and dice our sales data. Till next time!

* This article originally appeared in Sales Dashboard in R with qplot and ggplot2 – Part 3

Posted in Introduction to R

Sales Dashboard in R with qplot and ggplot2 - Part 2

In Part 1 of this series we took the first steps in building our Sales Dashboard in R. In this Part 2 we explore additional ways to display sales-related data.

If you haven't read Part 1, it is highly recommended that you do so first because we will build on what was covered there.

1. Bar charts

A useful way to visualize the total order intake by sales person is to produce a bar chart with the total order amount for each sales person. While this is also an easy task for qplot, we have to be careful about which additional parameters are needed to obtain exactly what we want. Here is the right syntax.

Rplot04

Note how qplot has automatically calculated the total order amount per sales person. This is the default behavior for geom="bar" and stat="identity". What actually happens is that qplot generates one bar per order, with a height proportional to the order amount, and then groups and stacks all bars belonging to the same sales person. You can "see" the stacking by drawing the bar outlines in a different color.

Rplot05

Note that to specify a fixed color for the outline we used the color parameter with an argument of I("blue"). The function I() tells qplot to use the value as is, without any conversion or attempt to interpret it as a variable name.
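A minimal sketch of the same idea with ggplot() instead of qplot, on an invented toy data frame called sales. In ggplot(), a constant placed outside aes() plays the role that I("blue") plays in qplot.

```r
library(ggplot2)

# Toy stand-in for the sales data: one row per order (invented values).
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Peacock", "Smith", "Smith"),
  Order.Amount = c(100, 250, 150, 80, 300)
)

# stat = "identity" uses Order.Amount as the bar height; bars that share a
# sales person are stacked, and the blue outline makes each order visible.
p <- ggplot(sales, aes(x = Salesperson, y = Order.Amount)) +
  geom_bar(stat = "identity", colour = "blue")
```

print(p) draws the chart, with one outlined segment per order.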

Now that we better understand how qplot thinks, we can improve the way our data are visualized. For example, we could color code each bar according to the country where the order was taken. As there is a 1:1 relationship between each sales person and their country, the effect is to color each bar uniformly either for USA or for UK.

Rplot06

The parameter useful to color the interior of each bar according to the Country is fill.

Note that color coding by country has added a legend beside the chart, which in turn has reduced the area available for the chart and caused the names of the sales people to overlap in an annoying way. In order to fix this, we need to use a more advanced feature that goes beyond qplot, but bear with me because it is not that complex. We are going to rotate the labels for the x axis by 90 degrees and align them properly under the tick marks.

Rplot07

The call to the theme function overrides the default appearance of the specified element, in this case axis.text.x. Here angle=90 rotates the text counterclockwise by 90 degrees, while hjust=1 and vjust=0 align it properly under the tick marks. You can experiment with different values to see the effect. For example, vjust=0.5 centers the text under the axis tick mark.
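A sketch of such a theme override, again with ggplot() on invented toy data (the Country values assigned to each sales person are assumptions made for the example):

```r
library(ggplot2)

# Toy stand-in for the sales data; countries and amounts are invented.
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Smith", "Smith"),
  Country      = c("USA", "USA", "UK", "UK"),
  Order.Amount = c(100, 250, 80, 300)
)

# Fill by country, then rotate the x axis labels and align them with the ticks.
p <- ggplot(sales, aes(x = Salesperson, y = Order.Amount, fill = Country)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))
```

Changing vjust to 0.5 in the element_text() call centers each label on its tick mark.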

In the case of this data set, color coding by Country doesn't add a lot of meaning to the visualization. It would be much more useful for example to color code each portion of the bar according to the calendar year in which the order was taken.

2. Stacked bars

Earlier we found out that each bar in our bar chart is actually made of a series of stacked bars, each with a height proportional to the order amount. Let's try to color code them by year instead of by country and see what it looks like. Experimenting (and, yes, making mistakes!) is often the best way to learn how qplot works!

Rplot08

 

This is indeed a fancy looking chart! I am sure any Sales Director would be absolutely pleased with it (ok, just kidding!)

What's the problem here? Well, we have told qplot to color code each stacked bar with a fill color corresponding to the Order.Date. Since almost all order dates differ from one another, qplot has used a large range of discrete colors to try to code them all. The result is the Arlecchino chart above.

What we actually need is one color per year, which means we need to extract the year from each order date and pass it to fill in qplot. Having converted Order.Date to the Date class earlier allows us to use as.character to extract the year as a character string.
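As a small illustration of the year extraction, here is a sketch that uses format() with the "%Y" conversion instead of as.character (a substitution made for this example; the dates are invented):

```r
# Two invented order dates of class Date.
order.date <- as.Date(c("2003-07-10", "2004-01-20"))

# "%Y" keeps only the four-digit year; the result is a character vector.
order.year <- format(order.date, "%Y")

order.year          # "2003" "2004"
class(order.year)   # "character"
```

Because the result is a character vector, ggplot2 treats it as a discrete variable and assigns one color per year.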

RPlot09

 

While the result is graphically as expected, there are a couple of annoyances in this chart. First, the title of the legend includes the function used to extract the year and convert it to a character sequence. Second, the sequence of the colors in the legend is exactly the opposite of the sequence of colors in the bars.

To fix the first problem we have different possibilities. One would be to add a Year variable to the data set, containing the order year already in the needed format. However this would represent an unnecessary duplication of information. The second way is to assign a different title to the legend. This is straightforward to do through the labs function.

The call to labs is telling qplot that whatever variable is used to encode the fill attribute (or "aesthetic" in ggplot2 jargon) should be labeled as specified by the fill argument.
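A sketch of the legend retitling, written with ggplot() on invented toy data. The title "Year" is only an example value.

```r
library(ggplot2)

# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Smith", "Smith"),
  Order.Date   = as.Date(c("2003-07-10", "2004-03-05",
                           "2003-08-15", "2004-01-20")),
  Order.Amount = c(100, 150, 80, 300)
)

# Without labs(), the legend title would be the whole format() expression.
p <- ggplot(sales, aes(x = Salesperson, y = Order.Amount,
                       fill = format(Order.Date, "%Y"))) +
  geom_bar(stat = "identity") +
  labs(fill = "Year")
```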

Rplot10

To fix the order of colors in the legend, so that their sequence corresponds to the one in the bars, we can use the guides function.

Rplot11

 

guides has a similar logic to labs. It takes the name of the attribute (aesthetic) that we want to modify in the legend and uses guide_legend to set an attribute for it. A way to read it is: in the guides (aka the legend), reverse the sequence of colors for the fill attribute.

It works exactly as expected, but our plotting command is getting long! Is there any chance to simplify it? As it turns out, guide_legend supports, among many others, an attribute called title, which sets the title of the legend (or guide). In this case it is equivalent to what we achieved with labs, so we can omit the latter and move the definition of the legend title into guide_legend.

This produces the exact same chart as above.
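A sketch of the shortened form, with the title and the reversed order both set inside guide_legend(). It uses ggplot() on invented toy data, and the title "Year" is only an example value.

```r
library(ggplot2)

# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Smith", "Smith"),
  Order.Date   = as.Date(c("2003-07-10", "2004-03-05",
                           "2003-08-15", "2004-01-20")),
  Order.Amount = c(100, 150, 80, 300)
)

# guide_legend() sets the legend title and reverses the key order in one place,
# so a separate labs(fill = ...) call is no longer needed.
p <- ggplot(sales, aes(x = Salesperson, y = Order.Amount,
                       fill = format(Order.Date, "%Y"))) +
  geom_bar(stat = "identity") +
  guides(fill = guide_legend(title = "Year", reverse = TRUE))
```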

3. A final touch

With little code, we have generated a professional-looking sales chart. I guess your Sales Director will be very pleased with it. For the perfectionists out there, though, we could add a final touch.

First, the axis labels are still the names of the variables in the data set. We could do better for sure. Second, we are missing a title for the chart. qplot can accommodate our needs through three additional parameters that can be specified directly in its call.

  • xlab sets the label for the x axis
  • ylab (you guessed it!) sets the label for the y axis
  • main sets the label for the chart title
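In a ggplot() call the same three labels can be set with xlab(), ylab() and ggtitle(), the last one taking the place of qplot's main. The sketch below uses invented toy data and example label texts.

```r
library(ggplot2)

# Toy stand-in for the sales data; all values are invented.
sales <- data.frame(
  Salesperson  = c("Peacock", "Peacock", "Smith", "Smith"),
  Order.Amount = c(100, 250, 80, 300)
)

# Axis labels and chart title added as separate layers.
p <- ggplot(sales, aes(x = Salesperson, y = Order.Amount)) +
  geom_bar(stat = "identity") +
  xlab("Sales Person") +
  ylab("Order Amount") +
  ggtitle("Orders by Sales Person")
```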

Rplot12

This is it for Part 2. In Part 3 we will cover some more variations on the bar charts and other types of data visualization. Till next time!

* This article originally appeared in Sales Dashboard in R with qplot and ggplot2 - Part 2

Posted in R

R Courses in Milan (Italy): May 2014

Dear R users,
the May 2014 public training course schedule for Milano (Italy) based courses is as follows:

Web Applications with R and Shiny May 15, 2014
Reports in R with RStudio May 16, 2014
Basic R Programming May 22, 2014
Data Visualization with R May 23, 2014

For course outlines, information and prices, please visit www.quantide.com or contact training@quantide.com. We offer discounts for academia, public authorities and non-profit organizations.

Posted in Courses