Basic graphics and data

Basic graphics with one variable

One of the main reasons to use R instead of other statistical programs is its strong graphical capabilities. To see some of them, run demo(graphics).

Basic graph types are density plots, dot plots, bar charts, line charts, pie charts, box-plots and scatter plots.

Plots in R use two types of commands: high-level commands, which create the plot, and low-level commands, which add things to the plot once it has been created. Low-level commands fail with an error if there is no active plot.

Some of the most used low-level commands are:

points():
Add points
lines():
Add a line graph
abline():
Add a straight line
title():
Add a title
legend():
Add a legend
text():
Add a text string at the desired coordinates in a figure.
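
As an illustration of how the two kinds of commands work together, here is a minimal sketch. The data (x, y) and all labels are made up for the example; they are not part of the original tutorial.

```r
# High-level command: creates the plot (and opens a device if none is active)
x <- 1:10
y <- x^2
plot(x, y, type = "n", xlab = "x", ylab = "y")   # empty frame with axes only

# Low-level commands: add elements to the active plot
points(x, y, pch = 19, col = "blue")             # add points
lines(x, y, lty = 2)                             # add a dashed line through them
abline(h = 50, col = "red")                      # add a horizontal straight line
title(main = "Low-level commands")               # add a title
text(3, 80, "y = x^2")                           # add text at coordinates (3, 80)
legend("topleft", legend = c("points", "h = 50"),
       col = c("blue", "red"), pch = c(19, NA), lty = c(NA, 1))
```

Calling any of the low-level commands before plot() stops with an error, because there is no plot to add to.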

The main high-level command is plot(). Depending on the input data, it chooses the type of plot that fits best, but this can of course be changed, for example with the type argument. It is highly advisable to read the help(plot) page and look through the arguments before starting.
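
A short sketch of changing the type of plot through the type argument. The sine curve is only example data chosen for this illustration.

```r
x <- seq(0, 2 * pi, length.out = 50)
y <- sin(x)

op <- par(mfrow = c(2, 2))                    # 2 x 2 grid of plots; keep old settings
plot(x, y, main = 'default (type = "p")')     # points
plot(x, y, type = "l", main = 'type = "l"')   # lines
plot(x, y, type = "b", main = 'type = "b"')   # both points and lines
plot(x, y, type = "h", main = 'type = "h"')   # vertical bars
par(op)                                       # restore the previous layout
```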

For more details on how to make graphs in R, there is a lot of accessible material on the Internet, such as tutorials [1], books [2, 3] and example graphics [4, 5], some of them very good.

Data files inside R and graphs with two or more variables

We have already created data using c() [6], vector(), matrix(), data.frame() and list(), and converted one type into another using as.factor(), as.data.frame(), etc. Another important source of data for training is the data sets shipped with R packages and used in the examples of help() [7] or demo().

These data sets are also very useful for training and teaching R. We have already used some (e.g. iris) and will use more later. They are also the easiest way to ask questions in forums and other websites, because they avoid all the problems of importing and exporting data.

Use data() to see all the data sets available in package “datasets” or in other packages.
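
For instance, a small sketch that lists the data sets of one package and looks at the first entries. The variable name ds is arbitrary.

```r
# List the data sets shipped with package "datasets"
ds <- data(package = "datasets")

# The result stores a matrix with one row per data set
head(ds$results[, c("Item", "Title")])

# Quick look at one of them
head(iris)
```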

We can explore the plot() possibilities with iris data frame. It uses different graphs depending on the input data.

Plot a data frame
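
The calls below are a sketch of how plot() reacts to different kinds of input taken from iris. The choice of columns is arbitrary.

```r
plot(iris$Sepal.Length)                     # numeric vector: values against their index
plot(iris$Species)                          # factor: bar chart of counts
plot(iris$Petal.Length, iris$Petal.Width)   # two numeric vectors: scatter plot
plot(iris$Species, iris$Petal.Length)       # factor and numeric: box-plots
plot(iris)                                  # data frame: scatter plot matrix
```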

Note that Species is a categorical variable. plot() has converted it to numeric codes, which is clearly not the best way to represent a categorical variable. It is better shown this other way:
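
One possible way, given here as a sketch rather than as the only option, is to plot just the numeric columns and use the species to set the color, or to summarise one variable per species:

```r
# Scatter plot matrix of the four numeric columns, colored by species
# (the integer codes 1, 2, 3 of the factor pick colors from the palette)
plot(iris[, 1:4], col = as.integer(iris$Species), pch = 19)

# One numeric variable summarised per species, as box-plots
plot(Sepal.Length ~ Species, data = iris)
```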

Random data

Sometimes it is useful to work with random data. The command sample() [8] takes a random sample of the specified size from a vector.
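
A few typical uses of sample(), as a sketch. The seed and the sizes are arbitrary choices for this example.

```r
set.seed(1)   # fixed seed (an assumption here) so that the draws are repeatable

s1 <- sample(1:100, size = 5)                         # 5 values, without replacement
s2 <- sample(c("A", "B"), size = 10, replace = TRUE)  # with replacement
rows <- sample(nrow(iris), 10)                        # 10 random row numbers
iris[rows, ]                                          # random subset of a data frame
```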

Another useful command to generate random numbers is runif() [9], which draws values from a uniform distribution. See its help page for more information.
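
A minimal sketch of runif(); the limits 10 and 20 are example values.

```r
set.seed(1)                         # arbitrary seed, only for repeatability
u <- runif(5)                       # 5 values between 0 and 1 (the defaults)
v <- runif(5, min = 10, max = 20)   # 5 values between 10 and 20
u
v
```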

Random data following a particular distribution are very useful for simulations. The distributions in package stats [10] are usually enough, but if you need more you can look here.

Plotting histograms of distributions
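
As a sketch, random draws from three distributions of package stats can be compared with hist(). The sample size and the parameters are arbitrary choices for this example.

```r
set.seed(1)
n <- 1000

normal_draws  <- rnorm(n, mean = 0, sd = 1)   # normal distribution
uniform_draws <- runif(n)                     # uniform distribution on [0, 1]
poisson_draws <- rpois(n, lambda = 3)         # Poisson distribution

op <- par(mfrow = c(1, 3))                    # three plots side by side
hist(normal_draws,  main = "Normal")
hist(uniform_draws, main = "Uniform")
hist(poisson_draws, main = "Poisson")
par(op)
```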

Export figures

Once we have a nice picture, we may want it outside R, usually in a given format, size and resolution.

Possible output formats are jpeg, bmp, png and tiff. All of them can be produced with a chosen size and resolution. See the help page of jpeg() [11] for more information.

Another possible output format is pdf. See the help page of pdf() [12] for more information.

The main difference between pdf and the other formats is that with pdf() we can add as many pictures as we want before closing the device, each on its own page. In the others (e.g. jpeg), with a fixed file name, starting a new plot overwrites the previous one, so only the last plot made before closing the device is saved.
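
A sketch of a pdf with two pages. The file name AllPlots.pdf is hypothetical, and the size (given in inches for pdf()) is an arbitrary choice.

```r
pdf("AllPlots.pdf", width = 7, height = 5)   # open the pdf device
plot(iris$Sepal.Length, iris$Sepal.Width)    # page 1
hist(iris$Petal.Length)                      # page 2
dev.off()                                    # close the device and finish the file
```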

With any of these commands, what we are actually doing is opening a graphics device other than the default R graphics window. Once the graph is finished, we have to close this device with dev.off(); the file is not complete until then.

It is possible to change the size, background color, resolution, etc.
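
A sketch with two devices. The file names follow the ones used in the text; the sizes, resolution, background colors and compression are assumptions chosen for the example, and both devices need an R build with jpeg and tiff support.

```r
# jpeg: 15 x 10 cm at 300 points per inch, white background
jpeg("Plot1.jpeg", width = 15, height = 10, units = "cm", res = 300,
     bg = "white")
plot(iris$Petal.Length, iris$Petal.Width)
dev.off()

# tiff: same size and resolution, transparent background, LZW compression
tiff("Plot1.tiff", width = 15, height = 10, units = "cm", res = 300,
     bg = "transparent", compression = "lzw")
plot(iris$Petal.Length, iris$Petal.Width)
dev.off()
```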

Look for the differences between Plot1.jpeg and Plot1.tiff. Change the parameters and see the results. A very useful background color is bg = “transparent”, although the jpeg format cannot store transparency.

Import - export data

The best format for data is always plain text. Plain-text data come with different extensions: .txt for text and .csv for comma-separated values. In Spain and other countries the comma is used as the decimal separator, so it is not a good idea to use it to separate values as well. A better way to separate values in a .csv file is with semicolons (;) or tabs.

To see how to import and export data, we will first export the iris data to a file called IrisFile.csv, with semicolons as field separators and commas as decimal separators, and then import the data back from that file.
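
The export could look like the sketch below. Dropping the row names with row.names = FALSE is an assumption made here so that the file has exactly the five iris columns.

```r
# Export iris with ";" between values and "," as the decimal separator
write.table(iris, file = "IrisFile.csv", sep = ";", dec = ",",
            row.names = FALSE)

# First lines of the file as plain text
readLines("IrisFile.csv", n = 3)
```

write.csv2() uses the same two separators by default, so write.csv2(iris, "IrisFile.csv", row.names = FALSE) is a shorter alternative.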

Find the file in the working directory. Try opening it with a plain-text editor, Excel or LibreOffice. Be careful: do not save changes without changing the file name, because some programs (especially Excel) usually change the format.

Now, to import a data file:
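
A sketch of the import with read.table(). The object name iris2 is arbitrary, and the first lines only recreate the file if it is missing, so that the block runs on its own.

```r
# Setup: recreate the file if it does not exist yet
if (!file.exists("IrisFile.csv")) {
  write.table(iris, "IrisFile.csv", sep = ";", dec = ",", row.names = FALSE)
}

# Import: first row holds the column names, ";" separates values, "," is the decimal mark
iris2 <- read.table("IrisFile.csv", header = TRUE, sep = ";", dec = ",")
str(iris2)
```

With R 4.0 or later, text columns are read as character, so Species comes back as chr and not as a factor.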

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...

In both cases, export and import, we can include the folder where we want (or have) the data file in the file name. For now, it is a good idea to keep everything in the working directory.
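
As a sketch of putting the folder into the file name, the example below builds the path with file.path(). The folder (a "data" subfolder of the session's temporary directory) and the object names are hypothetical.

```r
getwd()                                      # show the current working directory

out_dir <- file.path(tempdir(), "data")      # hypothetical folder for this example
dir.create(out_dir, showWarnings = FALSE)    # create it if it does not exist

out_file <- file.path(out_dir, "IrisFile.csv")   # folder + file name
write.table(iris, out_file, sep = ";", dec = ",", row.names = FALSE)
iris3 <- read.table(out_file, header = TRUE, sep = ";", dec = ",")
```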

Import data from web site

It is also possible to import data from a web site, for example the original data from this article: https://peerj.com/articles/703/. We can download the data from the link and then import them with read.table() or read.csv(), but it is easier to pass the URL directly. Note: if the data file is large and we are going to read it many times, the first option may be better, because the data are downloaded only once.
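
The two options can be sketched as follows. The address of the article's data file is not reproduced here, so the example uses a stand-in: a local copy of iris exposed through a file:// URL. With real data, data_url would be the http(s) address of the file; all object and file names are hypothetical.

```r
# Stand-in for a remote file (assumption): a local csv reached through a file:// URL
local_copy <- tempfile(fileext = ".csv")
write.csv(iris, local_copy, row.names = FALSE)
p <- normalizePath(local_copy, winslash = "/")
data_url <- paste0("file://", if (startsWith(p, "/")) "" else "/", p)

# Option 1: read directly from the URL
d1 <- read.csv(data_url)

# Option 2: download once, then always read the local copy
dest <- file.path(tempdir(), "downloaded.csv")
if (!file.exists(dest)) {
  download.file(data_url, destfile = dest, quiet = TRUE)
}
d2 <- read.csv(dest)

str(d1)
```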

## 'data.frame':    486 obs. of  11 variables:
##  $ date        : chr  "2018-05-18" "2018-05-18" "2018-05-18" "2018-05-18" ...
##  $ specie      : chr  "Cydonia oblonga" "Cydonia oblonga" "Cydonia oblonga" "Eriobotrya japonica" ...
##  $ code        : chr  "Cob" "Cob" "Cob" "Eja" ...
##  $ block       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ rep         : int  1 2 3 1 2 3 1 2 3 1 ...
##  $ temperature : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ n_grains    : int  11 19 51 28 52 69 50 41 68 77 ...
##  $ n_germinated: int  1 2 6 1 2 4 5 2 3 1 ...
##  $ media_long  : num  2 1.25 1 1 1 1.25 2.1 1.5 2.17 1.5 ...
##  $ max_long    : num  1 1 2 2 1.5 1 4 1.5 4 1.5 ...
##  $ observ      : chr  "" "" "" "" ...

Other data formats

There are particular commands and packages in R for working with data in different formats:

“readxl” package, with read_excel():
While read_excel() auto-detects the format from the file extension, read_xls() and read_xlsx() can be used to read files without an extension.
“WriteXLS” package, with WriteXLS():
Writes one or more data frames to an Excel file (.xls or .xlsx).

There are several other packages to work with Excel files, such as “writexl” or “xlsx”. The last one can be difficult to install because it depends on a Java installation.
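
A sketch of a round trip through an Excel file with “writexl” and “readxl”. It only runs if both packages are installed; the file name and the object names are hypothetical.

```r
if (requireNamespace("writexl", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  xlsx_file <- file.path(tempdir(), "IrisFile.xlsx")    # hypothetical file name
  writexl::write_xlsx(iris, xlsx_file)                  # export a data frame
  iris_xl <- readxl::read_excel(xlsx_file, sheet = 1)   # import the first sheet
  str(iris_xl)
} else {
  message("Install 'writexl' and 'readxl' to run this example.")
}
```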

“shapefiles”, “maptools”, “rgdal”, “spatstat” packages:
To read and work with shape-files and maps.
“ape” package:
To read DNA sequences.

Exercises

  1. Plot a cheat-sheet with the values of color and point type (col = and pch =) from 1 to 25, and export it as a jpeg 15 cm wide and 6 cm high, with a resolution of 100 points per cm.

  2. Plot ten Poisson distributions in a single graph, with lambda ranging from 1 to 10. Add a legend and a title. Export it as a .tiff file with a size of 15x15 cm.

  3. Import data from this article: https://peerj.com/articles/328/


Be careful when importing the data. Notice that you have to skip the first two lines using “skip = 2” [13].

With these data, using for(), plot graphs to represent the effect of all the numerical variables, from “richness” to “mean_quality”, on “yield”. Choose the type of graph that you think best represents this effect for the different species. Create a single pdf with all the graphs inside.

To find the best graph for each type of data, a very helpful website is From Data to Viz: https://www.data-to-viz.com/.

 


 

About this tutorial

Cite as: Alfonso Garmendia (2020) R for life sciences. Chapter 3: R base graphics and data management. http://personales.upv.es/algarsal/R-tutorials/03_Tutorial-3_R-base-graphics.html.

Available also in other formats (pdf, docx, …): https://drive.google.com/drive/folders/19w914WCg8BVTVBE_zpgShmg2vpjguV1e?usp=sharing.

Other similar tutorials: https://garmendia.blogs.upv.es/r-lecture-notes/

Originals are in bitbucket repository: https://bitbucket.org/alfonsogar/tea_daa_tutorials.

 

Document written in R Markdown, using RStudio.

 

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

 


  1. Quick-R

  2. Chang 2013

  3. Murrell 2011

  4. r-graph-gallery

  5. Murrell

  6. ?c to see the help page

  7. ?help to see the help page

  8. ?sample to see the help page

  9. ?runif to see the help page

  10. see ?distributions

  11. jpeg(): ?jpeg for help

  12. pdf(): ?pdf for help

  13. ?read.table for help