Many other tutorials explain in detail how to install R and RStudio, so I will not go into it here.
The best way to install R and RStudio is to follow the instructions on the official R (CRAN) and RStudio websites.
Make sure you download and install the free RStudio Desktop version, not a paid one.
If you need more help, a detailed tutorial on how to install R and RStudio is available at https://www.rforecology.com/post/how-to-install-r-and-rstudio/.
There is a lot of information about R. Some good free books are listed on the official page https://cran.r-project.org/other-docs.html. Browse through them; you may find one you would like to read more thoroughly.
The official R documentation is at https://cran.r-project.org/manuals.html. It can be hard to read, at least for a beginner, so just take a quick look.
A very complete guide to books and resources is the Big Book of R.
R has core packages and other packages. Core packages are always installed, but some of them have to be activated (loaded) with library() before use. If a package is not installed but is available in the repositories, you can install it with install.packages().
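The following is a minimal sketch of the install-then-activate workflow. `load_or_install()` is a hypothetical helper written only for this example; it is not a function that comes with R. Calling it with a package that is not installed would download that package, which assumes you have an internet connection. Here it is called with "splines", a package that ships with R but is not activated at start-up, so nothing is downloaded.

```r
# Hypothetical helper (not part of R): activate a package, installing it first if needed
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE, without an error, if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # download and install it from the repositories
  }
  library(pkg, character.only = TRUE)  # activate (attach) it for this session
}

# "splines" ships with R: it is installed but not activated at start-up
"package:splines" %in% search()
load_or_install("splines")
"package:splines" %in% search()  # TRUE: the package is now activated
```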
R is case sensitive: LM() is not the same as lm().
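Here is a quick check of case sensitivity. The object names value and Value are arbitrary examples, and the last line assumes a fresh session where nobody has created an object called LM.

```r
value <- 10
Value <- 20
value == Value    # FALSE: value and Value are two different objects
exists("lm")      # TRUE: lm() is a built-in function
exists("LM")      # FALSE: nothing called LM exists (in a fresh session)
```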
Use # to add comments to your R script. Anything after # on a line will not be run. Comment everything you do, so that anyone (including you a month later) can understand your code.
R works with commands and objects. Commands are always run with parentheses, (), which may contain different parameters. Objects can be functions, data objects or results objects (commands are actually function objects).
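The sketch below shows a command run with parameters, a command treated as an object, and a results object. The name Res is only an example.

```r
round(3.14159, digits = 2)   # a command with two parameters; returns 3.14
class(round)                 # "function": the command itself is an object
Res <- summary(c(2, 4, 6))   # store the output of a command as a results object
Res
```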
To create (assign) an object you can use either "=" or "<-". Most people use "<-" and keep "=" for setting the parameters of a command; equality is tested with "==". In RStudio, you can type "<-" quickly with Alt + "-".
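This short example contrasts the three symbols. The names x and y are arbitrary.

```r
x <- 5                         # preferred way to assign
y = 5                          # also works, but is less common
x == y                         # "==" compares values: TRUE
round(x = 2.567, digits = 1)   # "=" inside () sets the parameters of a command: 2.6
```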
Do not give objects the same name as commands or other existing objects, unless you want to overwrite them. This is one of the main sources of trouble for beginners.
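A classic example of this problem is T, the built-in shortcut for TRUE. The sketch assumes you are working at the console (the global environment).

```r
T            # TRUE: T is a built-in shortcut for TRUE
T <- 0       # creates our own object called T, which hides the built-in one
T            # 0: any code that uses T as TRUE now behaves differently
rm(T)        # remove our object...
T            # TRUE: ...and the built-in T is visible again
```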
There are many possible types of data (and results) objects, but the most important ones are:
| Object | Description |
|---|---|
| Simple objects | A number, a string of text, etc. |
| Vectors | A sequence of simple objects of the same type: numbers, text, etc. |
| Factors | Categorical data. Stored as a vector of integer codes (one per value) plus a vector of levels (the category names) |
| Matrices | Rows and columns of data of a single type, usually numbers |
| Data frames | Columns are vectors (or factors) of the same length, each with a variable name |
| Lists | Lists of any other objects |
Vectors may be numeric (numbers), character (text) or logical (TRUE, FALSE). A vector that mixes numbers and text is converted to character.
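When types are mixed, R converts all values to the most general type: logical, then numeric, then character. The example below shows two cases.

```r
c(1, 2, "three")          # "1" "2" "three": mixing numbers and text gives character
class(c(1, 2, "three"))   # "character"
c(TRUE, FALSE, 3)         # 1 0 3: logical values become numbers (TRUE = 1, FALSE = 0)
class(c(TRUE, FALSE, 3))  # "numeric"
```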
Data frames and lists can hold different types of data in the same object.
c(), print(), class(), str(), length(), as.character(), as.numeric(), [ ], ls(), rm()
To create an object, you only need to assign it, as in the examples below. Vectors are created with the command c(). To see what is inside an object, run its name, or use print(object) if you are inside a loop or a function.
To see the type of an object, use class() or, better, str(). Use str() often to become familiar with the structure of the different objects. The main characteristics of a vector are its class() and its length().
We can convert a vector from numeric to character and vice versa with as.character() and as.numeric().
With [ ] we can select parts of a vector to extract or replace them.
### Creating simple objects and vectors ----------------
### Simple Objects
N5 <- 5 # Numeric
Te <- "Five" # Text
### Vectors
VN <- c(1, 3, 4, N5) # Numeric vector
VT <- c("One", "Three", "Four", Te) # Text vector
### We created four objects: N5, Te, VN and VT.
### We used the first two objects as part of the vectors.
### See the content of the objects
N5 # or print(N5)
Te
VN
VT
### See the class of the objects
class(N5)
class(Te)
class(VN)
class(VT)
### See the structure of the objects
str(N5)
str(Te)
str(VN)
str(VT)
### See the length of the objects
length(VT)
### If we mix character and numeric objects into a vector,
### it becomes a character vector:
UN <- c(VN, VT)
UN
str(UN)
length(UN)
### Convert numeric vector to character and vice versa
VNC <- as.character(VN)
str(VN)
str(VNC) # Notice that character data are in quotations
as.numeric(VNC) # Back to previous numeric vector
### Extract part of a vector
VT[1:2] # Extracts the first two positions of VT
VT[c(1, 3)] # Extracts positions 1 and 3 of VT
### Replace parts of a vector
VT # See initial VT
VT[2:4] <- c("two", "three", "four") # Replace
VT # See the result
### Logical Vectors
VN # This is a numeric vector
VN <= 3 # This is a logical vector (TRUE or FALSE)
VN[VN <= 3] # Keep only the values less than or equal to 3
You may use ls() to see a list of all your objects, as in the "Environment" tab in RStudio. You can also remove an object with rm().
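Here is a short check of ls() and rm() at the console. Tmp is just a throwaway example name.

```r
Tmp <- 1:3          # a temporary object
"Tmp" %in% ls()     # TRUE: it appears in the list of objects
rm(Tmp)             # remove it
"Tmp" %in% ls()     # FALSE: it is gone
```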
as.vector(), as.factor(), levels(), factor()
Categorical data can be stored in two formats: character vectors or factors. To convert between them, use as.vector() or as.factor(). A factor is stored as an integer vector with the code of each value, plus a character vector called levels with the list of categories.
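The internal structure described above can be inspected directly. This sketch uses a small example factor called Colour.

```r
Colour <- factor(c("Red", "Blue", "Red"))  # levels are sorted alphabetically
levels(Colour)                       # "Blue" "Red"
as.integer(Colour)                   # 2 1 2: the category code of each value
levels(Colour)[as.integer(Colour)]   # "Red" "Blue" "Red": codes mapped back to text
```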
### Factors ---------------------------------------------
Vec <- c("Red","Blue","Red","Blue","Red") # This is a character vector
str(Vec) # Notice its structure
Fac <- as.factor(Vec) # Create factor
as.vector(Fac) # Vector
str(Fac) # Notice structure of a factor with levels and numbers
Fac[3] # Position 3
levels(Fac) # To see the levels.
# Notice that levels are in alphabetical order
levels(Fac)[2] # The second level of the factor
### Reorder Factors ----------------------------------------
levels(Fac)[c(2, 1)] # The levels ordered as wanted
FacR <- factor(Fac, levels = levels(Fac)[c(2, 1)]) # Redo the factor
str(FacR) # Factor ordered
### Other way to reorder factors
FacR2 <- factor(Vec, levels = c("Red", "Blue"))
str(FacR2) # Same result
Matrices are data organized in rows and columns, usually numbers (all the values in a matrix must be of the same type). In R, a matrix is actually a vector with an additional attribute (dim), which is itself a numeric vector of length 2 that defines the number of rows and columns of the matrix. A matrix can be created in several ways; the most common is the function matrix().
The argument byrow indicates whether the values given by data fill the columns one after another (the default, byrow = FALSE) or the rows (byrow = TRUE). The argument dimnames lets you give names to the rows and columns.
### Matrices ----------------------------------------------------
### Create a matrix of 3 rows and 4 columns filled with 0s
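matrix(data = 0, nrow = 3, ncol = 4)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
### Fill a 3 x 4 matrix with the numbers 1 to 12, by columns (default)
matrix(data = 1:12, nrow = 3, ncol = 4)
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
### The same numbers, filling the rows (byrow = TRUE)
matrix(data = 1:12, nrow = 3, ncol = 4, byrow = TRUE)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12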
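The dimnames argument mentioned above is not used in these examples. Here is a small sketch of it; the row names "r1", "r2" and column names "A", "B", "C" are arbitrary.

```r
Mnames <- matrix(data = 1:6, nrow = 2, byrow = TRUE,
                 dimnames = list(c("r1", "r2"),        # row names
                                 c("A", "B", "C")))    # column names
Mnames
Mnames["r2", "B"]   # 5: rows and columns can be selected by name
```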
Another way to create a matrix is to start from a vector of values and assign its dim attribute:
### Create a matrix from a vector
Mdata <- 1:12 # Numeric vector
Mdata
dim(Mdata) # NULL: vectors do not have dimensions, only length
dim(Mdata) <- c(3, 4) # assign dimensions
Mdata # The matrix
######## To convert the matrix into the original vector
as.vector(Mdata)
######## Characteristics of a matrix
length(Mdata)
dim(Mdata)
str(Mdata) # int means integers (numeric without decimals)
######## To point into a place in the matrix
Mdata[3, 4] # 12
Mdata[2, 2] # 5
seq(), rep(), data.frame(), names(), paste(), row.names()
Data frames are the format in which we will usually have our data.
Like a matrix, a data frame is a rectangular set of data with columns (variables) and rows (observations). Unlike a matrix, its columns are vectors of the same length that may be of different types. Each column (variable) is a vector, so it can only hold one type of data.
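This sketch checks both properties: each column keeps its own type, and all columns must have the same length. The names Dmix and Msg are only examples, and the text column is character in R 4.0 and later.

```r
Dmix <- data.frame(num = 1:3, txt = c("a", "b", "c"))
sapply(Dmix, class)   # each column keeps its own type
# Columns must have the same length: 3 and 2 values cannot be combined
Msg <- tryCatch(data.frame(num = 1:3, txt = c("a", "b")),
                error = function(e) conditionMessage(e))
Msg                   # the error message explains the problem
```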
### Data frames ---------------------------------------
#### Creating different vectors or variables
V1 <- c(1:5)
V2 <- seq(1, 10, 2) # Sequence from 1 to 10 by 2
V3 <- c("one", "two", "three", "four", "five")
V4 <- rep(c("odd", "even"), length.out = 5) # Replicate the vector to length 5
#### Creating the data frame with four variables
D15 <- data.frame(V1, V2, V3, V4)
D15
str(D15) # Structure of the data frame
## 'data.frame': 5 obs. of 4 variables:
## $ V1: int 1 2 3 4 5
## $ V2: num 1 3 5 7 9
## $ V3: chr "one" "two" "three" "four" ...
## $ V4: chr "odd" "even" "odd" "even" ...
dim(D15) # Number of rows and columns
## [1] 5 4
#### Different ways of accessing one variable from the data frame
D15$V1 # First variable
str(D15$V1) # Vector
str(D15$V3) # Character vector (a factor in R versions before 4.0)
D15[3] # Still a data frame, but with only one variable
D15[[3]] # Same vector as D15$V3
#### Accessing data in a data frame (several ways to do the same)
D15$V3[4] # Row 4 of the third variable
D15[[3]][4] # Same as before
D15[4, 3] # Same notation as for matrices
as.character(D15$V3[4]) # Same, forced to character (useful if V3 were a factor)
#### Variable names
names(D15) # Show variable names
PrevNames <- names(D15) # Save variable names to use them later
NewNames <- c("Num", "OddNum", "Text", "OddEven")
### Change variable names using paste
names(D15) <- paste(PrevNames, NewNames, sep = "_")
names(D15)[1] <- "V1_Num15" # Change only one variable name
names(D15)
#### Row names
row.names(D15) <- D15$V3_Text
D15
#### Subset a data frame
D15[D15$V4_OddEven == "odd", ] # All rows where V4_OddEven is "odd"
### Same rows, but only variables 1 and 2
D15[D15$V4_OddEven == "odd", 1:2]
A data frame can hold vectors and factors of the same length. A list can hold almost anything, even other lists: there is no constraint on the objects it can include. Many results objects returned by analyses are lists.
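To check that an analysis result is a list, here is a sketch using the built-in trees dataset and a linear model with lm(). The name Fit is only an example.

```r
Fit <- lm(Volume ~ Girth, data = trees)   # linear model with the built-in trees data
is.list(Fit)                              # TRUE: the result is a list
names(Fit)                                # the objects stored inside it
Fit$coefficients                          # extract one of them by name
```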
#### Lists --------------------------------------------
#### Create a list with existing objects
# ls() # List all objects already active in R
### Create list with some of the previous objects of this chapter
L1 <- list(D15, Mdata, Fac, VN, UN)
L1 # A list with different objects
## [[1]]
## V1_Num15 V2_OddNum V3_Text V4_OddEven
## one 1 1 one odd
## two 2 3 two even
## three 3 5 three odd
## four 4 7 four even
## five 5 9 five odd
##
## [[2]]
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
##
## [[3]]
## [1] Red Blue Red Blue Red
## Levels: Blue Red
##
## [[4]]
## [1] 1 3 4 5
##
## [[5]]
## [1] "1" "3" "4" "5" "One" "Three" "Four" "Five"
str(L1) # Structure of the list
## List of 5
## $ :'data.frame': 5 obs. of 4 variables:
## ..$ V1_Num15 : int [1:5] 1 2 3 4 5
## ..$ V2_OddNum : num [1:5] 1 3 5 7 9
## ..$ V3_Text : chr [1:5] "one" "two" "three" "four" ...
## ..$ V4_OddEven: chr [1:5] "odd" "even" "odd" "even" ...
## $ : int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
## $ : Factor w/ 2 levels "Blue","Red": 2 1 2 1 2
## $ : num [1:4] 1 3 4 5
## $ : chr [1:8] "1" "3" "4" "5" ...
The structure of L1 shows a data frame, a matrix, a factor, a numeric vector and a character vector. The names of the original objects are not kept, and usually they are not needed.
To give names to the objects in a list, if needed, use names() the same way as for data frames.
To point to data or objects in a list, use [[ ]] to get each object from the list; then work with it as usual for its object type.
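The difference between [ ] and [[ ]] on a list is easy to miss. This sketch uses an example list called Lst. It also shows the kind of loop where position-based [[ ]] is handy.

```r
Lst <- list(a = 1:3, b = "text", c = matrix(1:4, nrow = 2))
class(Lst[1])     # "list": single brackets return a smaller list
class(Lst[[1]])   # "integer": double brackets return the object itself
Lst[["c"]][2, 1]  # 2: then use the usual notation for that object type
# [[ ]] with a position is convenient inside loops
for (i in seq_along(Lst)) {
  print(class(Lst[[i]]))
}
```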
#### Create a list with names
L1 <- list(name.for.dataframe = D15,
name.for.matrix = Mdata,
name.for.factor = Fac,
name.for.numeric = VN,
name.for.character = UN)
### See the names
names(L1)
#### Change the names of a list
names(L1) <- c("D15", "Mdata", "Fac", "VN", "UN")
L1
str(L1)
##### Point to objects in a list
L1[[1]] # The first object
L1$D15 # Same as previous, using the name
#### Point to data inside objects in a list
L1[[1]]$V1_Num15 # First variable of the data frame
L1$D15$V1_Num15 # Same as previous, using the names
L1[[1]][[1]] # Same as previous, using only numbers
# This notation is very useful when using iterations.
Whether in statistics or in R coding, you will never know everything, and that should not be your objective.
What you need is to know how to look for solutions to the problems that will arise in each project. Therefore, learning how to search for help is probably the most important skill you will develop in this course.
Every command in R has a help page. To see it, use the Help tab in RStudio, or type help(command) or ?command in the console.
Try to always look at the help page of every command you use. Help pages always have the same structure: a description of the parameters of the command, and some examples at the end.
help.start() opens the complete HTML help included with R.
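Here is a sketch of the help commands run from the console, using mean() and the word "median" as examples. In RStudio the pages open in the Help tab; in a plain terminal they are shown as text.

```r
# Same as typing ?mean in the console: opens the help page of mean()
help("mean")
# Search all installed help pages for a word (same as typing ??median)
help.search("median")
# Run the examples from the end of the help page of mean()
example(mean)
```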
R's name is not great for web searches: it is a single letter, and Google does not associate this letter with the R language. Your first few search attempts will probably confirm this.
To google how to do something in R, it is better to write "How to … in R" than just to add "R".
If you know which package you want to use, including its name will improve the search.
Another tip is to add "stackoverflow" to your query. Stack Overflow is a forum where people ask how to do things in different programming languages, and each question may have several answers. It is usually the best place to find a solution, but I usually get better results searching it through Google than directly in the forum.
Another useful forum is Cross Validated, the statistical side of Stack Overflow.
You can also use the PoliformaT forum to ask questions or to help solve other students' problems.
Before starting the exercises, let's go through some steps to set up the right workflow.
If you always follow these steps at the beginning of a project, you will have a project folder with everything inside: data, scripts and results. This makes it easy to share your work and keep it organized.
When writing your first script, make sure you copy and paste the exercise statements as comments, and comment everything you do, so that anybody who reads your script (e.g., you next week or next year) knows what you are doing.
Create a vector with the numbers from 1 to 31. With this vector and the command paste(), create a vector named "tree.name" with 31 tree names, from "Tree_1" to "Tree_31".
Make an object called "d0" with the data frame "trees" from the R datasets. Look at its help page to see what this data frame contains. How many variables and observations are there in d0?
Add your variable tree.name to d0.
Write the code to extract from tree.name the name of the largest (by volume), the tallest and the widest tree.
Using the function mean(), calculate the mean Diameter, Height and Volume.
Make a new factor variable in d0 with two levels: "Large" for trees with a volume larger than or equal to the mean, and "Small" for trees with a volume smaller than the mean. How many "Large" trees are there?
Make a new factor variable in d0 with "Tall" for trees taller than or equal to the mean height, and "Short" for trees shorter than the mean height.
Make a subset with the trees that are both Short and Large, and calculate the mean diameter of these Short-Large trees in meters. How many "Short_Large" trees are there?
Cite as: Alfonso Garmendia (2023) R for life sciences. Chapter 1: Easy R start. http://personales.upv.es/algarsal/R-tutorials/01_Tutorial-1_R-easy-start.html.
Available also in other formats (pdf, docx, …): https://drive.google.com/drive/folders/19w914WCg8BVTVBE_zpgShmg2vpjguV1e?usp=sharing.
Other similar tutorials: https://garmendia.blogs.upv.es/r-lecture-notes/
Originals are in bitbucket repository: https://bitbucket.org/alfonsogar/tea_daa_tutorials.
Document written in R Markdown, using RStudio.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.