1 Fundamentals of R
1.1 Are R and Python really that different?
Unlike compiled languages such as C or Java, R’s default implementation is interpreted. R is a dynamic programming language, which means R interprets your code as you run it. Python is typically first compiled to bytecode and then interpreted, but the compilation is hidden from the user.
R packages are distributed via the Comprehensive R Archive Network (CRAN). Mature R packages are submitted to CRAN, where they are checked for backward compatibility against other packages, R versions and operating systems. Python packages are distributed through the Python Package Index (PyPI), which does not run comparable checks, so it is sometimes harder to handle versioning of the packages you work with.
CRAN is a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R.
R uses a particular assignment operator: <-. However, you will see code, in particular in these lessons, using = instead. For ordinary assignments you can use them interchangeably and this won’t affect your code. So if you are used to Python, you can stick to =.
R has different coding paradigms, most importantly base or vanilla R, tidyverse and data.table.
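To illustrate the two assignment operators mentioned above, here is a minimal sketch. The variable names are illustrative only. It also shows the one common situation where = does not assign: inside a function call it names an argument.

```r
# At the top level of a script, both operators create a variable.
x <- 5
y = 5
same_value = identical(x, y)   # TRUE

# Inside a function call, `=` matches an argument name instead of assigning:
# this passes the vector to mean()'s argument `x` and leaves our `x` untouched.
m = mean(x = c(1, 2, 3))
x_unchanged = identical(x, 5)  # TRUE
```

For everyday top-level assignments the choice is a matter of style, which is why these lessons can use = throughout.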
Piping: if you are familiar with the R tidyverse, you might have seen the %>% operator used to chain functions. R 4.1.0 introduced a native pipe operator, |>, and that is what we use in this course. If you are curious about how the native pipe works and how it differs from the magrittr pipe, see here and here.
Code syntax is of course different. For instance, Python indexes from 0 and R from 1. The note below gives a cheatsheet of the main peculiarities, based on Watson (n.d.).
R:
# Packages
library(dplyr)
# Strings
paste('Hello', 'World')
paste(c('Hello', 'World'), collapse = '')
# Booleans
TRUE && FALSE == FALSE
FALSE || TRUE == TRUE
!TRUE == FALSE
# Loops
for (i in 1:10) {
  print(i)
}
while (x > 0) {
  x = x - 1
}
# Conditionals
if (x > 0) {
  print('x is positive')
} else if (x == 0) {
  print('x is zero')
} else {
  print('x is negative')
}
ifelse(x>0, 1, -1)
# Functions
f = function(x,y) {
  x2 = x * x
  x2 + sqrt(y*x2+1)
}
\(x) x^2 + sqrt(y*x^2+1) # anonymous function (R >= 4.1)
# Lists
myList = list(1, 2, "a", c(10,8,9))
myList[3] == "a"
myList[[4]][2] == 8
myList[[length(myList)]] # returns c(10,8,9)
2 %in% myList
# Ranges
seq(0, 2*pi, by = 0.1)
seq(0, 2*pi, length = 100)
0:5 == c(0, 1, 2, 3, 4, 5)
# Vectors and Matrices
A = matrix(c(1,3,2,4),nrow=2) # column-wise!
b = c(1,2)
t(A)
dim(A)
solve(A,b)
b > 0 # elementwise comparison
b^2 # elementwise power
A %*% A # matrix product
which(b > 0)
matrix(rep(2,100), nrow=10)
diag(4)
cbind(A,b)
rbind(A,b)
# Random numbers
set.seed(1234)
matrix(runif(100),nrow=10)
rnorm(10)
sample(10:99,1)
# Plot
plot(runif(100))
Python:
# Packages
import pandas as pd
# Strings
'Hello' + 'World'
','.join(['Hello', 'World'])
# Booleans
True and False == False
False or True == True
not True == False
# Loops
for i in range(1,11):
    print(i)
while x > 0:
    x -= 1
# Conditionals
if x > 0:
    print('x is positive')
elif x == 0:
    print('x is zero')
else:
    print('x is negative')
1 if x > 0 else -1
# Functions
def f(x,y):
    x2 = x * x
    return x2 + (y*x2+1)**(1/2)
lambda x: x**2 + (y*x**2+1)**(1/2)
# Lists
myList = [1, 2, "a", [10,8,9]]
myList[2] == "a"
myList[3][2] == 9
myList[-1] == [10, 8, 9]
2 in myList
# Ranges
import numpy as np
np.arange(0, 2*np.pi, step=0.1)
np.linspace(0, 2*np.pi, num=100)
list(range(5)) == [0,1,2,3,4]
# Vectors and Matrices
A = np.array([[1, 2], [3, 4]])
b = np.array([1, 2])
np.transpose(A) # or A.T
A.shape
np.linalg.solve(A, b)
b > 0 # elementwise comparison
b**2 # elementwise power
A @ A # matrix product
np.where(b > 0)
np.full((10,10), 2)
np.eye(4) # 4 x 4 identity matrix
np.hstack((A,b[:,np.newaxis]))
np.vstack((A,b))
# Random numbers
np.random.seed(1234)
np.random.rand(10,10)
np.random.randn(10)
np.random.randint(10,100)
# Plot
import matplotlib.pyplot as plt
plt.plot(np.random.uniform(0, 1, 100))
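To make the native pipe mentioned earlier in this section more concrete, the following sketch computes the same value twice, once with nested calls and once with |>. The vector and variable names are made up for illustration.

```r
values = c(1, 4, 9, 16)

# Nested calls are read from the inside out.
nested = round(mean(sqrt(values)), 1)

# The native pipe (R >= 4.1.0) passes the left-hand side as the first
# argument of the call on the right, so the steps read left to right.
piped = values |> sqrt() |> mean() |> round(1)

identical(nested, piped)   # TRUE
```

Both forms run the same functions in the same order; the pipe only changes how the code reads.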
1.2 Base or “vanilla” R
In the note above you will find several examples of code syntax for base or “vanilla” R. R has several coding syntaxes. Knowing the basics of base R allows you to write R code without depending on other packages. However, most data science workflows are facilitated by other coding syntaxes such as tidyverse and data.table. In this lesson, you will solve a practical using base R (Practical 2). In coming lessons, we will include the tidyverse in our workflows.
Find some info on coding basics right here.
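As a rough illustration of a workflow that depends on no extra packages, the sketch below filters, transforms and summarises the built-in mtcars data set using base R only. The choice of data set, the 100 hp threshold and the variable names are assumptions made for this example and are not taken from the practicals.

```r
# Base R only: no library() calls are needed.
# mtcars ships with R in the datasets package.

# Filter rows and select columns with [rows, columns] indexing.
cars = mtcars[mtcars$hp > 100, c("mpg", "cyl", "hp")]

# Add a column: convert miles per (US) gallon to kilometres per litre.
cars$kpl = cars$mpg * 0.425144

# Grouped summary: mean kilometres per litre for each number of cylinders.
mean_by_cyl = aggregate(kpl ~ cyl, data = cars, FUN = mean)

# Sort the summary from most to least efficient.
mean_by_cyl[order(mean_by_cyl$kpl, decreasing = TRUE), ]
```

The same steps can be written with tidyverse or data.table; base R simply expresses them with indexing, $ and functions such as aggregate().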
1.3 R functions
Functions let you automate tasks and make your code more organized. If you find yourself repeating a piece of code over and over and just changing one parameter, then you can probably replace that workflow with a function.
Reasons to create a function, as explained in Wickham, Çetinkaya-Rundel, et al. (2023):
- When requirements change, you only update code once.
- Eliminate errors from copy-pasting, e.g. you won’t forget to update a variable name in all the places you use it.
- Organized code: you can name your function something intuitive to remind you of the task you are undertaking.
- Reuse workflows between projects, making you more efficient.
An R function has three elements:
name = function(arguments) {
  body
}
- name
- arguments: elements that vary across calls
- body: code that is repeated across calls
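The three elements can be seen in a small sketch. The function name rescale01 and its arguments are hypothetical, chosen only to show how repeated code turns into a function.

```r
# Without a function, the same expression is copy-pasted per vector:
a = c(0, 5, 10)
a_scaled = (a - min(a)) / (max(a) - min(a))

# name: rescale01; arguments: x and na.rm; body: the code between the braces.
rescale01 = function(x, na.rm = TRUE) {
  rng = range(x, na.rm = na.rm)       # smallest and largest value
  (x - rng[1]) / (rng[2] - rng[1])    # last expression is the return value
}

rescale01(a)             # 0.0 0.5 1.0
rescale01(c(2, 4, 6, 8))
```

If the rescaling rule changes later, only the body of the function needs to be updated.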
You will find fundamentals of calling functions in R right here.
Practical 3 is about writing functions in R. There you will practice this function syntax.
1.4 Quarto crash course
In this course, we will be using Quarto documents for the practicals. This website itself is created using Quarto.
- Quarto is an open-source scientific and technical publishing system.
- It allows you to combine code, results and text in a single document.
- It uses Markdown syntax.
- With Quarto, you can create reproducible documents in several output formats like PDF, HTML, Word, presentations with Reveal.js, etc.
- It has native support for multiple programming languages like Python and Julia in addition to R.
- It can also render Jupyter notebooks.
1.4.1 Anatomy of a Quarto file
A plain text file that has the extension .qmd is a Quarto file:
---
title: I am a Quarto file
date: today
format: html
---
In this document, we will load a *spatial dataset* with the `sf` package.
```{r}
#| label: setup
#| message: false
library(sf)
nc = read_sf(system.file("shape/nc.shp", package="sf"))
```
We can see what the `nc` object looks like:
```{r}
nc
```
And we can also **plot** it:
```{r}
#| label: plot
#| echo: false
plot(nc['AREA'])
```
You will notice three basic components of the file:
- Metadata: YAML
“Yet Another Markup Language” or “YAML Ain’t Markup Language” is used to provide metadata. Depending on the type of document you are authoring, several parameters are available. Parameters can be nested, so indentation is important. The metadata is kept between --- lines in the form:
---
key: value
---
- Text: Markdown
Text is done with Markdown syntax. If you are not familiar with this, here are some basics: Text section of Intro to Quarto by Charlotte Wickham.
- Code: R executed via knitr
Executable code is contained in chunks surrounded by triple backticks (```).
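As a small sketch of what goes inside such a chunk, the lines below combine chunk options with ordinary R code. The options label, echo and message are the ones used in the example file above; the label value and the variable name are made up for illustration.

```r
#| label: mpg-summary
#| echo: true
#| message: false

# Lines starting with "#|" are read by Quarto as chunk options;
# to R itself they are ordinary comments, so the code also runs in the console.
mpg_summary = summary(mtcars$mpg)
mpg_summary
```

In a .qmd file these lines would sit between the opening {r} fence and the closing fence of a chunk.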
The more familiar you become with Quarto documents, the more you will be able to do. Interactive documents, including dashboards, are also supported by Quarto. Your final project should be reported using Quarto.