2 Tidy (spatial) data
In this session you should learn:
2.1 Installing packages
Last session, we mostly worked with base R. In this session we will be using dedicated packages that are particularly helpful for data science and spatial analysis.
Stable packages are usually installed from CRAN. The function install.packages() takes a vector of package names and a destination library, downloads the packages from the repositories and installs them.
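You may want to re-run a setup script without reinstalling packages you already have. A minimal sketch of that wraps install.packages() in a small helper. install_if_missing() is a hypothetical name used for illustration; it is not a function from base R or any package.

```r
# Hypothetical helper (not part of base R): install only the CRAN packages
# that are not yet installed in any library on .libPaths()
install_if_missing <- function(pkgs, lib = .libPaths()[1]) {
  installed <- rownames(installed.packages())
  missing <- setdiff(pkgs, installed)
  if (length(missing) > 0) {
    # Downloads from the repositories in getOption("repos") and installs into `lib`
    install.packages(missing, lib = lib)
  }
  # Return the names that had to be installed, without printing them
  invisible(missing)
}
```

For example, install_if_missing(c("tidyverse", "sf", "rnaturalearth")) would install only the packages that are absent and invisibly return their names.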
You can also install packages that are not on CRAN, or “development” versions of packages, using the {remotes} R package. Packages from GitHub, GitLab, Bitbucket as well as local packages are supported.
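A similar sketch works for development versions. It assumes {remotes} is installed and that the GitHub repository name matches the package name. That holds for r-spatial/sf but is not guaranteed in general. install_dev_if_missing() is hypothetical. For the other sources mentioned above, remotes::install_gitlab(), remotes::install_bitbucket() and remotes::install_local() play the same role as remotes::install_github().

```r
# Hypothetical helper: install a development version from GitHub with {remotes},
# but only if the package is not installed at all.
# Assumes a plain "owner/repo" string (no "@ref" or subdirectory) whose
# repository name equals the package name.
install_dev_if_missing <- function(repo) {
  pkg <- basename(repo)  # "r-spatial/sf" -> "sf"
  if (!requireNamespace(pkg, quietly = TRUE)) {
    remotes::install_github(repo)
  }
  invisible(pkg)
}
```

install_dev_if_missing("r-spatial/sf") only installs from GitHub when {sf} is missing entirely. To replace an existing CRAN version with the development one, call remotes::install_github("r-spatial/sf") directly.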
2.2 Tidy data
Tidy data is data organised in a particular, rectangular structure with one observation per row and one variable per column (Wickham, 2014).
Does this look familiar? Think about the attribute table in your standard GIS GUI.
But the data you find in the wild is not always tidy.
The tidyverse is a collection of packages that provides the main toolkit for cleaning, wrangling and manipulating your data.
The ultimate goal of tidy workflows is to turn any untidy dataset into a tidy one, so that you can then apply the same reproducible workflows to it.
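Here is a small, made-up example of a common kind of untidiness: one variable spread across several columns. The counts below are invented for illustration. tidyr::pivot_longer() reshapes the table so that each row is one observation and each column one variable.

```r
library(tidyr)

# Untidy: the year is stored in the column names, so a single variable
# (number of airports, invented values) is spread over two columns
untidy <- data.frame(
  country = c("EC", "PE"),
  `2022` = c(10, 20),
  `2023` = c(12, 25),
  check.names = FALSE
)

# Tidy: one row per country-year observation, one column per variable
tidy <- pivot_longer(
  untidy,
  cols = -country,        # every column except `country` holds values
  names_to = "year",      # former column names become the `year` variable
  values_to = "airports"  # former cell values become the `airports` variable
)
tidy
```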
2.2.1 Tidyverse packages
2.2.2 A typical data analysis workflow
2.3 Spatial data
Spatial data is special; you already know this. Coordinates, projections and transformations, geometries, vector data types, raster and gridded data: these are just some of the characteristics that spatial software has to take into account.
Spatial data in R has a long history. Spatial packages were already being developed in the 1990s, in the days of R’s predecessor, the S language. The ecosystem has gone through many developments to reach the current set of R-Spatial packages. We will take a look at the current package ecosystem next session.
2.3.1 The {sf} package
Simple Features for R {sf} (E. Pebesma, 2018) is currently the main R package to handle spatial data. Simple features are a formal OGC standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers.
Geometry types (points, lines, polygons and their derivatives) are represented by this hierarchical OGC data model. {sf} supports the following geometry types:
Figure: geometry types supported by {sf}, from Lovelace et al. (2019).
The {sf} package was designed to fit tidy data workflows. To do so, it keeps the philosophy of one row per observation and one column per variable. As such, the geometry of each observation is treated as a variable and is placed in a geometry column.
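A minimal sketch of this design, using arbitrary coordinates. st_point(), st_linestring() and st_polygon() each build a single geometry of one of the supported types. st_sfc() collects them into a geometry column, and st_sf() attaches that column to ordinary attribute columns.

```r
library(sf)

# One geometry per type; the coordinates are arbitrary
pt   <- st_point(c(1, 2))
line <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))
# A polygon ring must be closed: the last vertex repeats the first
poly <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

# One row per feature, with the geometry stored in its own column
features <- st_sf(
  id = 1:3,
  geometry = st_sfc(pt, line, poly)
)
st_geometry_type(features)
```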
But the geometry column in an sf object is special in several ways:
- it is a “sticky” column, which means it is not easily dropped by the data operations you perform; e.g. when you select a column in an sf object, the geometry column stays there.
- it is a “list-column”, which means it is a nested column where each cell holds one geometry.
- it has its own class, sfc, a standalone class to which methods for sfc objects can be applied.
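These three properties can be checked on a toy sf object. The coordinates are approximate and only illustrative. The sketch loads {dplyr} because {sf} registers its own method for dplyr's select(), and that method is what keeps the geometry attached.

```r
library(sf)
library(dplyr)

pts <- st_sf(
  id = c(1, 2),
  name = c("Quito", "Guayaquil"),
  geometry = st_sfc(st_point(c(-78.5, -0.2)), st_point(c(-79.9, -2.2)), crs = 4326)
)

# Sticky: selecting a single attribute column still returns the geometry
sel <- select(pts, name)
names(sel)

# List-column with its own class: each element is one geometry
class(pts$geometry)
is.list(pts$geometry)

# Dropping the geometry has to be done explicitly
plain <- st_drop_geometry(pts)
class(plain)
```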
sfc methods
library(sf)
methods(class = "sfc")
## [1] [ [<-
## [3] as.data.frame c
## [5] coerce format
## [7] fortify identify
## [9] initialize obj_sum
## [11] Ops points
## [13] print rep
## [15] scale_type show
## [17] slotsFromS3 st_area
## [19] st_as_binary st_as_grob
## [21] st_as_s2 st_as_sf
## [23] st_as_text st_bbox
## [25] st_boundary st_break_antimeridian
## [27] st_buffer st_cast
## [29] st_centroid st_collection_extract
## [31] st_concave_hull st_convex_hull
## [33] st_coordinates st_crop
## [35] st_crs st_crs<-
## [37] st_difference st_exterior_ring
## [39] st_geometry st_inscribed_circle
## [41] st_intersection st_intersects
## [43] st_is st_is_full
## [45] st_is_valid st_line_merge
## [47] st_m_range st_make_valid
## [49] st_minimum_bounding_circle st_minimum_rotated_rectangle
## [51] st_nearest_points st_node
## [53] st_normalize st_point_on_surface
## [55] st_polygonize st_precision
## [57] st_reverse st_sample
## [59] st_segmentize st_set_precision
## [61] st_shift_longitude st_simplify
## [63] st_snap st_sym_difference
## [65] st_transform st_triangulate
## [67] st_triangulate_constrained st_union
## [69] st_voronoi st_wrap_dateline
## [71] st_write st_z_range
## [73] st_zm str
## [75] summary text
## [77] type_sum vec_cast.sfc
## [79] vec_ptype2.sfc
## see '?methods' for accessing help and source code
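These methods dispatch on the sfc class, so they work on a bare geometry column without any attribute table. The sketch below assumes EPSG:32717 (WGS 84 / UTM zone 17S) as a metric CRS for the two example points in Ecuador. Any projected CRS in metres would work the same way.

```r
library(sf)

# A bare geometry column (sfc) in geographic coordinates
geom <- st_sfc(st_point(c(-78.5, -0.2)), st_point(c(-79.9, -2.2)), crs = 4326)

st_crs(geom)$epsg                    # CRS stored with the sfc

# Project to metres, then buffer each point by 1 km
proj <- st_transform(geom, 32717)
buf  <- st_buffer(proj, dist = 1000)

class(buf)                           # the result is still an sfc (of polygons)
st_area(buf)                         # areas carry units (m^2)
```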
2.3.2 Turning X/Y data into an sf object
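A minimal sketch, assuming the X/Y columns hold longitude/latitude in WGS 84 (EPSG:4326). The column names mirror those in the airport data used in the map code. The two rows are approximate city coordinates for illustration.

```r
library(sf)

df <- data.frame(
  name = c("Quito", "Guayaquil"),
  longitude_deg = c(-78.5, -79.9),
  latitude_deg  = c(-0.2, -2.2)
)

# `coords` lists the X column first, then Y; `crs` declares their reference system.
# By default the coordinate columns are removed and replaced by a geometry column.
df_sf <- st_as_sf(df, coords = c("longitude_deg", "latitude_deg"), crs = 4326)
df_sf
```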
2.3.3 A customised map
Full code for a more customised map
library(tidyverse)
library(sf)
library(rnaturalearth)
# Read in the data
data = read_csv("https://raw.githubusercontent.com/loreabad6/app-dev-gis/main/data/data_lesson2.csv")
data_sf_ec = data |>
  # Directly transform to sf
  st_as_sf(coords = c("longitude_deg", "latitude_deg"), crs = 4326) |>
  # Filter for my home country
  filter(iso_country == "EC") |>
  # Transform airport type to nice labels by capitalising the first letter and removing the snake case
  mutate(type = str_to_sentence(str_replace_all(type, "_", " "))) |>
  # Relevel or reorder the types to have them in a more logical order
  mutate(type = fct_relevel(type, "Large airport", "Medium airport",
                            "Small airport", "Heliport", "Closed"))
# Obtain Ecuador but also surrounding countries for context
countries = ne_countries(scale = 50, country = c("Colombia", "Ecuador", "Peru"),
                         returnclass = "sf")
# Extract Ecuador to obtain its bounding box and focus on it in coord_sf
ecuador = countries |> filter(sovereignt == "Ecuador")
ec_bbox = ecuador |>
  # Transform before getting the bounding box to match the CRS in the plot
  st_transform(24817) |> st_bbox()
ggplot() +
  # add the country layer
  geom_sf(data = countries, fill = "grey90", color = "white") +
  # add the data, changed the shape to a dot with a fill and border color,
  # assigned an alpha (opacity) so that overlapping points do not occlude each other
  geom_sf(data = data_sf_ec, aes(fill = type, size = type),
          shape = 21, alpha = 0.8) +
  # Change the fill palette
  scale_fill_brewer("Airport type", palette = "Dark2") +
  # Manually assign point sizes
  scale_size_manual("Airport type", values = c(6, 4, 2, 1, 0.25)) +
  # Change the CRS to a projected one to avoid distortions.
  # Focus the map on Ecuador by using the bounding box extent
  coord_sf(
    crs = 24817,
    xlim = c(ec_bbox["xmin"], ec_bbox["xmax"]),
    ylim = c(ec_bbox["ymin"], ec_bbox["ymax"])
  ) +
  # use a more minimal theme
  theme_bw() +
  # change the legend to the bottom
  theme(legend.position = "bottom")
… you know, if you are bored.
You can keep playing with your map to get familiar with the {ggplot2} package: customise it further, add a scale bar and a north arrow… your imagination (and perhaps the available ggplot2 extensions) is the limit.
Another thing you can try is to create a function that automatically generates the map for a given country.
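One possible starting point for that exercise, offered only as a sketch. map_airports() is a hypothetical function. It assumes an sf point layer with iso_country and type columns, like the one created with st_as_sf() in the map code but before filtering to Ecuador. The default crs = 4326 is a placeholder, so pass a projected CRS suited to the country (24817 was used for Ecuador above). ne_countries() at scale = 50 relies on the {rnaturalearthdata} package being installed.

```r
library(dplyr)
library(ggplot2)
library(sf)
library(rnaturalearth)

# Hypothetical helper: map the airports of one country.
# `airports_sf`: sf points with `iso_country` and `type` columns (assumed)
# `iso2`: two-letter code used in `iso_country`, e.g. "EC"
# `country_name`: Natural Earth country name, e.g. "Ecuador"
map_airports <- function(airports_sf, iso2, country_name, crs = 4326) {
  airports_country <- airports_sf |> filter(iso_country == iso2)
  country <- ne_countries(scale = 50, country = country_name, returnclass = "sf")
  # Bounding box in the plotting CRS, used to zoom in on the country
  bbox <- country |> st_transform(crs) |> st_bbox()
  ggplot() +
    geom_sf(data = country, fill = "grey90", color = "white") +
    geom_sf(data = airports_country, aes(fill = type), shape = 21, alpha = 0.8) +
    coord_sf(
      crs = crs,
      xlim = c(bbox["xmin"], bbox["xmax"]),
      ylim = c(bbox["ymin"], bbox["ymax"])
    ) +
    theme_bw() +
    theme(legend.position = "bottom")
}
```

Point sizes are left out here on purpose. The number of airport types differs between countries, so a fixed scale_size_manual() vector like the one above could fail.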
2.4 Further reading:
- Data import chapter (Wickham, Çetinkaya-Rundel, et al., 2023)
- Tidy data chapter (Wickham, Çetinkaya-Rundel, et al., 2023)
- History of R-Spatial section (Lovelace et al., 2019)
- Spatial data section, chapters 1-6 (E. Pebesma & Bivand, 2023)
- Geographic Data in R chapter (Lovelace et al., 2019)
- Simple Features for R vignette (E. Pebesma, 2025)