4  GIS: in-house and bridges

In this session you should learn:

Note

Save the code you work on today; we will use it next week. Most importantly, try to automate the workflow using a function!

Thank you for replying to the poll last week! Here are the results:

What spatial methods are you interested in?

What applications are you interested in?

4.1 Dedicated R packages

There are several packages in R that specialise in particular spatial analysis methods, so you can do such analyses directly in R without resorting to external software. This is by no means an exhaustive list of all the spatial capabilities in R, just a sample to get you started.

4.1.1 Earth observation analytics and remote sensing

Remote sensing and Earth observation have received a big boost recently, especially since it became possible to work with data out-of-memory1. Several workflows are now available to work with EO data within R and harness the statistical and visualisation capabilities of the language. Further, packages that allow retrieval of EO data from cloud services are also available.

I. Statistical modelling and machine learning

Several packages are available to work with raster data as we saw last session, including {stars} and {terra}. One of the main use cases of raster data is to perform machine learning or statistical modelling.
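As a tiny taster (a sketch of the general idea, not part of today's workflow), you can pull the pixel values out of a raster, feed them to a standard R model, and write the results back to a raster. Here, an unsupervised k-means classification of the demo raster shipped with {terra}:

library(terra)
# read the small RGB demo raster that ships with {terra}
r = rast(system.file("ex/logo.tif", package = "terra"))
v = values(r)               # matrix: one row per pixel, one column per band
km = kmeans(v, centers = 3) # cluster the pixels into 3 classes
cl = rast(r, nlyrs = 1)     # single-layer template with the same geometry
values(cl) = km$cluster     # write the class labels back to the raster
plot(cl)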

Resources available:

II. Cloud data and on-demand data cubes

With the advent of common standards and specifications such as the SpatioTemporal Asset Catalog (STAC), accessing data from the cloud has become much easier. Once data can be accessed in a standardised way, working with on-demand data cubes locally becomes much simpler.

The package {rstac} provides the workhorse functions to fetch STAC data. This package is the basis for other packages' workflows to download data, create data cubes, or do it all at once.

Here you will find the package documentation.

Another interesting package is {rsi}, which stands for remote sensing indices. The package obtains data using {rstac} and also connects to the Awesome Spectral Indices catalogue to fetch standardised ways to compute popular spectral indices on your data.
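A minimal sketch of how this could look (the coordinates, dates, and buffer size below are placeholders; pick your own):

library(rsi)
library(sf)
# hypothetical area of interest: a 1 km buffer around a point near Salzburg
aoi = st_point(c(13.04, 47.81)) |> 
  st_sfc(crs = 4326) |> 
  st_buffer(1000)
# download Sentinel-2 imagery for the AOI
s2 = get_sentinel2_imagery(
  aoi,
  start_date = "2022-06-01", end_date = "2022-06-30",
  output_filename = tempfile(fileext = ".tif")
)
# fetch the NDVI definition from the Awesome Spectral Indices catalogue
ndvi_def = spectral_indices() |> 
  dplyr::filter(short_name == "NDVI")
ndvi = calculate_indices(s2, ndvi_def, output_filename = tempfile(fileext = ".tif"))
terra::plot(terra::rast(ndvi))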

Resources for {rsi}:

Finally, the creation of on-demand data cubes is taken care of in R by the {gdalcubes} package. With this package you can create data cubes from cloud resources or local files, apply data cube operations (applying functions over dimensions, filtering, computing statistics with reductions), extract time series, and create training data for machine learning.

Resources for {gdalcubes}:

III. All-in-one solution

To concentrate all the efforts of fetching data from the cloud, creating on-demand data cubes, and performing statistical modelling or applying machine learning algorithms to the data, the {sits} package was created. It builds upon the packages shown above and can be seen as an all-in-one solution that simplifies working with satellite image time series. The main focus is, of course, time series analysis, so the time component is taken into account in the modelling.

A typical workflow in {sits} would look like this:

Using time series for land classification. Source: Câmara et al. (2023)
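In code, the core of such a workflow looks roughly like this; a sketch adapted from the examples in the {sits} documentation, using sample datasets that ship with the package:

library(sits)
# train a random forest on the bundled NDVI time series samples
model = sits_train(samples_modis_ndvi, ml_method = sits_rfor())
# classify a bundled sample time series with the trained model
point_ndvi = sits_select(point_mt_6bands, bands = "NDVI")
point_class = sits_classify(point_ndvi, ml_model = model)
plot(point_class)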

Resources for {sits}:

We will explore the {gdalcubes} package in the hands-on part of this section. To do so, we follow the example from the STAC across Languages tutorial.

  1. Install the packages.
install.packages("rstac")
install.packages("gdalcubes")
  2. Call the necessary libraries
library(rstac)
library(gdalcubes)
library(sf)
library(tidyverse)
  3. Connect to the STAC API and list its collections
api_url = "https://earth-search.aws.element84.com/v1"
client = stac(api_url)
collections(client) |> get_request()
  4. Define query variables
# collection ID
collection = 'sentinel-2-l2a'
# coordinates
lon = 13.04 # add your longitude coordinate of interest (Salzburg shown as an example)
lat = 47.81 # add your latitude coordinate of interest
point = st_point(c(lon, lat))
# date range
time_extent = c("1970-05-01", "1980-10-01") # change times to your interest
cloud_cover_min = 40
  5. Fetch the data
items = client  |> 
  stac_search(
      collections = collection,
      intersects = point, 
      datetime = paste0(time_extent,"T00:00:00Z", collapse = "/"),
      limit = 100
    ) |> 
  post_request() |> 
  items_filter(
    filter_fn = function(x) {x$properties$`eo:cloud_cover` < cloud_cover_max}
  )
items_length(items)
  6. View the items as an sf object
# we add an extra column to be able to subset the original item list
items_df = items_as_sf(items) |> 
  mutate(fid = row_number())
# wrap function around st_as_sf and as_tibble just for nicer printing
st_as_sf(as_tibble(items_df))
  7. Set up a bounding box around the lat/long coordinates
bbox = point |> 
  st_sfc(crs = 4326) |> 
  st_buffer(5000, endCapStyle = "SQUARE") |> 
  st_transform(unique(items_df$`proj:epsg`)) |> 
  st_bbox()
  8. Create a data cube
assets = c("red","green","blue","nir","scl")
col = stac_image_collection(items$features, asset_names = assets)
extent = list(
  t0 = time_extent[1], t1 = time_extent[2],
  left = bbox["xmin"], bottom = bbox["ymin"], 
  right = bbox["xmax"], top = bbox["ymax"]
)
view = cube_view(
  srs = paste0("epsg:", unique(items_df$`proj:epsg`)),   
  extent = extent,
  dx = 100, dy = 100, dt = "P1D",
  aggregation = "median",
  resampling = "bilinear"
)
# Create a cloud mask
s2_mask = image_mask("scl", values = c(3,8,9))
cube = raster_cube(col, view, mask = s2_mask)
cube
  9. Compute the NDVI
cube_ndvi = raster_cube(col, view, mask = s2_mask) |> # reuse the cloud mask from above
  select_bands(c("red","nir")) |> 
  apply_pixel(c("(nir-red)/(nir+red)"), names="NDVI") 
  10. Visualise the data
gdalcubes_options(parallel = 4, cache = TRUE) 
## Perform an RGB composition
rgb = cube |> 
    select_bands(c("blue","green","red")) |> 
    reduce_time(c("median(blue)", "median(green)", "median(red)")) 
rgb |> 
    plot(rgb = 3:1) 

## Visualise monthly NDVI values
cube_ndvi |> 
  select_bands("NDVI") |> 
  aggregate_time("P1M", "mean") |> 
  plot()

EXTRA: turn this workflow into a function that takes the lat, long, and time inputs as parameters. It should output either the RGB plot or the NDVI plot, depending on another parameter given by the user. One possible skeleton to get you started:
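# one possible skeleton (name, arguments, and defaults are only suggestions)
plot_cube = function(lon, lat, time_extent, output = c("rgb", "ndvi")) {
  output = match.arg(output)
  # 1. build the point and query the STAC API (steps 3-5 above)
  # 2. build the bounding box, image collection, and cube view (steps 7-8)
  # 3. depending on `output`, plot the RGB composite or the monthly NDVI
}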

4.1.2 Spatial statistics

R is well known for its statistical capabilities, and spatial statistics is no exception. Several packages allow for full statistical analysis workflows (a short {spdep} example follows the list below), such as:

  1. point pattern analysis using {spatstat}
  2. interpolation and geostatistics with {gstat}
  3. autocorrelation and spatial dependence with {spdep} and {rgeoda}
  4. … many others (see further reading)
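Here is the promised {spdep} sketch: a global Moran's I test on the guerry dataset from {sfdep}, the same data we use in the tutorial below (this example is an addition, not part of that tutorial):

library(spdep)
library(sfdep) # provides the guerry dataset
nb = poly2nb(guerry, queen = TRUE) # queen contiguity neighbours
w = nb2listw(nb, style = "W")      # row-standardised spatial weights
moran.test(guerry$crime_pers, w)   # global spatial autocorrelation test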

Here we will perform some exploratory spatial data analysis with the {rgeoda} package, following the tutorial on the package website.

  1. Install packages
install.packages("rgeoda")
install.packages("patchwork")
install.packages("sfdep")
  2. Load packages
library(sfdep)
library(rgeoda)
library(sf)
library(tidyverse)
library(patchwork)
  3. View the data to analyse
plot(guerry)
  4. Compute local indicators of spatial association (LISA) for the variable “Population per Crime against persons”
queen_w = queen_weights(guerry)
lisa = rgeoda::local_moran(queen_w,  guerry['crime_pers'])
  5. Compute LISA clusters and p-values (Local Moran)
guerry_lisa = guerry |> 
  mutate(
    lisa_clusters = as.factor(lisa_clusters(lisa)),
    lisa_p = lisa_pvalues(lisa)
  )
  6. Create a Local Moran and significance map
## Local Moran
lisa_colors = lisa_colors(lisa)
lisa_labels = lisa_labels(lisa)

moran = ggplot(guerry_lisa) +
  geom_sf(aes(fill = lisa_clusters)) +
  scale_fill_manual("", values = lisa_colors, labels = lisa_labels) +
  ggtitle(glue::glue("Local Moran Map of crime_pers")) +
  theme_void()

## Significance
p_labels = rev(c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001"))
p_colors = rev(c("#eeeeee", "#84f576", "#53c53c", "#348124"))
p_th = c(0, 0.001, 0.01, 0.05, 1)

sign = ggplot(guerry_lisa) +
  geom_sf(aes(fill = cut(lisa_p, p_th))) +
  scale_fill_manual("", values = p_colors, labels = p_labels) +
  ggtitle(glue::glue("Significance Map of crime_pers")) +
  theme_void()

moran + sign

EXTRA: turn this workflow into a function that accepts the variable to compute the LISA analysis as input and outputs the final plots.

4.1.3 Spatial network analysis

Working with spatial networks in R used to be a bit of a difficult task before the {sfnetworks} package arrived. The aim of the package is to bridge {sf} and {tidygraph}, a tidy wrapper around the {igraph} library, which is used very often for network analysis.

With {sfnetworks} you can perform several tasks such as:

  • creating networks from points and lines
  • network cleaning
  • finding shortest paths
  • route optimisation
  • closest facility analysis
  • creating catchment areas
  • zoning networks based on their connectivity

It started as an assignment for an R course, in an effort to find an open source alternative to the ArcGIS network analysis capabilities, and is now downloaded over 2,000 times a month!

I recommend working with the development version (here are the docs).

We will create a closest facility analysis workflow with {sfnetworks}. This is mostly based on the package documentation, especially the vignette Routing on spatial networks.

  1. Install packages
remotes::install_github("luukvdmeer/sfnetworks")
install.packages("tidygraph")
install.packages("ggraph")
install.packages("dodgr")
  2. Call the libraries
library(sf)
library(sfnetworks)
library(tidygraph)
library(tidyverse)
library(ggraph)
library(dodgr)
  3. Create an undirected spatial network and visualise it
sfn = as_sfnetwork(roxel, directed = FALSE) |> 
  st_transform(3035)
plot(sfn)
  4. Create random facilities and sites, map them to network indices
set.seed(9284)
bbox = st_convex_hull(st_combine(sfn))
rdm_fac = st_sample(bbox, 5) 
## assign node values for plotting
rdm_fac_nodes = rdm_fac |> 
  st_as_sf() |> 
  mutate(from = st_nearest_feature(rdm_fac, st_as_sf(sfn, "nodes")))
rdm_sites = st_sample(bbox, 50)
  5. Compute shortest paths between facilities and their closest sites
paths = st_network_paths(
  sfn,
  from = rdm_fac,
  to = rdm_sites,
  # use dodgr for many to many calculations
  router = "dodgr"
) |> 
  group_by(to) |> 
  filter(cost == min(cost))
  6. Visualise the shortest trip to the closest facility
ggraph(sfn, layout = "sf") +
  geom_edge_sf(color = "grey") +
  geom_node_sf(color = "grey") +
  geom_sf(
    data = paths,
    aes(color = as.factor(from)),
    linewidth = 1,
    show.legend = FALSE
  ) +
  geom_sf(data = rdm_sites, color = "black", size = 1) +
  geom_sf(
    data = rdm_fac_nodes, 
    aes(fill = as.factor(from)),
    size = 3,
    pch = 21,
    show.legend = FALSE
  ) +
  theme_void()

EXTRA: turn this workflow into a function that receives the facilities, the sites and the network as inputs and outputs the final plot.

4.2 Bridges to external software

Of course, you can also connect to external GIS software and drive it from R. Why would you want to do that? One good reason is automation: instead of following a click, click, click sequence, you have your process written as code, which documents what you did and lets you run the same process over and over.

4.2.1 QGIS

One of the best GIS bridges for R out there is the {qgisprocess} package, which wraps the qgis_process command-line utility. (There is also a {qgis} package built on the same idea, but it generates one function per algorithm.) It gives you access not only to the native QGIS processing tools but also to any other provider made available through your QGIS installation, including plug-ins. That means GRASS GIS, GDAL, and SAGA are fully available to you inside R with one package!

Resources to bridge with QGIS

Naturally, for this to work, you will need a QGIS installation. The package will try to find your installation automatically on your first run. However, you can also pass a custom path if you have multiple installations.

Depending on your QGIS configuration, you will have access to different providers. You can look at what is available to you with qgis_providers().
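A minimal sketch (assuming a working QGIS installation) that finds and runs the native buffer algorithm:

library(qgisprocess)
# search the algorithm catalogue
qgis_search_algorithms("buffer")
# run the native buffer on a demo layer shipped with {sf}
result = qgis_run_algorithm(
  "native:buffer",
  INPUT = system.file("shape/nc.shp", package = "sf"),
  DISTANCE = 0.1 # in layer units, here degrees
)
buffered = sf::st_as_sf(result) # read the output back into R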

4.2.2 SAGA GIS

Naturally, there are also individual bridges to specific GIS software. If you are interested in geomorphology and hydrology, you might be very familiar with SAGA GIS. You can access SAGA algorithms via QGIS, but if you don’t have that linked, or if you would rather use a package with fewer dependencies, then {Rsagacmd} is for you. There is also the {RSAGA} package; however, its documentation is not as good and its development seems to have stalled.

Resources for SAGA GIS bridges with R

Again, you will need to have a SAGA GIS installation or point the package to the installation path of the SAGA GIS version you would like to use.
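A minimal sketch of how this could look; the tool and argument names depend on your SAGA version, so treat them as assumptions:

library(Rsagacmd)
saga = saga_gis() # dynamically link to the local SAGA installation
# compute slope/aspect from the demo elevation raster shipped with {terra}
dem = terra::rast(system.file("ex/elev.tif", package = "terra"))
morph = saga$ta_morphometry$slope_aspect_curvature(elevation = dem)
terra::plot(morph$slope)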

4.2.3 GRASS GIS

Does anyone know what GRASS2 stands for?

GRASS GIS has been around since the 1980s and is one of the most powerful GIS out there. It has been ahead of its time for a long while and keeps up to date with current GIS trends. Its community of developers and users is thriving, even holding annual workshops that drive the further development of the tools.

It is a core part of the OSGeo suite of software, is available in loads of languages, has really good cloud support and, as core member Vero Andreo always likes to say: it rocks!

Resources to work with GRASS GIS

Naturally, you will need an installation of GRASS to be able to run the algorithms with this package.
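A minimal sketch using {rgrass}; the gisBase path below is an assumption, adjust it to your installation:

library(rgrass)
dem = terra::rast(system.file("ex/elev.tif", package = "terra"))
# start a throwaway GRASS session based on the raster's geometry
initGRASS(gisBase = "/usr/lib/grass83", home = tempdir(), SG = dem, override = TRUE)
write_RAST(dem, "elev") # move the raster into GRASS
execGRASS("r.slope.aspect", elevation = "elev", slope = "slope")
slope = read_RAST("slope") # read the result back into R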

4.2.4 Google Earth Engine (GEE)

Bindings to GEE are available via the {rgee} package. The efforts towards this connection were driven by a former student here at PLUS! By now it offers not only the connection to GEE but also functions that build on top of it and a book with good documentation.

Figure 4.1: The RGEE package bundle: the GEE binding, extra functions built on top of it, and the book

This bridge of course requires you to have a Google Earth Engine account that will allow you to authenticate. The documentation of the package has all the required steps to get started.
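Once authenticated, a minimal sketch looks like this (the asset ID is a public GEE dataset):

library(rgee)
ee_Initialize() # connect and authenticate with your GEE account
srtm = ee$Image("USGS/SRTMGL1_003") # SRTM elevation
Map$addLayer(srtm, list(min = 0, max = 3000), "elevation")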

4.2.5 R-ArcGIS Bridge

ESRI has also made efforts to connect to open source software. You already know how to connect ArcGIS to Python using the arcpy interface. For R, ESRI has created a bundle of packages that you can use to run their geoprocessing tools, provided you have ArcGIS Pro and a valid license.

Here you can find the documentation to install and quick start with these packages.

Disclaimer

I will not provide support for proprietary software in this course, but if you are interested in it, even for your project, feel free to use it.

4.3 Hands-on

README

Pick one of the “💻 Code at Work: Your Turn to Execute” sections you are most interested in and work through the code.

Try to understand what the functions are doing, read the docs and get familiar with the packages and workflows.

There is an extra task at the end of each to automate the workflow by wrapping it in a function. Make sure to do this step, as we will use the function you create next class.

Exploring other packages

If you feel like exploring other packages outside of the samples I gave or one of the bridges to external GIS software, feel free to do so. Use the materials provided under each section!

I recommend playing with the bridges only if you already have the software installed and are familiar with it; otherwise it might take too long for today’s class. You will also need to have a workflow wrapped in a function by the end of the class.

You don’t need to submit today’s work, but save it for next class!

4.4 Further reading


  1. R is limited by available memory, so working with large raster datasets used to be difficult.↩︎

  2. Geographic Resources Analysis Support System↩︎