Sentinel-2 data in R
OpenGeoHub Summer School 2023
Accessing data via STAC APIs
The libraries we will use for this exercise are listed below.
library(dplyr)     # data wrangling
library(gdalcubes) # on-demand data cubes
library(ggplot2)   # plotting
library(here)      # resolve relative paths
library(knitr)     # visualize URLs
library(osmdata)   # retrieving OSM data, AOI bounding box
library(rstac)     # connecting to the STAC API
library(sf)        # handle geospatial data frames
library(stars)     # handle spatio-temporal arrays
library(tmap)      # visualization
Defining an extent
We will obtain our area of interest with the osmdata package.
## 405 error last time tested
aoi = getbb("poznan poland", format_out = "sf_polygon", limit = 1)
## backup geojson
aoi = read_sf(here("data/poznan.geojson"))
We can make sure the AOI is correct (run cell on source .qmd)
tmap_mode("view")
tmap mode set to interactive viewing
qtm(aoi, fill = 'orange', fill.alpha = 0.3)
And lastly the time extent:
time_extent = c("2022-04-01", "2022-10-01")
We compute the bounding box both in EPSG:4326 and EPSG:3857.
(bb4326 = st_bbox(aoi))
xmin ymin xmax ymax
16.75738 52.29364 17.16362 52.53565
(bb3857 = aoi |>
  st_transform(3857) |>
  st_bbox())
xmin ymin xmax ymax
1865423 6853395 1910645 6897563
Querying data with rstac
Similarly to pystac-client in Python, with rstac we define our STAC API URL, from which we can query the available collections. Let’s stay with the earth-search API.
s = stac("https://earth-search.aws.element84.com/v0")
collections(s) |> get_request()
###STACCollectionList
- collections (4 item(s)):
- sentinel-s2-l2a
- sentinel-s2-l1c
- sentinel-s2-l2a-cogs
- landsat-8-l1-c1
- field(s): collections, links
To search for the data we will refer to the parameters we defined during the extent definition.
items = s |>
  stac_search(
    collections = "sentinel-s2-l2a-cogs",
    bbox = c(
      bb4326["xmin"], bb4326["ymin"],
      bb4326["xmax"], bb4326["ymax"]
    ),
    datetime = paste0(time_extent, collapse = "/"),
    limit = 500
  ) |>
  post_request()
items
###STACItemCollection
- matched feature(s): 144
- features (144 item(s) / 0 not fetched):
- S2B_33UXT_20220930_0_L2A
- S2B_33UXU_20220930_0_L2A
- S2B_33UXT_20220927_0_L2A
- S2B_33UXU_20220927_0_L2A
- S2A_33UXT_20220925_0_L2A
- S2A_33UXU_20220925_0_L2A
- S2A_33UXT_20220922_0_L2A
- S2A_33UXU_20220922_0_L2A
- S2B_33UXT_20220920_0_L2A
- S2B_33UXU_20220920_0_L2A
- ... with 134 more feature(s).
- assets:
AOT, B01, B02, B03, B04, B05, B06, B07, B08, B09, B11, B12, B8A, info, metadata, overview, SCL, thumbnail, visual, WVP
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
Can you spot the difference between the search exercise with Python and R?
We can explore the items as sf objects:
(i = items_as_sf(items))
Simple feature collection with 144 features and 19 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 16.43614 ymin: 51.32453 xmax: 18.14175 ymax: 53.2402
Geodetic CRS: WGS 84
First 10 features:
sentinel:sequence sentinel:latitude_band sentinel:data_coverage
1 0 U 12.48
2 0 U 39.62
3 0 U 100.00
4 0 U 100.00
5 0 U 13.01
6 0 U 40.19
7 0 U 100.00
8 0 U 100.00
9 0 U 12.47
10 0 U 39.61
geometry sentinel:grid_square sentinel:utm_zone
1 POLYGON ((16.43889 51.4423,... XT 33
2 POLYGON ((16.86367 52.24928... XU 33
3 POLYGON ((18.01124 51.32453... XT 33
4 POLYGON ((18.07166 52.22258... XU 33
5 POLYGON ((16.43826 51.42235... XT 33
6 POLYGON ((16.87243 52.24914... XU 33
7 POLYGON ((18.01124 51.32453... XT 33
8 POLYGON ((18.07166 52.22258... XU 33
9 POLYGON ((16.43891 51.44284... XT 33
10 POLYGON ((16.86331 52.24929... XU 33
sentinel:product_id proj:epsg
1 S2B_MSIL2A_20220930T100729_N0400_R022_T33UXT_20220930T130034 32633
2 S2B_MSIL2A_20220930T100729_N0400_R022_T33UXU_20220930T130034 32633
3 S2B_MSIL2A_20220927T100029_N0400_R122_T33UXT_20220927T124321 32633
4 S2B_MSIL2A_20220927T100029_N0400_R122_T33UXU_20220927T124321 32633
5 S2A_MSIL2A_20220925T101031_N0400_R022_T33UXT_20220925T162525 32633
6 S2A_MSIL2A_20220925T101031_N0400_R022_T33UXU_20220925T162525 32633
7 S2A_MSIL2A_20220922T100031_N0400_R122_T33UXT_20220922T174559 32633
8 S2A_MSIL2A_20220922T100031_N0400_R122_T33UXU_20220922T174559 32633
9 S2B_MSIL2A_20220920T100649_N0400_R022_T33UXT_20220921T213031 32633
10 S2B_MSIL2A_20220920T100649_N0400_R022_T33UXU_20220921T213031 32633
platform view:off_nadir gsd instruments constellation
1 sentinel-2b 0 10 msi sentinel-2
2 sentinel-2b 0 10 msi sentinel-2
3 sentinel-2b 0 10 msi sentinel-2
4 sentinel-2b 0 10 msi sentinel-2
5 sentinel-2a 0 10 msi sentinel-2
6 sentinel-2a 0 10 msi sentinel-2
7 sentinel-2a 0 10 msi sentinel-2
8 sentinel-2a 0 10 msi sentinel-2
9 sentinel-2b 0 10 msi sentinel-2
10 sentinel-2b 0 10 msi sentinel-2
datetime eo:cloud_cover sentinel:valid_cloud_cover
1 2022-09-30T10:16:11Z 87.39 1
2 2022-09-30T10:15:59Z 37.93 1
3 2022-09-27T10:06:18Z 100.00 1
4 2022-09-27T10:06:03Z 93.24 1
5 2022-09-25T10:16:19Z 19.33 1
6 2022-09-25T10:16:07Z 69.90 1
7 2022-09-22T10:06:25Z 22.14 1
8 2022-09-22T10:06:10Z 31.74 1
9 2022-09-20T10:16:13Z 62.14 1
10 2022-09-20T10:16:01Z 33.71 1
sentinel:boa_offset_applied sentinel:processing_baseline
1 TRUE 04.00
2 TRUE 04.00
3 TRUE 04.00
4 TRUE 04.00
5 TRUE 04.00
6 TRUE 04.00
7 TRUE 04.00
8 TRUE 04.00
9 TRUE 04.00
10 TRUE 04.00
created updated
1 2022-09-30T17:59:19.570Z 2022-09-30T17:59:19.570Z
2 2022-09-30T18:05:38.516Z 2022-09-30T18:05:38.516Z
3 2022-09-28T07:06:42.603Z 2022-09-28T07:06:42.603Z
4 2022-09-28T07:38:48.100Z 2022-09-28T07:38:48.100Z
5 2022-09-25T19:09:22.594Z 2022-09-25T19:09:22.594Z
6 2022-09-25T20:03:57.834Z 2022-09-25T20:03:57.834Z
7 2022-09-22T20:30:42.817Z 2022-09-22T20:30:42.817Z
8 2022-09-22T20:29:38.768Z 2022-09-22T20:29:38.768Z
9 2022-09-22T21:42:12.557Z 2022-09-22T21:42:12.557Z
10 2022-09-22T21:12:58.018Z 2022-09-22T21:12:58.018Z
glimpse(i)
Rows: 144
Columns: 20
$ `sentinel:sequence` <chr> "0", "0", "0", "0", "0", "0", "0", "0",…
$ `sentinel:latitude_band` <chr> "U", "U", "U", "U", "U", "U", "U", "U",…
$ `sentinel:data_coverage` <dbl> 12.48, 39.62, 100.00, 100.00, 13.01, 40…
$ geometry <POLYGON [°]> POLYGON ((16.43889 51.4423,...,…
$ `sentinel:grid_square` <chr> "XT", "XU", "XT", "XU", "XT", "XU", "XT…
$ `sentinel:utm_zone` <dbl> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,…
$ `sentinel:product_id` <chr> "S2B_MSIL2A_20220930T100729_N0400_R022_…
$ `proj:epsg` <dbl> 32633, 32633, 32633, 32633, 32633, 3263…
$ platform <chr> "sentinel-2b", "sentinel-2b", "sentinel…
$ `view:off_nadir` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ gsd <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
$ instruments <chr> "msi", "msi", "msi", "msi", "msi", "msi…
$ constellation <chr> "sentinel-2", "sentinel-2", "sentinel-2…
$ datetime <chr> "2022-09-30T10:16:11Z", "2022-09-30T10:…
$ `eo:cloud_cover` <dbl> 87.39, 37.93, 100.00, 93.24, 19.33, 69.…
$ `sentinel:valid_cloud_cover` <chr> "1", "1", "1", "1", "1", "1", "1", "1",…
$ `sentinel:boa_offset_applied` <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
$ `sentinel:processing_baseline` <chr> "04.00", "04.00", "04.00", "04.00", "04…
$ created <chr> "2022-09-30T17:59:19.570Z", "2022-09-30…
$ updated <chr> "2022-09-30T17:59:19.570Z", "2022-09-30…
And easily view the cloud cover for the area:
ggplot(i) +
aes(x = as.Date(datetime), y = `eo:cloud_cover`) +
geom_point() +
geom_line(group = 1) +
scale_x_date(date_breaks = "2 months")
The item properties can prove useful to filter the data collection we just obtained.
Let’s take a look at the properties present.
items$features[[1]]$properties
$datetime
[1] "2022-09-30T10:16:11Z"
$platform
[1] "sentinel-2b"
$constellation
[1] "sentinel-2"
$instruments
[1] "msi"
$gsd
[1] 10
$`view:off_nadir`
[1] 0
$`proj:epsg`
[1] 32633
$`sentinel:utm_zone`
[1] 33
$`sentinel:latitude_band`
[1] "U"
$`sentinel:grid_square`
[1] "XT"
$`sentinel:sequence`
[1] "0"
$`sentinel:product_id`
[1] "S2B_MSIL2A_20220930T100729_N0400_R022_T33UXT_20220930T130034"
$`sentinel:data_coverage`
[1] 12.48
$`eo:cloud_cover`
[1] 87.39
$`sentinel:valid_cloud_cover`
[1] TRUE
$`sentinel:processing_baseline`
[1] "04.00"
$`sentinel:boa_offset_applied`
[1] TRUE
$created
[1] "2022-09-30T17:59:19.570Z"
$updated
[1] "2022-09-30T17:59:19.570Z"
We already explored the eo:cloud_cover property, but there are other properties that might turn out useful, e.g. sentinel:valid_cloud_cover and sentinel:data_coverage. We can filter the sf object we created before for these values and select the first item from our result.
ids = i |>
  mutate(fid = row_number()) |>
  filter(
    `sentinel:valid_cloud_cover` == 1,
    `sentinel:data_coverage` >= 80,
    `eo:cloud_cover` == min(`eo:cloud_cover`)
  ) |>
  pull(fid)
item = items$features[[ids[1]]]
We can take a look at a preview of one item by getting the URL of the thumbnail asset. Note that rstac has a preview_plot function, but it does not accept JPG formats yet.
item |>
  assets_url(asset_names = "thumbnail") |>
  include_graphics()
Creating a STAC data cube
Once we have made our query and fetched the data, we can create an on-demand data cube with gdalcubes. We can filter the item collection with the property_filter parameter. As we saw before, we will keep only scenes with a valid cloud cover flag and cloud cover below 10%.
assets = c(
  "B01","B02","B03","B04","B05","B06",
  "B07","B08","B8A","B09","B11","SCL"
)
col = stac_image_collection(
  items$features,
  asset_names = assets,
  property_filter = function(x) {
    x[["eo:cloud_cover"]] < 10 & x[["sentinel:valid_cloud_cover"]]
  }
)
Warning in stac_image_collection(items$features, asset_names = assets,
property_filter = function(x) {: STAC asset with name 'SCL' does not include
eo:bands metadata and will be considered as a single band source
col
Image collection object, referencing 28 images with 12 bands
Images:
name left top bottom right
1 S2A_33UXU_20220905_0_L2A 16.46503 53.24020 52.24938 17.38807
2 S2A_33UXT_20220902_0_L2A 16.43614 52.34135 51.32453 18.07774
3 S2A_33UXU_20220902_0_L2A 16.46499 53.24020 52.22257 18.14175
4 S2A_33UXT_20220826_0_L2A 16.43919 52.34135 51.45200 16.90558
5 S2A_33UXU_20220826_0_L2A 16.46503 53.24020 52.24934 17.39166
6 S2A_33UXT_20220816_0_L2A 16.43906 52.34135 51.44769 16.90734
datetime srs
1 2022-09-05T10:16:11 EPSG:32633
2 2022-09-02T10:06:30 EPSG:32633
3 2022-09-02T10:06:15 EPSG:32633
4 2022-08-26T10:16:24 EPSG:32633
5 2022-08-26T10:16:12 EPSG:32633
6 2022-08-16T10:16:24 EPSG:32633
[ omitted 22 images ]
Bands:
name offset scale unit nodata image_count
1 B01 0 1 28
2 B02 0 1 28
3 B03 0 1 28
4 B04 0 1 28
5 B05 0 1 28
6 B06 0 1 28
7 B07 0 1 28
8 B08 0 1 28
9 B09 0 1 28
10 B11 0 1 28
11 B8A 0 1 28
12 SCL 0 1 28
Visualizing the data
To view the data, we create a cube_view() object considering the AOI defined before, but in EPSG:3857. Arguments dx and dy define the spatial resolution of our output, while dt corresponds to the temporal resolution.
v = cube_view(
  srs = "EPSG:3857",
  extent = list(t0 = time_extent[1],
                t1 = time_extent[2],
                left = bb3857["xmin"],
                bottom = bb3857["ymin"],
                right = bb3857["xmax"],
                top = bb3857["ymax"]),
  dx = 200, dy = 200, dt = "P1D",
  aggregation = "median",
  resampling = "average"
)
v
A data cube view object
Dimensions:
low high count pixel_size
t 2022-04-01 2022-10-01 184 P1D
y 6853379.09435715 6897579.09435715 221 200
x 1865334.30546292 1910734.30546292 227 200
SRS: "EPSG:3857"
Temporal aggregation method: "median"
Spatial resampling method: "average"
How would you define a biweekly temporal interval? Hint: see ?cube_view().
Cloud masking
A cloud mask can also be defined. This will be based on the SCL values. For that, let’s first expand on SCL, which is the result of the Sentinel-2 scene classification. We will use the stars package to visualize the layer. But first, let’s select an item with moderate cloud cover to get an idea of the values present.
ids = i |>
  mutate(fid = row_number()) |>
  filter(
    `sentinel:valid_cloud_cover` == 1,
    `sentinel:data_coverage` >= 90,
    `eo:cloud_cover` <= 50
  ) |>
  pull(fid)
item = items$features[[ids[1]]]

scl_ex = item |>
  assets_url(asset_names = "SCL", append_gdalvsi = TRUE) |>
  read_stars(RasterIO = list(nBufXSize = 512, nBufYSize = 512))
In the data folder of this repo I added a CSV file with the SCL classes that we will use for plotting.
scl = read.csv("../../data/SCL_classes.csv")
scl = scl |>
  mutate(label = paste(code, desc, sep = ": "))
scl_pal = scl$color
names(scl_pal) = scl$label

scl_ex_factor = scl_ex |>
  mutate(
    SCL.tif = factor(SCL.tif, levels = scl$code, labels = scl$label)
  )
ggplot() +
  geom_stars(data = scl_ex_factor) +
  scale_fill_manual(values = scl_pal) +
  theme_void()
We can then see that masking out clouds requires referencing classes 3, 8 and 9.
# clouds and cloud shadows
S2.mask = image_mask("SCL", values = c(3, 8, 9))
Finally, we can visualize a median composite of our collection. This particular chunk will take a while, since the data is fetched from the cloud before plotting. For this reason we include the gdalcubes_options(parallel = 4) line, which enables parallel processing on your system. We will run this code chunk interactively.
gdalcubes_options(parallel = 4)
rgb_mean = raster_cube(col, v, mask = S2.mask) |>
  select_bands(c("B02","B03","B04")) |>
  reduce_time(c("median(B02)", "median(B03)", "median(B04)"))
rgb_mean
A data cube proxy object
Dimensions:
low high count pixel_size chunk_size
t 2022-04-01 2022-10-01 1 P184D 1
y 6853379.09435715 6897579.09435715 221 200 128
x 1865334.30546292 1910734.30546292 227 200 128
Bands:
name offset scale nodata unit
1 B02_median 0 1 NaN
2 B03_median 0 1 NaN
3 B04_median 0 1 NaN
rgb_mean |>
  plot(rgb = 3:1, zlim = c(0, 1200))

knitr::include_graphics(here("figs/bkp_sen2_gdalcubes.png"))
Downloading data
Nowadays, there is a lot that can be done to process data in the cloud. However, if you still need to download datasets, rstac has you covered with the assets_download() function. This will download all of the items from your search, so make sure you apply enough filtering so that you don’t download data you don’t need. More information on the function can be found in the package documentation.
Exercises
- Based on the examples from the Jupyter notebook, how would you compute the NDVI anomaly with gdalcubes? Go ahead and do it for any area of interest you would like. Get some hints from the r-spatial blog post on gdalcubes by Marius Appel.
- Compute a time series of NDVI values for one crop parcel of your choice. Hint: you can easily create a geojson polygon with https://geojson.io/. Take the temporal grouping of your choice, but what would make sense to compare such vegetation values?
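As a starting point for both exercises, here is a minimal NDVI sketch with gdalcubes, reusing col, v, and S2.mask from above; the monthly grouping is just one possible choice, not the only answer:

```r
# Monthly view derived from v, then per-pixel NDVI from the
# red (B04) and near-infrared (B08) bands; the view's median
# aggregation produces one composite per month.
v_monthly = cube_view(view = v, dt = "P1M")

ndvi = raster_cube(col, v_monthly, mask = S2.mask) |>
  select_bands(c("B04", "B08")) |>
  apply_pixel("(B08 - B04) / (B08 + B04)", names = "NDVI")

ndvi |> plot(zlim = c(-1, 1))
```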
More resources:
This notebook is a small demo on how to use rstac and gdalcubes to work with Sentinel-2 data in particular. For further examples on data processing with both packages, check Marius Appel’s repository from OpenGeoHub 2021.