library(dplyr) # data wrangling
library(gdalcubes) # on-demand data cubes
library(ggplot2) # plotting
library(here) # resolve relative paths
library(knitr) # visualize URLs
library(osmdata) # retrieving OSM data, AOI bounding box
library(rstac) # connecting to the STAC API
library(sf) # handle geospatial data frames
library(stars) # handle spatio-temporal arrays
library(tmap) # visualizationSentinel-2 data in R
OpenGeoHub Summer School 2023
Accessing data via STAC APIs
The libraries we will use for this exercise are listed below.
Defining an extent
We will obtain our area of interest with the osmdata package.
## 405 error last time tested
aoi = getbb("poznan poland", format_out = "sf_polygon", limit = 1)## backup geojson
aoi = read_sf(here("data/poznan.geojson"))We can make sure the AOI is correct (run cell on source .qmd)
tmap_mode("view")tmap mode set to interactive viewing
qtm(aoi, fill = 'orange', fill.alpha = 0.3)And lastly the time extent:
time_extent = c("2022-04-01", "2022-10-01")We compute the bounding box both in EPSG:4326 and EPSG:3857.
(bb4326 = st_bbox(aoi)) xmin ymin xmax ymax
16.75738 52.29364 17.16362 52.53565
(bb3857 = aoi |>
st_transform(3857) |>
st_bbox()) xmin ymin xmax ymax
1865423 6853395 1910645 6897563
Querying data with rstac
Similarly to the pystac-client in Python, with rstac we define our STAC API URL, from where we can query the collections available.
Let’s stay with the earth-search API.
s = stac("https://earth-search.aws.element84.com/v0")
collections(s) |> get_request()###STACCollectionList
- collections (4 item(s)):
- sentinel-s2-l2a
- sentinel-s2-l1c
- sentinel-s2-l2a-cogs
- landsat-8-l1-c1
- field(s): collections, links
To search for the data we will refer to the parameters we defined during the extent definition.
items = s |>
stac_search(
collections = "sentinel-s2-l2a-cogs",
bbox = c(
bb4326["xmin"], bb4326["ymin"],
bb4326["xmax"], bb4326["ymax"]
),
datetime = paste0(time_extent, collapse = "/"),
limit = 500
) |>
post_request()
items###STACItemCollection
- matched feature(s): 144
- features (144 item(s) / 0 not fetched):
- S2B_33UXT_20220930_0_L2A
- S2B_33UXU_20220930_0_L2A
- S2B_33UXT_20220927_0_L2A
- S2B_33UXU_20220927_0_L2A
- S2A_33UXT_20220925_0_L2A
- S2A_33UXU_20220925_0_L2A
- S2A_33UXT_20220922_0_L2A
- S2A_33UXU_20220922_0_L2A
- S2B_33UXT_20220920_0_L2A
- S2B_33UXU_20220920_0_L2A
- ... with 134 more feature(s).
- assets:
AOT, B01, B02, B03, B04, B05, B06, B07, B08, B09, B11, B12, B8A, info, metadata, overview, SCL, thumbnail, visual, WVP
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
Can you spot the difference between the search exercise with Python and R?
We can explore the items as sf objects:
(i = items_as_sf(items))Simple feature collection with 144 features and 19 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 16.43614 ymin: 51.32453 xmax: 18.14175 ymax: 53.2402
Geodetic CRS: WGS 84
First 10 features:
sentinel:sequence sentinel:latitude_band sentinel:data_coverage
1 0 U 12.48
2 0 U 39.62
3 0 U 100.00
4 0 U 100.00
5 0 U 13.01
6 0 U 40.19
7 0 U 100.00
8 0 U 100.00
9 0 U 12.47
10 0 U 39.61
geometry sentinel:grid_square sentinel:utm_zone
1 POLYGON ((16.43889 51.4423,... XT 33
2 POLYGON ((16.86367 52.24928... XU 33
3 POLYGON ((18.01124 51.32453... XT 33
4 POLYGON ((18.07166 52.22258... XU 33
5 POLYGON ((16.43826 51.42235... XT 33
6 POLYGON ((16.87243 52.24914... XU 33
7 POLYGON ((18.01124 51.32453... XT 33
8 POLYGON ((18.07166 52.22258... XU 33
9 POLYGON ((16.43891 51.44284... XT 33
10 POLYGON ((16.86331 52.24929... XU 33
sentinel:product_id proj:epsg
1 S2B_MSIL2A_20220930T100729_N0400_R022_T33UXT_20220930T130034 32633
2 S2B_MSIL2A_20220930T100729_N0400_R022_T33UXU_20220930T130034 32633
3 S2B_MSIL2A_20220927T100029_N0400_R122_T33UXT_20220927T124321 32633
4 S2B_MSIL2A_20220927T100029_N0400_R122_T33UXU_20220927T124321 32633
5 S2A_MSIL2A_20220925T101031_N0400_R022_T33UXT_20220925T162525 32633
6 S2A_MSIL2A_20220925T101031_N0400_R022_T33UXU_20220925T162525 32633
7 S2A_MSIL2A_20220922T100031_N0400_R122_T33UXT_20220922T174559 32633
8 S2A_MSIL2A_20220922T100031_N0400_R122_T33UXU_20220922T174559 32633
9 S2B_MSIL2A_20220920T100649_N0400_R022_T33UXT_20220921T213031 32633
10 S2B_MSIL2A_20220920T100649_N0400_R022_T33UXU_20220921T213031 32633
platform view:off_nadir gsd instruments constellation
1 sentinel-2b 0 10 msi sentinel-2
2 sentinel-2b 0 10 msi sentinel-2
3 sentinel-2b 0 10 msi sentinel-2
4 sentinel-2b 0 10 msi sentinel-2
5 sentinel-2a 0 10 msi sentinel-2
6 sentinel-2a 0 10 msi sentinel-2
7 sentinel-2a 0 10 msi sentinel-2
8 sentinel-2a 0 10 msi sentinel-2
9 sentinel-2b 0 10 msi sentinel-2
10 sentinel-2b 0 10 msi sentinel-2
datetime eo:cloud_cover sentinel:valid_cloud_cover
1 2022-09-30T10:16:11Z 87.39 1
2 2022-09-30T10:15:59Z 37.93 1
3 2022-09-27T10:06:18Z 100.00 1
4 2022-09-27T10:06:03Z 93.24 1
5 2022-09-25T10:16:19Z 19.33 1
6 2022-09-25T10:16:07Z 69.90 1
7 2022-09-22T10:06:25Z 22.14 1
8 2022-09-22T10:06:10Z 31.74 1
9 2022-09-20T10:16:13Z 62.14 1
10 2022-09-20T10:16:01Z 33.71 1
sentinel:boa_offset_applied sentinel:processing_baseline
1 TRUE 04.00
2 TRUE 04.00
3 TRUE 04.00
4 TRUE 04.00
5 TRUE 04.00
6 TRUE 04.00
7 TRUE 04.00
8 TRUE 04.00
9 TRUE 04.00
10 TRUE 04.00
created updated
1 2022-09-30T17:59:19.570Z 2022-09-30T17:59:19.570Z
2 2022-09-30T18:05:38.516Z 2022-09-30T18:05:38.516Z
3 2022-09-28T07:06:42.603Z 2022-09-28T07:06:42.603Z
4 2022-09-28T07:38:48.100Z 2022-09-28T07:38:48.100Z
5 2022-09-25T19:09:22.594Z 2022-09-25T19:09:22.594Z
6 2022-09-25T20:03:57.834Z 2022-09-25T20:03:57.834Z
7 2022-09-22T20:30:42.817Z 2022-09-22T20:30:42.817Z
8 2022-09-22T20:29:38.768Z 2022-09-22T20:29:38.768Z
9 2022-09-22T21:42:12.557Z 2022-09-22T21:42:12.557Z
10 2022-09-22T21:12:58.018Z 2022-09-22T21:12:58.018Z
glimpse(i)Rows: 144
Columns: 20
$ `sentinel:sequence` <chr> "0", "0", "0", "0", "0", "0", "0", "0",…
$ `sentinel:latitude_band` <chr> "U", "U", "U", "U", "U", "U", "U", "U",…
$ `sentinel:data_coverage` <dbl> 12.48, 39.62, 100.00, 100.00, 13.01, 40…
$ geometry <POLYGON [°]> POLYGON ((16.43889 51.4423,...,…
$ `sentinel:grid_square` <chr> "XT", "XU", "XT", "XU", "XT", "XU", "XT…
$ `sentinel:utm_zone` <dbl> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,…
$ `sentinel:product_id` <chr> "S2B_MSIL2A_20220930T100729_N0400_R022_…
$ `proj:epsg` <dbl> 32633, 32633, 32633, 32633, 32633, 3263…
$ platform <chr> "sentinel-2b", "sentinel-2b", "sentinel…
$ `view:off_nadir` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ gsd <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
$ instruments <chr> "msi", "msi", "msi", "msi", "msi", "msi…
$ constellation <chr> "sentinel-2", "sentinel-2", "sentinel-2…
$ datetime <chr> "2022-09-30T10:16:11Z", "2022-09-30T10:…
$ `eo:cloud_cover` <dbl> 87.39, 37.93, 100.00, 93.24, 19.33, 69.…
$ `sentinel:valid_cloud_cover` <chr> "1", "1", "1", "1", "1", "1", "1", "1",…
$ `sentinel:boa_offset_applied` <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
$ `sentinel:processing_baseline` <chr> "04.00", "04.00", "04.00", "04.00", "04…
$ created <chr> "2022-09-30T17:59:19.570Z", "2022-09-30…
$ updated <chr> "2022-09-30T17:59:19.570Z", "2022-09-30…
And easily view the cloud cover for the area:
ggplot(i) +
aes(x = as.Date(datetime), y = `eo:cloud_cover`) +
geom_point() +
geom_line(group = 1) +
scale_x_date(date_breaks = "2 months")The item properties can prove useful to filter the data collection we just obtained.
Let’s take a look at the properties present.
items$features[[1]]$properties$datetime
[1] "2022-09-30T10:16:11Z"
$platform
[1] "sentinel-2b"
$constellation
[1] "sentinel-2"
$instruments
[1] "msi"
$gsd
[1] 10
$`view:off_nadir`
[1] 0
$`proj:epsg`
[1] 32633
$`sentinel:utm_zone`
[1] 33
$`sentinel:latitude_band`
[1] "U"
$`sentinel:grid_square`
[1] "XT"
$`sentinel:sequence`
[1] "0"
$`sentinel:product_id`
[1] "S2B_MSIL2A_20220930T100729_N0400_R022_T33UXT_20220930T130034"
$`sentinel:data_coverage`
[1] 12.48
$`eo:cloud_cover`
[1] 87.39
$`sentinel:valid_cloud_cover`
[1] TRUE
$`sentinel:processing_baseline`
[1] "04.00"
$`sentinel:boa_offset_applied`
[1] TRUE
$created
[1] "2022-09-30T17:59:19.570Z"
$updated
[1] "2022-09-30T17:59:19.570Z"
We already explored the eo:cloud_cover property, but there are other properties that might turn out useful, e.g. sentinel:valid_cloud_cover and sentinel:data_coverage.
We can filter the sf object we created before for this values and select the first item from our result.
ids = i |>
mutate(fid = row_number()) |>
filter(
`sentinel:valid_cloud_cover` == 1,
`sentinel:data_coverage` >= 80,
`eo:cloud_cover` == min(`eo:cloud_cover`)
) |>
pull(fid)
item = items$features[[ids[1]]]We can take a look at a preview of one item by getting the URL of the thumbnail asset. Note that rstac has a preview_plot function but it does not accept JPG formats yet.
item |>
assets_url(asset_names = "thumbnail") |>
include_graphics()Creating a STAC data cube
Once we have made our query and fetched the data, we can create an on-demand data cube with gdalcubes.
We can filter the times collection with the property_filter parameter. As we saw before we will keep only scenes with valid cloud cover values below 10%.
assets = c(
"B01","B02","B03","B04","B05","B06",
"B07","B08","B8A","B09","B11","SCL"
)
col = stac_image_collection(
items$features,
asset_names = assets,
property_filter = function(x) {x[["eo:cloud_cover"]] < 10 & x[['sentinel:valid_cloud_cover']]}
)Warning in stac_image_collection(items$features, asset_names = assets,
property_filter = function(x) {: STAC asset with name 'SCL' does not include
eo:bands metadata and will be considered as a single band source
colImage collection object, referencing 28 images with 12 bands
Images:
name left top bottom right
1 S2A_33UXU_20220905_0_L2A 16.46503 53.24020 52.24938 17.38807
2 S2A_33UXT_20220902_0_L2A 16.43614 52.34135 51.32453 18.07774
3 S2A_33UXU_20220902_0_L2A 16.46499 53.24020 52.22257 18.14175
4 S2A_33UXT_20220826_0_L2A 16.43919 52.34135 51.45200 16.90558
5 S2A_33UXU_20220826_0_L2A 16.46503 53.24020 52.24934 17.39166
6 S2A_33UXT_20220816_0_L2A 16.43906 52.34135 51.44769 16.90734
datetime srs
1 2022-09-05T10:16:11 EPSG:32633
2 2022-09-02T10:06:30 EPSG:32633
3 2022-09-02T10:06:15 EPSG:32633
4 2022-08-26T10:16:24 EPSG:32633
5 2022-08-26T10:16:12 EPSG:32633
6 2022-08-16T10:16:24 EPSG:32633
[ omitted 22 images ]
Bands:
name offset scale unit nodata image_count
1 B01 0 1 28
2 B02 0 1 28
3 B03 0 1 28
4 B04 0 1 28
5 B05 0 1 28
6 B06 0 1 28
7 B07 0 1 28
8 B08 0 1 28
9 B09 0 1 28
10 B11 0 1 28
11 B8A 0 1 28
12 SCL 0 1 28
Visualizing the data
To view the data, we create a cube_view() object considering the AOI defined before but with EPSG:3857.
Arguments dx and dy define the spatial resolution of our output, while dt corresponds to the temporal resolution.
v = cube_view(
srs = "EPSG:3857",
extent = list(t0 = time_extent[1],
t1 = time_extent[2],
left = bb3857["xmin"],
bottom = bb3857["ymin"],
right = bb3857["xmax"],
top = bb3857["ymax"]),
dx = 200, dy = 200, dt = "P1D",
aggregation = "median",
resampling = "average"
)
vA data cube view object
Dimensions:
low high count pixel_size
t 2022-04-01 2022-10-01 184 P1D
y 6853379.09435715 6897579.09435715 221 200
x 1865334.30546292 1910734.30546292 227 200
SRS: "EPSG:3857"
Temporal aggregation method: "median"
Spatial resampling method: "average"
How would you define a biweekly temporal interval? Hint ?cube_view().
Cloud masking
A cloud mask can also be defined. This will be based on the SCL values.
For that let’s first expand on SCL, which is the result of the Sentinel-2 scene classification.
We will use the stars package to visualize the layer. But first, let’s select an item with moderate cloud cover to have an idea of the values present.
ids = i |>
mutate(fid = row_number()) |>
filter(
`sentinel:valid_cloud_cover` == 1,
`sentinel:data_coverage` >= 90,
`eo:cloud_cover` <= 50
) |>
pull(fid)
item = items$features[[ids[1]]]
scl_ex = item |>
assets_url(asset_names = "SCL", append_gdalvsi = TRUE) |>
read_stars(RasterIO = list(nBufXSize = 512, nBufYSize = 512))In the data folder of this repo I added a CSV file with the SCL classes that we will use for plotting.
scl = read.csv("../../data/SCL_classes.csv")
scl = scl |>
mutate(label = paste(code, desc, sep = ": "))
scl_pal = scl$color
names(scl_pal) = scl$label
scl_ex_factor = scl_ex |>
mutate(
SCL.tif = factor(SCL.tif, levels = scl$code, labels = scl$label)
)
ggplot() +
geom_stars(data = scl_ex_factor) +
scale_fill_manual(values = scl_pal) +
theme_void()We can then see that masking out clouds would require to reference classes 3, 8 and 9.
# clouds and cloud shadows
S2.mask = image_mask("SCL", values=c(3,8,9)) Finally, we can visualize a median composite of our collection. This particular chunk will take a while since the data is fetched from the cloud before plotting. For this reason we include the gdalcubes_options(parallel = 4) line which would use any parallel processing available in your system. We will run this code chunk interactively.
gdalcubes_options(parallel = 4)
rgb_mean = raster_cube(col, v, mask = S2.mask) |>
select_bands(c("B02","B03","B04")) |>
reduce_time(c("median(B02)", "median(B03)", "median(B04)"))
rgb_meanA data cube proxy object
Dimensions:
low high count pixel_size chunk_size
t 2022-04-01 2022-10-01 1 P184D 1
y 6853379.09435715 6897579.09435715 221 200 128
x 1865334.30546292 1910734.30546292 227 200 128
Bands:
name offset scale nodata unit
1 B02_median 0 1 NaN
2 B03_median 0 1 NaN
3 B04_median 0 1 NaN
rgb_mean |>
plot(rgb = 3:1, zlim=c(0,1200)) knitr::include_graphics(here("figs/bkp_sen2_gdalcubes.png"))Downloading data
Nowadays, there is a lot that can be done to process data on the cloud. However, if you still need to download datasets, rstac will have you covered with the assets_download() function. This will download all of the items from your search, so make sure you apply enough filtering so that you don’t download data that you don’t need. More info on the function can be found on the package documentation.
Exercises
- Based on the examples from the Jupyter notebook, how would you compute the NDVI anomaly with
gdalcubes? Go ahead and do it for any area of interest you would like. Get some hints from this r-spatial blog on gdalcubes by Marius Appel. - Compute a time series of NDVI values for one crop parcel of your choice. Hint: you can easily create a geojson polygon with https://geojson.io/. Take the temporal grouping of your choice, but what would make sense to compare such vegetation values?
More resources:
This notebook is a small demo on how to use rstac and gdalcubes to work with Sentinel-2 data in particular. For further examples on data processing with both packages check Marius Appel repository from the OpenGeoHub 2021.