Exploring data cubes for vector geometries in R

BEGIN Seminar Series
Online, University of St. Andrews, Oct. 15, 2024

Lorena Abad

Department of Geoinformatics - University of Salzburg


Exploring DATA CUBES

for vector geometries in R

DATA CUBES

Earth system data cube | Source: Mahecha et al. (2020)

DATA CUBES

Raster data cube | Source: Pebesma and Bivand (2023)

DATA CUBES

4D data cube | Source: Pebesma and Bivand (2023)

“Data cubes arise naturally when we observe properties of a set of geometries repeatedly over time.”


Chapter 6 - Data Cubes | Pebesma and Bivand (2023)


Photo from European Environment Agency


Photo by Instytut IMGW on Unsplash
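A minimal base-R sketch of the idea in the quote (not taken from the talk): repeated observations of a few properties at a set of geometries fill a 3-D array indexed by geometry, time and property. Station IDs, times and values are made up.

```r
# Hypothetical example: 3 stations observed hourly over 4 hours for 2 properties
stations <- c("s1", "s2", "s3")
times <- as.POSIXct("2024-10-01 00:00", tz = "UTC") + 3600 * 0:3
properties <- c("PM10", "PM2.5")

# The cube is an array whose dimensions index geometry, time and property
cube <- array(
  seq_len(length(stations) * length(times) * length(properties)),
  dim = c(length(stations), length(times), length(properties)),
  dimnames = list(
    station  = stations,
    time     = format(times, "%Y-%m-%d %H:%M"),
    property = properties
  )
)

# One cell: PM10 at station s2 in the second hour
cube["s2", 2, "PM10"]   # 5
# One time series: all PM2.5 values at station s1
cube["s1", , "PM2.5"]
```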


Exploring data cubes for

VECTOR geometries in R

VECTOR DATA CUBES

Vector data cube | Source: Pebesma and Bivand (2023)
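A vector data cube can be built directly in {stars} by pairing an array with a geometry dimension (an `sfc`) and a time dimension via `st_dimensions()` and `st_as_stars()`. This is a minimal sketch with made-up station coordinates and values; it is not how `opensense_stars` was built for the talk.

```r
library(sf)
library(stars)

# Two hypothetical station locations (lon/lat, WGS 84)
pts <- st_sfc(st_point(c(12.9, 47.7)), st_point(c(13.0, 47.8)), crs = 4326)

# 48 hourly time steps starting 2024-10-01 00:00 UTC
times <- as.POSIXct("2024-10-01 00:00:00", tz = "UTC") + 3600 * 0:47

# Made-up PM10 values: one row per geometry, one column per time step
pm10 <- matrix(rep(0:47, each = 2), nrow = 2)

# Dimension order matches the matrix: geometries first, then time
d <- st_dimensions(geom = pts, time = times)
vdc <- st_as_stars(list(PM10 = pm10), dimensions = d)
vdc
```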

VECTOR DATA CUBES

Raster data cube to vector data cube aggregation | Source: Pebesma and Bivand (2023)
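A small sketch of raster-to-vector aggregation with `aggregate()` from {stars}. A made-up 4 × 4 raster is summarised over two hypothetical polygons, so the x and y dimensions collapse into a single geometry dimension. A single time step is used for brevity; with a time dimension, the aggregation is repeated for every time step.

```r
library(sf)
library(stars)

# Hypothetical 4 x 4 raster covering [0, 4] x [0, 4], cell values 1..16
r <- st_as_stars(st_bbox(c(xmin = 0, ymin = 0, xmax = 4, ymax = 4)),
                 nx = 4, ny = 4, values = 1:16)

# Two hypothetical polygons splitting the raster into a left and a right half
left  <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 4), c(0, 4), c(0, 0))))
right <- st_polygon(list(rbind(c(2, 0), c(4, 0), c(4, 4), c(2, 4), c(2, 0))))
polys <- st_sfc(left, right)

# Mean of the raster cells falling in each polygon;
# the result has one geometry dimension instead of x and y
v <- aggregate(r, by = polys, FUN = mean)
v
```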

RASTER TO VECTOR DATA CUBE



Exploring data cubes for

vector geometries in R

STATION DATA

opensense
Simple feature collection with 1050 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 12.18844 ymin: 47.62055 xmax: 13.12862 ymax: 47.8726
Geodetic CRS:  WGS 84
# A tibble: 1,050 × 6
   boxName              time_start          phenomenon unit  arithmeticMean_1h
   <chr>                <dttm>              <chr>      <chr>             <dbl>
 1 1188BRD16            2024-10-01 00:00:00 PM10       µg/m³             1.35 
 2 1188BRD16            2024-10-01 00:00:00 PM2.5      µg/m³             0.754
 3 1188BRD16            2024-10-01 00:00:00 Temperatur °C               NA    
 4 13330108             2024-10-01 00:00:00 PM10       µg/m³             0.162
 5 13330108             2024-10-01 00:00:00 PM2.5      µg/m³             0.160
 6 13330108             2024-10-01 00:00:00 Temperatur °C               11.5  
 7 BayernLab Traunstein 2024-10-01 00:00:00 PM10       µg/m³            NA    
 8 BayernLab Traunstein 2024-10-01 00:00:00 PM2.5      µg/m³            NA    
 9 BayernLab Traunstein 2024-10-01 00:00:00 Temperatur °C               20.9  
10 iDEAS:lab            2024-10-01 00:00:00 PM10       µg/m³             0.787
# ℹ 1,040 more rows
# ℹ 1 more variable: geom <POINT [°]>


Data from the openSenseMap | © senseBox 2014 - 2020
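The slides do not show how the long `opensense` table becomes the `opensense_stars` cube; the linked blog post covers that workflow. A minimal base-R sketch of the core idea, assuming each (boxName, time_start, phenomenon) combination occurs at most once: pivot the long table into one station × time matrix per phenomenon. These matrices are the attributes of the vector data cube, and missing combinations become `NA`. The data are made up.

```r
# Hypothetical long-format records using the column names of `opensense`
long <- data.frame(
  boxName = rep(c("box_a", "box_b"), each = 4),
  time_start = rep(as.POSIXct(c("2024-10-01 00:00", "2024-10-01 01:00"),
                              tz = "UTC"), times = 4),
  phenomenon = rep(rep(c("PM10", "PM2.5"), each = 2), times = 2),
  arithmeticMean_1h = c(1.35, 1.60, 0.75, 0.71, 0.16, 0.20, 0.16, 0.18)
)

# One station x time matrix per phenomenon; mean() only matters if a
# combination were duplicated, which this sketch assumes it is not
cube <- lapply(split(long, long$phenomenon), function(d)
  tapply(d$arithmeticMean_1h,
         list(d$boxName, format(d$time_start, "%Y-%m-%d %H:%M")),
         mean))
cube[["PM10"]]
```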

R PACKAGE {stars}

EXPLORING WITH {stars}

opensense_stars
stars object with 2 dimensions and 3 attributes
attribute(s):
                       Min.    1st Qu.    Median     Mean   3rd Qu.      Max.
temperature [°C] 6.65000000 12.7511667 15.514167 17.01426 21.291500  24.56083
PM2.5 [µg/m³]    0.04090909  0.6726201  1.324079 12.93712  4.611875  66.39458
PM10 [µg/m³]     0.04227273  1.1006522  2.612205 24.76510  8.095625 149.11708
                 NA's
temperature [°C]  101
PM2.5 [µg/m³]     100
PM10 [µg/m³]      100
dimension(s):
     from to         offset   delta  refsys point
geom    1  7             NA      NA  WGS 84  TRUE
time    1 50 2024-10-01 UTC 1 hours POSIXct FALSE
                                                      values
geom POINT (12.91207 47.71819),...,POINT (12.45901 47.73125)
time                                                    NULL

EXPLORING WITH {stars}

plot(opensense_stars["PM10",,1:9], pch = 16, cex = 2)
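The subsetting in `opensense_stars["PM10",,1:9]` follows the pattern `x[attribute, geom, time]`: an empty slot keeps the whole dimension. A self-contained sketch on a hypothetical cube with three stations, twelve hours and two attributes:

```r
library(sf)
library(stars)

# Hypothetical cube: 3 stations x 12 hourly steps, two attributes
pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 0)), st_point(c(0, 1)), crs = 4326)
times <- as.POSIXct("2024-10-01", tz = "UTC") + 3600 * 0:11
d <- st_dimensions(geom = pts, time = times)
x <- st_as_stars(list(PM10 = matrix(1:36, 3), PM2.5 = matrix(36:1, 3)),
                 dimensions = d)

# Keep attribute PM10, all geometries, and the first 9 time steps
sub <- x["PM10", , 1:9]
sub
```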

EXPLORING WITH {stars}

Attributes to dimensions

opensense_stars |> 
  drop_units() |> 
  merge(name = "parameter")
stars object with 3 dimensions and 1 attribute
attribute(s):
                              Min. 1st Qu.  Median     Mean 3rd Qu.     Max.
temperature.PM2.5.PM10  0.04090909   1.213 6.16875 18.24046  20.937 149.1171
                        NA's
temperature.PM2.5.PM10   301
dimension(s):
          from to         offset   delta  refsys point
geom         1  7             NA      NA  WGS 84  TRUE
time         1 50 2024-10-01 UTC 1 hours POSIXct FALSE
parameter    1  3             NA      NA      NA    NA
                                                           values
geom      POINT (12.91207 47.71819),...,POINT (12.45901 47.73125)
time                                                         NULL
parameter                   temperature, PM2.5      , PM10       
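A sketch of `merge()` and its inverse `split()` on a hypothetical cube without units. The talk calls `drop_units()` first, presumably because attributes in °C and µg/m³ cannot be stacked into one array carrying a single unit.

```r
library(sf)
library(stars)

# Hypothetical cube with two attributes on the same geom x time grid
pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
times <- as.POSIXct("2024-10-01", tz = "UTC") + 3600 * 0:5
d <- st_dimensions(geom = pts, time = times)
x <- st_as_stars(list(PM2.5 = matrix(1:12, 2), PM10 = matrix(13:24, 2)),
                 dimensions = d)

# merge(): the attributes become values along a new dimension "parameter"
m <- merge(x, name = "parameter")
dim(m)

# split(): turn the "parameter" dimension back into attributes
s <- split(m, "parameter")
names(s)
```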

EXPLORING WITH {stars}

Aggregations

opensense_stars |> 
  aggregate("1 day", mean, na.rm = TRUE)
stars object with 2 dimensions and 3 attributes
attribute(s):
                   Min.    1st Qu.    Median     Mean   3rd Qu.      Max. NA's
temperature  6.92500000 12.0009994 14.807767 16.06928 21.175293  24.31104    6
PM2.5        0.06363636  0.8447386  1.359077 12.65861  2.894707  58.88761    6
PM10         0.29340909  1.4850795  3.433376 25.80386  6.734938 128.71625    6
dimension(s):
     from to         offset  delta  refsys point
time    1  3 2024-10-01 UTC 1 days POSIXct    NA
geom    1  7             NA     NA  WGS 84  TRUE
                                                      values
time                                                    NULL
geom POINT (12.91207 47.71819),...,POINT (12.45901 47.73125)
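A sketch of temporal aggregation on a hypothetical cube with 48 hourly steps: `aggregate()` with a character `by` such as "1 day" groups the time steps into daily bins and applies `FUN` separately for each geometry. As in the output above, the time dimension moves to the front of the result.

```r
library(sf)
library(stars)

pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
times <- as.POSIXct("2024-10-01", tz = "UTC") + 3600 * 0:47  # 48 hours = 2 days
d <- st_dimensions(geom = pts, time = times)

# Made-up values: at hour h (counting from 0) both stations report h
x <- st_as_stars(list(PM10 = matrix(rep(0:47, each = 2), nrow = 2)),
                 dimensions = d)

# Daily means per station: mean(0:23) = 11.5 and mean(24:47) = 35.5
daily <- aggregate(x, by = "1 day", FUN = mean, na.rm = TRUE)
daily
```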

R PACKAGE {cubble}

EXPLORING WITH {cubble}

(opensense_cubble = opensense_stars |> 
  as_cubble(key = id, index = time))
# cubble:   key: id [7], index: time, nested form, [sf]
# spatial:  [12.19, 47.62, 13.13, 47.87], WGS 84
# temporal: time [dttm], temperature [[°C]], PM2.5 [[µg/m³]], PM10 [[µg/m³]]
     id  long   lat                geom ts               
* <int> <dbl> <dbl>         <POINT [°]> <list>           
1     1  12.9  47.7 (12.91207 47.71819) <tibble [50 × 4]>
2     2  12.2  47.6 (12.18844 47.62055) <tibble [50 × 4]>
3     3  12.6  47.9 (12.64558 47.86673) <tibble [50 × 4]>
4     4  13.0  47.8 (13.03966 47.82361) <tibble [50 × 4]>
5     5  13.1  47.7 (13.12862 47.65276) <tibble [50 × 4]>
6     6  13.0  47.9  (12.97138 47.8726) <tibble [50 × 4]>
7     7  12.5  47.7 (12.45901 47.73125) <tibble [50 × 4]>

EXPLORING WITH {cubble}

opensense_cubble |> 
   face_temporal()
# cubble:   key: id [7], index: time, long form
# temporal: 2024-10-01 -- 2024-10-03 01:00:00 [1h], no gaps
# spatial:  long [dbl], lat [dbl], geom [POINT [°]]
      id time                temperature   PM2.5    PM10
   <int> <dttm>                     [°C] [µg/m³] [µg/m³]
 1     1 2024-10-01 00:00:00          NA   0.754    1.35
 2     1 2024-10-01 01:00:00          NA   0.707    1.60
 3     1 2024-10-01 02:00:00          NA   0.885    2.05
 4     1 2024-10-01 03:00:00          NA   0.841    1.70
 5     1 2024-10-01 04:00:00          NA   0.878    1.57
 6     1 2024-10-01 05:00:00          NA   0.8      1.61
 7     1 2024-10-01 06:00:00          NA   1.79     3.87
 8     1 2024-10-01 07:00:00          NA   1.15     3.41
 9     1 2024-10-01 08:00:00          NA   1.09     2.44
10     1 2024-10-01 09:00:00          NA   0.865    2.06
# ℹ 340 more rows
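{cubble} can also build this structure from a long table instead of a {stars} object. A minimal sketch with made-up data, using `as_cubble()` with `coords` to name the coordinate columns: the nested (spatial) face has one row per station, and the long (temporal) face has one row per station and time step.

```r
library(cubble)

# Hypothetical long-format station data: 2 stations x 3 hours
obs <- tibble::tibble(
  id   = rep(1:2, each = 3),
  long = rep(c(12.9, 12.2), each = 3),
  lat  = rep(c(47.7, 47.6), each = 3),
  time = rep(as.POSIXct("2024-10-01", tz = "UTC") + 3600 * 0:2, times = 2),
  PM10 = c(1.35, 1.60, 2.05, 0.16, 0.20, 0.18)
)

# Nested (spatial) face: one row per station, time series in a list-column
cb <- as_cubble(obs, key = id, index = time, coords = c(long, lat))
cb

# Long (temporal) face: one row per station and time step
face_temporal(cb)
```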

EXPLORING WITH {cubble}


Exploring data cubes for

vector geometries

DYNAMIC SPATIAL DATA in R

DYNAMIC SPATIAL DATA

polygons
Simple feature collection with 25 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -0.2974337 ymin: -0.00297557 xmax: 0.9730806 ymax: 1.153558
Geodetic CRS:  WGS 84
First 10 features:
   gid   datetime                       geometry
1    a 2020-10-01 POLYGON ((0.5474949 0.80889...
2    b 2020-10-01 POLYGON ((0.2791708 0.83373...
3    c 2020-10-01 POLYGON ((0.2807462 0.62779...
4    d 2020-10-01 POLYGON ((0.7650701 0.47444...
5    e 2020-10-01 POLYGON ((0.3825692 0.35378...
6    a 2020-10-02 POLYGON ((0.4961102 0.87283...
7    b 2020-10-02 POLYGON ((0.3298312 0.76120...
8    c 2020-10-02 POLYGON ((0.328914 0.568743...
9    d 2020-10-02 POLYGON ((0.7217233 0.52617...
10   e 2020-10-02 POLYGON ((0.3101455 0.31689...

DYNAMIC SPATIAL DATA

plot(polygons)

DYNAMIC SPATIAL DATA

glacier
Simple feature collection with 4261 features and 26 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 407025.2 ymin: 8502331 xmax: 845635.3 ymax: 8968083
Projected CRS: WGS 84 / UTM zone 33N
# A tibble: 4,261 × 27
   NAME     Comment IDENT YEAR_ SOURCE ANALYST YR_UNCERT Shape_Area  FLAG LENGTH
   <chr>    <chr>   <chr> <int> <chr>  <chr>   <chr>          <dbl> <int>  <dbl>
 1 Svitjod… <NA>    1581…  1966 Verti… Max Kö… <NA>       42844621.     0 11747.
 2 Biskaye… <NA>    16118  1966 Verti… Max Kö… <NA>       13012090.     0  3778.
 3 Balberg… <NA>    16601  1966 Verti… Max Kö… <NA>        5004077.     0  3382.
 4 Paradis… <NA>    16201  1966 Verti… Max Kö… <NA>        7632368.     0  5293.
 5 Kvittop… <NA>    1611…  1966 Verti… Max Kö… <NA>        2778791.     0  2613.
 6 Arlabre… <NA>    16119  1966 Verti… Max Kö… <NA>        6158767.     0  5094.
 7 Landbre… <NA>    16603  1966 Verti… Max Kö… <NA>        4330581.     0  4064.
 8 Bybreen  <NA>    16512  1966 Verti… Max Kö… <NA>        2342140.     0  2973.
 9 Evabreen <NA>    16202  1966 Verti… Max Kö… <NA>        6295764.     0  4597.
10 Tindebr… <NA>    16107  1966 Verti… Max Kö… <NA>        1186398.     0  2355.
# ℹ 4,251 more rows
# ℹ 17 more variables: FWIDTH <dbl>, Shape_Peri <dbl>, SOURCE2 <chr>,
#   SatelliteI <chr>, TIDEWATER <int>, NumLines <int>, GLIMSID <chr>,
#   debroxglac <int>, medZ <dbl>, minZ <dbl>, maxZ <dbl>, stdZ <dbl>,
#   skew <dbl>, meanSLP <dbl>, meanASP <dbl>, DEM <chr>,
#   geom <MULTIPOLYGON [m]>


Data from König et al. (2014)

DYNAMIC SPATIAL DATA

plot(glacier[,c(1,4)], border = NA)

Data from König et al. (2014)

SUMMARY GEOMETRIES


SUMMARY GEOMETRIES IN {post}


Union and dissolve

Minimum rotated rectangle

Centroids
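The slides do not name the {post} functions that compute these summary geometries, so this is a plain {sf} sketch of the three options on two hypothetical groups of moving polygons: union and dissolve all time steps of a group, then derive its minimum rotated rectangle or its centroid. `square()` is a hypothetical helper, and `st_minimum_rotated_rectangle()` may require a recent sf release.

```r
library(sf)

# Hypothetical helper: square of side s with lower-left corner (x0, y0)
square <- function(x0, y0, s = 1) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s),
                        c(x0, y0 + s), c(x0, y0))))
}

# Two groups ("a", "b"), each observed on two dates with a slightly moved shape
polys <- st_sf(
  gid = c("a", "a", "b", "b"),
  datetime = as.Date(c("2020-10-01", "2020-10-02", "2020-10-01", "2020-10-02")),
  geometry = st_sfc(square(0, 0), square(0.2, 0.1), square(3, 3), square(3.1, 3.3))
)

# Union and dissolve all time steps of each group into one geometry
idx <- split(seq_len(nrow(polys)), polys$gid)
unioned <- st_sfc(lapply(idx, function(i) st_union(st_geometry(polys)[i])[[1]]))

# Alternative summary geometries derived from each group's union
rectangles <- st_minimum_rotated_rectangle(unioned)
centroids  <- st_centroid(unioned)
```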

ARRAY FORM


post_array OBJECTS

as_post_array(glacier, group_id = "IDENT", time_column_name = "YEAR_")
stars object with 2 dimensions and 25 attributes
attribute(s):
           geom           NAME              Comment            SOURCE          
 MULTIPOLYGON :33440   Length:33440       Length:33440       Length:33440      
 epsg:32633   :    0   Class :character   Class :character   Class :character  
 +proj=utm ...:    0   Mode  :character   Mode  :character   Mode  :character  
                                                                               
                                                                               
                                                                               
                                                                               
   ANALYST           YR_UNCERT          Shape_Area             FLAG       
 Length:33440       Length:33440       Min.   :4.640e+04   Min.   :0.000  
 Class :character   Class :character   1st Qu.:7.367e+05   1st Qu.:0.000  
 Mode  :character   Mode  :character   Median :2.053e+06   Median :0.000  
                                       Mean   :1.897e+07   Mean   :0.008  
                                       3rd Qu.:7.608e+06   3rd Qu.:0.000  
                                       Max.   :1.243e+09   Max.   :2.000  
                                       NA's   :29179       NA's   :29179  
    LENGTH          FWIDTH          Shape_Peri          SOURCE2         
 Min.   :    0   Min.   :    0.0   Min.   :   936.8   Length:33440      
 1st Qu.: 1277   1st Qu.:  193.0   1st Qu.:  4678.2   Class :character  
 Median : 2276   Median :  332.8   Median :  8861.5   Mode  :character  
 Mean   : 4283   Mean   :  720.5   Mean   : 22962.0                     
 3rd Qu.: 4607   3rd Qu.:  680.0   3rd Qu.: 20465.8                     
 Max.   :64859   Max.   :65609.9   Max.   :643020.8                     
 NA's   :29179   NA's   :29179     NA's   :29179                        
 SatelliteI           TIDEWATER       NumLines        GLIMSID         
 Length:33440       Min.   :0.00    Min.   :0.00    Length:33440      
 Class :character   1st Qu.:0.00    1st Qu.:1.00    Class :character  
 Mode  :character   Median :0.00    Median :1.00    Mode  :character  
                    Mean   :0.12    Mean   :1.27                      
                    3rd Qu.:0.00    3rd Qu.:1.00                      
                    Max.   :1.00    Max.   :9.00                      
                    NA's   :31772   NA's   :31772                     
  debroxglac         medZ             minZ              maxZ        
 Min.   :0.00    Min.   :   0.0   Min.   :   0.00   Min.   :   0.0  
 1st Qu.:0.00    1st Qu.: 349.0   1st Qu.:  70.75   1st Qu.: 538.8  
 Median :0.00    Median : 454.0   Median : 214.00   Median : 688.0  
 Mean   :0.05    Mean   : 484.6   Mean   : 234.13   Mean   : 722.7  
 3rd Qu.:0.00    3rd Qu.: 595.0   3rd Qu.: 354.25   3rd Qu.: 879.0  
 Max.   :1.00    Max.   :1215.0   Max.   :1177.00   Max.   :1691.0  
 NA's   :31772   NA's   :31772    NA's   :31772     NA's   :31772   
     stdZ             skew           meanSLP         meanASP      
 Min.   :  0.00   Min.   :-2.01   Min.   : 0.00   Min.   :  0.00  
 1st Qu.: 73.19   1st Qu.:-0.40   1st Qu.: 7.92   1st Qu.: 55.44  
 Median :103.44   Median :-0.10   Median :12.90   Median :149.46  
 Mean   :109.82   Mean   :-0.11   Mean   :12.71   Mean   :172.08  
 3rd Qu.:137.90   3rd Qu.: 0.20   3rd Qu.:16.61   3rd Qu.:296.63  
 Max.   :341.63   Max.   : 1.91   Max.   :42.83   Max.   :359.74  
 NA's   :31772    NA's   :31772   NA's   :31772   NA's   :31772   
     DEM           
 Length:33440      
 Class :character  
 Mode  :character  
                   
                   
                   
                   
dimension(s):
         from   to                refsys point
geom_sum    1 1672 WGS 84 / UTM zone 33N  TRUE
YEAR_       1   20                    NA FALSE
                                                        values
geom_sum POINT (630823.1 8746977),...,POINT (827599.3 8951776)
YEAR_                              [1936,1960),...,[2010,2011)
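The NA counts above follow from the array shape: 1,672 glacier IDs × 20 time steps give 33,440 cells, but only 4,261 outlines exist, so 33,440 - 4,261 = 29,179 cells are empty. Columns with 31,772 NA's presumably also lack values in 2,593 of the observed records. A minimal base-R sketch of this grid completion on made-up data, using `expand.grid()` and `merge()`; it illustrates the arithmetic and is not {post}'s implementation.

```r
# Hypothetical irregular observations: not every group is observed every year
obs <- data.frame(
  IDENT = c("g1", "g1", "g2", "g3", "g3", "g3"),
  YEAR_ = c(1966L, 2008L, 1990L, 1966L, 1990L, 2008L),
  area  = c(55.9, 50.9, 12.2, 9.1, 8.9, 8.5)
)

# Complete the group x time grid; unobserved combinations become NA
grid <- expand.grid(IDENT = sort(unique(obs$IDENT)),
                    YEAR_ = sort(unique(obs$YEAR_)),
                    stringsAsFactors = FALSE)
full <- merge(grid, obs, all.x = TRUE)

nrow(full)             # 3 groups x 3 years = 9 cells
sum(is.na(full$area))  # 9 cells - 6 observations = 3 empty cells
```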

TABULAR FORM


post_table OBJECTS

as_post_table(glacier, group_id = "IDENT", time_column_name = "YEAR_") |> 
  face_spatial()
# cubble:   key: IDENT [1672], index: YEAR_, nested form, [sf]
# spatial:  [409177.49, 8504927.97, 827599.3, 8951776.14], WGS 84 / UTM zone
#   33N
# temporal: Comment [chr], YEAR_ [int], SOURCE [chr], ANALYST [chr], YR_UNCERT
#   [chr], Shape_Area [dbl], FLAG [int], LENGTH [dbl], FWIDTH [dbl], Shape_Peri
#   [dbl], SOURCE2 [chr], SatelliteI [chr], TIDEWATER [int], NumLines [int],
#   GLIMSID [chr], debroxglac [int], medZ [dbl], minZ [dbl], maxZ [dbl], stdZ
#   [dbl], skew [dbl], meanSLP [dbl], meanASP [dbl], DEM [chr], geom
#   [MULTIPOLYGON [m]]
   IDENT   NAME                      x        y           geom_sum ts           
 * <chr>   <chr>                 <dbl>    <dbl>        <POINT [m]> <list>       
 1 11101   Pedasjenkobreen     630823. 8746977. (630823.1 8746977) <sf [2 × 25]>
 2 11102.1 Ganskijbreen        625278. 8747791. (625278.3 8747791) <sf [2 × 25]>
 3 11102.2 Ganskijbreen        623226. 8748129.   (623226 8748129) <sf [2 × 25]>
 4 11103   Sonklarbreen        611755. 8745898. (611755.3 8745898) <sf [2 × 25]>
 5 11104   Helge Backlundbreen 607633. 8735580. (607633.4 8735580) <sf [2 × 25]>
 6 11105.1 Negribreen          578058. 8734897.   (578058 8734897) <sf [2 × 25]>
 7 11105.2 Gardebreen          598368. 8738104. (598368.2 8738104) <sf [2 × 25]>
 8 11106.1 Johansenbreen       580848. 8720961. (580847.7 8720961) <sf [3 × 25]>
 9 11106.2 Petermannbreen      578589. 8715092. (578588.8 8715092) <sf [3 × 25]>
10 11107.1 <NA>                586945. 8713111. (586945.2 8713111) <sf [3 × 25]>
# ℹ 1,662 more rows

post_table OBJECTS

as_post_table(glacier, group_id = "IDENT", time_column_name = "YEAR_") |> 
  face_temporal()
# cubble:   key: IDENT [1672], index: YEAR_, long form
# temporal: 1936 -- 2010 [1Y], has gaps!
# spatial:  NAME [chr], x [dbl], y [dbl], geom_sum [POINT [m]]
   IDENT   Comment YEAR_ SOURCE ANALYST YR_UNCERT Shape_Area  FLAG LENGTH FWIDTH
 * <chr>   <chr>   <int> <chr>  <chr>   <chr>          <dbl> <int>  <dbl>  <dbl>
 1 11101   <NA>     1966 Verti… Christ… Can be 1…  55988042.     0 11309.  2129.
 2 11101   <NA>     2008 SPOT5… Christ… <NA>       50864568.     0  9903.  2876.
 3 11102.1 <NA>     1966 Verti… Christ… Can be 1…  12162520.     0  8770.  1218.
 4 11102.1 <NA>     2008 SPOT5… Christ… <NA>       10951911.     0  8373.   758.
 5 11102.2 <NA>     1966 Verti… Christ… Can be 1…   9146020.     0  6927.   482.
 6 11102.2 <NA>     2008 SPOT5… Christ… <NA>        8899379.     0  6236.   351.
 7 11103   <NA>     1966 Verti… Christ… Can be 1… 247724101.     0 17649.  7660.
 8 11103   <NA>     2008 SPOT5… Christ… <NA>      222499321.     0 13799.  7357.
 9 11104   <NA>     1966 Verti… Christ… Can be 1…  23432664.     0  6340.  2576.
10 11104   <NA>     2008 SPOT5… Christ… <NA>       20067760.     0  5659.  1101.
# ℹ 4,251 more rows
# ℹ 16 more variables: Shape_Peri <dbl>, SOURCE2 <chr>, SatelliteI <chr>,
#   TIDEWATER <int>, NumLines <int>, GLIMSID <chr>, debroxglac <int>,
#   medZ <dbl>, minZ <dbl>, maxZ <dbl>, stdZ <dbl>, skew <dbl>, meanSLP <dbl>,
#   meanASP <dbl>, DEM <chr>, geom <MULTIPOLYGON [m]>

APPLICATIONS

Glyph maps from Svalbard glacier data

APPLICATIONS

Aggregate raster to vector data cube with dynamic spatial data | Source: Abad (2024)

Further resources

📝 A blog post on the contents of this talk, including a workflow to fetch station data from openSenseMap and arrange it in a vector data cube (VDC)

📕 The book Spatial Data Science with Applications in R, especially Chapter 6 - Data Cubes

📦 Documentation on the {stars} package

📦 Documentation on the {cubble} package

📦 Documentation on the {post} package

⚠️ The {post} package is a work in progress! Any feedback is warmly appreciated.

Credits

Cover photo by Eren Namlı on Unsplash

References

Abad, Lorena. 2024. “Geomorphic Landform Monitoring with Raster and Vector Data Cubes.” In EGU General Assembly 2024. Copernicus GmbH. https://doi.org/10.5194/egusphere-egu24-10019.
König, Max, Christopher Nuth, Jack Kohler, and Moholdt. 2014. “A Digital Glacier Database for Svalbard.” In Global Land Ice Measurements from Space, 229–39. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-79818-7_10.
Mahecha, Miguel D., Fabian Gans, Gunnar Brandt, Rune Christiansen, Sarah E. Cornell, Normann Fomferra, Guido Kraemer, et al. 2020. “Earth System Data Cubes Unravel Global Multivariate Dynamics.” Earth System Dynamics 11 (February): 201–34. https://doi.org/10.5194/ESD-11-201-2020.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science with Applications in R. 1st ed. Chapman & Hall. https://r-spatial.org/book/.