Check out my GitHub repository for more content and updates on this project.
The code below computes the BNA score for any city in Europe. The languages used are a combination of R and SQL, with a few lines passed to the Command Prompt.
Within this step, three major things are performed, assuming the user has already created a database in PostgreSQL. To test the connection, an empty table called “test” was created in the public schema of the database.
library(RPostgreSQL)
# LOAD POSTGRESQL DRIVER
driver <- dbDriver("PostgreSQL")
# CREATE CONNECTION TO THE POSTGRESQL DATABASE
# THE CONNECTION VARIABLE WILL BE USED FOR ALL FURTHER OPERATIONS
connection <- dbConnect(
driver,
dbname = db_name,
host = local_host,
port = port_num,
user = user_name,
password = rstudioapi::askForPassword("Database password")
)
ifelse(
!dbExistsTable(connection, "test"),
"The connection to the database was not possible.",
"The connection to the database was successful!"
)
[1] "The connection to the database was successful!"
# DISCONNECT: Important when modifying the database on pgadmin4
# dbDisconnect(connection)
NOTICE: extension "hstore" already exists, skipping
NOTICE: extension "postgis" already exists, skipping
NOTICE: extension "pgrouting" already exists, skipping
NOTICE: schema "destinations" already exists, skipping
NOTICE: schema "generated" already exists, skipping
NOTICE: schema "received" already exists, skipping
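These notices come from an initialization query that prepares the database. A minimal sketch of such a setup is shown below; the exact statements in my script may differ:

```r
library(sqldf)
# ENABLE THE REQUIRED EXTENSIONS AND CREATE THE WORKING SCHEMAS.
# "IF NOT EXISTS" makes the statements safe to re-run, which is what
# produces the "already exists, skipping" notices above.
sqldf(
  "
  CREATE EXTENSION IF NOT EXISTS hstore;
  CREATE EXTENSION IF NOT EXISTS postgis;
  CREATE EXTENSION IF NOT EXISTS pgrouting;
  CREATE SCHEMA IF NOT EXISTS destinations;
  CREATE SCHEMA IF NOT EXISTS generated;
  CREATE SCHEMA IF NOT EXISTS received;
  ",
  connection = connection
)
```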
Besides the name of the study area, the number of subdivisions for the grid, the coordinate reference system to work with, and the biking distance assumed for the connectivity analysis must also be established.
sa_name = "Cambridge"
subdivisions = 4
sa_crs = 3857
biking_distance = 3000 ## in meters
paste("You are running the BNA score for", sa_name, "within a biking distance of", biking_distance/1000, "km.")
[1] "You are running the BNA score for Cambridge within a biking distance of 3 km."
Extract the study area boundary with osmdata, considering a buffer of half the established biking distance.
# FUNCTION TO EXTRACT STUDY AREA BOUNDARY WITH OSM AND WRITE IT TO THE DATABASE
sa_bb <- function (study_area, dist, crs, conn){
# GET DATA FROM OSM
library(osmdata)
library(sf)
study_area_bb <- study_area %>%
getbb(format_out = "sf_polygon") %>%
st_transform(crs = crs)
## ADD A BUFFER TO THE BOUNDARY
study_area_bb <- study_area_bb %>%
st_buffer(dist = 0.5*dist)
# DELETE EXISTING BOUNDARY
library(sqldf)
sqldf(
"DROP TABLE IF EXISTS received.sa_boundary",
connection = conn
)
# UPLOAD BOUNDARY TO POSTGRESQL DATABASE
library(RPostgreSQL)
dbWriteTable(conn, c("received","sa_boundary"), study_area_bb)
study_area_bb
}
boundary <- sa_bb(
study_area = sa_name,
dist = biking_distance,
crs = sa_crs,
conn = connection
)
library(tmap)
tmap_mode("view")
tmap mode set to interactive viewing
qtm(
shp = boundary,
fill = NULL,
borders = "red",
basemaps = "OpenStreetMap"
)
Download the .osm file from the Overpass API.
# FUNCTION TO DOWNLOAD OSM DATA WITH THE OVERPASS API
sa_download <- function(conn){
# OBTAIN THE EXTENT OF THE STUDY AREA AS A BOUNDING BOX
sa_extent <- dbGetQuery(conn,
"SELECT
ST_Extent((ST_Transform(geometry,4326)))
FROM received.sa_boundary")
library(stringr)
sa_coord <- toString(sa_extent) %>%
str_extract_all("\\-*\\d+\\.*\\d*") %>%
unlist() %>%
toString()
# CONSTRUCT THE API LINE TO REQUEST THE DATA
api <- paste(
'https://overpass-api.de/api/map?bbox=',
sa_coord,
sep = ''
)
# CREATE A NEW TEMPORAL DIRECTORY TO DOWNLOAD THE INFO
cd <- getwd()
ifelse(
!file.exists(file.path(cd,'temp')),
dir.create(file.path(cd,'temp')),
"Directory already exists"
)
# ESTABLISH THE NAME OF THE FILE WHERE THE OVERPASS API WILL DOWNLOAD ITS DATA
osm_file <- file.path(cd,'temp','overpass.osm')
# REQUEST THE DATA FROM THE API
library(utils)
download.file(url = api, destfile = osm_file, extra = '-nv -O')
ifelse(
file.exists(file.path(cd,'temp','overpass.osm')),
"OSM data successfully downloaded!",
"OSM data was not downloaded, please try again or download manually."
)
}
# DOWNLOAD THE DATA FROM OSM WITH OVERPASS API
sa_download(conn = connection)
trying URL 'https://overpass-api.de/api/map?bbox=0.0551641725691649, 52.1496753485212, 0.19802494271238, 52.2454796501678'
Content type 'application/osm3s+xml' length unknown
downloaded 87.0 MB
[1] "OSM data successfully downloaded!"
# CREATE A NEW TEMPORAL DIRECTORY TO DOWNLOAD THE INFO
cd <- getwd()
ifelse(
!file.exists(file.path(cd,'temp')),
dir.create(file.path(cd,'temp')),
"Directory already exists"
)
[1] "Directory already exists"
# ESTABLISH THE NAME OF THE FILES
pfbstyle_file <- file.path(cd,'temp','pfb.style')
mapconfig_file <- file.path(cd,"temp","mapconfig.xml")
mapconfigbikes_file <- file.path(cd,"temp","mapconfig_for_bicycles.xml")
# CHECK IF THEY ARE DOWNLOADED
if(
file.exists(file.path(pfbstyle_file)) &
file.exists(file.path(mapconfig_file)) &
file.exists(file.path(mapconfigbikes_file))
){
"Files are already downloaded!"
} else {
# ESTABLISH THE URLS
pfbstyle_url <- "https://raw.githubusercontent.com/azavea/pfb-network-connectivity/develop/src/analysis/import/pfb.style"
mapconfig_url <- "https://raw.githubusercontent.com/pgRouting/osm2pgrouting/master/mapconfig.xml"
mapconfigbikes_url <-
"https://raw.githubusercontent.com/pgRouting/osm2pgrouting/master/mapconfig_for_bicycles.xml"
# REQUEST THE DATA
library(utils)
download.file(url = pfbstyle_url, destfile = pfbstyle_file)
download.file(url = mapconfig_url, destfile = mapconfig_file)
download.file(url = mapconfigbikes_url, destfile = mapconfigbikes_file)
}
[1] "Files are already downloaded!"
Load the data into PostgreSQL with osm2pgsql and osm2pgrouting.
NOTE: To run these commands, set path variables for osm2pgsql and osm2pgrouting, and create a password file at %APPDATA%/postgresql/pgpass.conf with the format hostname:port:database:username:password. Replace the variables between % signs with your own values:
system(
command = "osm2pgsql -c -d %DBNAME% -U %USERNAME% -H %HOSTNAME% -W --create --prefix sa_full -S %CURRENTDIRECTORY/temp/pfb.style% %CURRENTDIRECTORY/temp/overpass.osm% --cache 600",
show.output.on.console = TRUE
)
system(
command = "osm2pgrouting -f %CURRENTDIRECTORY/temp/overpass.osm% -h %HOSTNAME% --password %DBPASSWORD% -d %DBNAME% --username %USERNAME% --schema received --prefix sa_all_ --conf %CURRENTDIRECTORY/temp/mapconfig.xml% --clean",
show.output.on.console = TRUE
)
system(
command = "osm2pgrouting -f %CURRENTDIRECTORY/temp/overpass.osm% -h %HOSTNAME% --password %DBPASSWORD% -d %DBNAME% --username %USERNAME% --schema received --prefix sa_bike_ --conf %CURRENTDIRECTORY/temp/mapconfig_for_bicycles.xml% --clean",
show.output.on.console = TRUE
)
In this step, several SQL queries are run to organize the tables, clip them to the study area outline, and merge them with the osm2pgsql data, among other things. The scripts mainly reuse the code that PfB already uses, with some modifications, such as changing feet to meters and mph to km/h. Essentially, four steps are taken:
- A cleanup script, which drops unused columns and projects the data to the established CRS. It also resets the database so a new analysis can be run, for example when the study area changes.
- A clipping script, which clips the data to the boundary of the selected study area.
- A script that modifies the following columns on the ways table:
- A script that modifies the following columns on the intersections table.
An example of how the tables look after this step:
SELECT * FROM received.sa_ways LIMIT 10;
SELECT * FROM received.sa_ways_int LIMIT 10;
The code for this step is not included, as it is essentially the same as the PfB code and quite long. However, it can be examined in the Rmd file for this R Notebook.
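For illustration only, the clipping step boils down to spatial filters of the following shape. This is a sketch rather than the actual PfB script, reusing the `received.sa_boundary` and `received.sa_ways` tables named elsewhere in this notebook:

```r
sqldf::sqldf(
  "
  -- SKETCH: keep only the ways that intersect the buffered study area
  DELETE FROM received.sa_ways w
  WHERE NOT EXISTS (
    SELECT 1
    FROM received.sa_boundary b
    WHERE ST_Intersects(w.geom, b.geometry)
  );
  ",
  connection = connection
)
```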
One additional variable I was considering adding is slope; however, I have not completed its implementation yet. This would also affect step 6.
As with step 5, this step consists mainly of SQL queries already performed by PfB. It mainly alters the columns meant to host the stress rank for segments and intersections, considering different cases for the classification. The scripts, which can be further examined in the Rmd file, are:
An example of how the tables look after this step:
SELECT osm_id, name, ft_seg_stress, tf_seg_stress, ft_int_stress, tf_int_stress FROM received.sa_ways LIMIT 10;
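To give a flavor of what those scripts do, the stress columns are filled with CASE-based updates along the following lines. The column names `speed_limit` and `ft_bike_infra` and the thresholds are illustrative placeholders, not the actual PfB rules, which cover many more cases:

```r
sqldf::sqldf(
  "
  -- ILLUSTRATIVE SKETCH of a stress classification (1 = low, 3 = high);
  -- the real PfB scripts also consider road class, lanes, parking, etc.
  UPDATE received.sa_ways
  SET ft_seg_stress =
    CASE
      WHEN ft_bike_infra IS NOT NULL THEN 1  -- dedicated bike infrastructure
      WHEN speed_limit <= 30 THEN 1          -- low-speed street (km/h)
      ELSE 3
    END;
  ",
  connection = connection
)
```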
In this step the network is built by creating two tables: vertices and links. As with the last two steps, the code won't be included but can be examined in the Rmd file.
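Conceptually, the two tables can be derived from the ways and intersections data with queries of the following kind. The table and column names below (`net_vert`, `net_link`, `int_id`) are placeholders, not the PfB originals:

```r
sqldf::sqldf(
  "
  -- SKETCH: one vertex per intersection ('int_id' is a placeholder name)
  CREATE TABLE generated.net_vert AS
  SELECT i.int_id AS vert_id, i.geom
  FROM received.sa_ways_int i;
  -- SKETCH: one link per way, carrying length and stress as edge attributes
  CREATE TABLE generated.net_link AS
  SELECT w.osm_id AS link_id,
         ST_Length(w.geom) AS link_length,
         w.ft_seg_stress AS link_stress,
         w.geom
  FROM received.sa_ways w;
  ",
  connection = connection
)
```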
This is one of the main differences between the PfB approach and mine. Instead of US census blocks, I used a population grid of 1 km² covering the entire European territory. Since this dataset is quite large, I created a subdivision routine to split the data, assigning partial populations to each new cell based on its mother cell. To do this, I followed two steps:
## Download data and load to PostgreSQL
if (!dbExistsTable(connection, c("received","geostat"))){
# CREATE A NEW TEMPORAL DIRECTORY TO DOWNLOAD THE INFO
cd <- getwd()
ifelse(
!file.exists(file.path(cd,'temp')),
dir.create(file.path(cd,'temp')),
"Directory already exists"
)
# ESTABLISH THE NAME OF THE FILE WHERE THE GEOSTAT DATA WILL BE DOWNLOADED AND UNZIPPED
geostat_file <- file.path(cd,'temp','geostat.zip')
geostat_exdir <- file.path(cd,"temp","geostat")
if (!file.exists(geostat_exdir)){
# DEFINE THE URL FROM WHERE THE DATA COMES
geostat_url <-
"https://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip"
# DOWNLOAD THE FILE, UNZIP IT AND DELETE .ZIP
library(utils)
download.file(url = geostat_url, destfile = geostat_file)
unzip(geostat_file, exdir = geostat_exdir)
file.remove(geostat_file)
}
# CALL DATA INTO R AND REPROJECT
library(sf)
table_path <- file.path(
geostat_exdir,
"Version 2_0_1/GEOSTAT_grid_POP_1K_2011_V2_0_1.csv"
)
grid_path <- file.path(
geostat_exdir,
"Version 2_0_1/GEOSTATReferenceGrid/Grid_ETRS89_LAEA_1K-ref_GEOSTAT_POP_2011_V2_0_1.shp"
)
pop_table <- st_read(table_path)
names(pop_table) <- pop_table %>% names() %>% tolower()
pop_grid <- st_read(grid_path)
pop_grid_t <- pop_grid %>% st_transform(crs = sa_crs)
names(pop_grid_t) <- pop_grid_t %>% names() %>% tolower()
# LOAD TO POSTGRESQL
library(sqldf)
sqldf(
"
DROP TABLE IF EXISTS received.pop_grid;
DROP TABLE IF EXISTS received.pop_table;
",
connection = connection
)
dbWriteTable(
conn = connection,
name = c("received","pop_grid"),
value = pop_grid_t
)
dbWriteTable(
conn = connection,
name = c("received","pop_table"),
value = pop_table
)
#### Join tables on data base and extract study area
sqldf(
"
-- Create join between .csv and .shp
DROP TABLE IF EXISTS received.geostat;
DROP INDEX IF EXISTS received.geostat_geom_idx;
CREATE TABLE received.geostat AS
SELECT grid.grd_id, grid.geometry, tab.tot_p, tab.cntr_code, tab.year, tab.tot_p_con_dt
FROM received.pop_grid grid, received.pop_table tab
WHERE grid.grd_id = tab.grd_id;
CREATE INDEX geostat_geom_idx
ON received.geostat
USING gist
(geometry);
DROP TABLE IF EXISTS received.pop_grid;
DROP TABLE IF EXISTS received.pop_table;
",
connection = connection
)
} else {
"GEOSTAT data already loaded to database."
}
[1] "GEOSTAT data already loaded to database."
Warning message:
In postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver warning: (unrecognized PostgreSQL field type geometry (id:56267) in column 5)
sqldf::sqldf(
"
DROP TABLE IF EXISTS received.sa_geostat;
DROP INDEX IF EXISTS received.sa_geostat_geom_idx;
-- Extract the grids concerning only the study area
CREATE TABLE received.sa_geostat AS
SELECT DISTINCT geo.grd_id,
CAST(geo.tot_p AS INTEGER),
geo.cntr_code,
geo.geometry
FROM received.geostat geo, received.sa_ways w
WHERE ST_Intersects(geo.geometry, w.geom);
CREATE INDEX sa_geostat_geom_idx
ON received.sa_geostat
USING gist
(geometry);
",
connection = connection
)
NOTICE: index "sa_geostat_geom_idx" does not exist, skipping
## Establish a function to create grid with different number of subdivisions, defaults to 9
grid <- function(s = 9){
## Call it as an sf object and then transform it to CRS:3035 to create grid
library(sf)
library(dplyr, quietly = TRUE)
sa_pop_1km2 <- st_read(
dsn = connection,
layer = c("received", "sa_geostat")
) %>%
st_transform(crs = 3035)
## Determine number of horizontal and vertical cells
h <- as.integer(as.numeric(diff(st_bbox(sa_pop_1km2)[c(1, 3)]))/1000)
v <- as.integer(as.numeric(diff(st_bbox(sa_pop_1km2)[c(2, 4)]))/1000)
## Make grid
grid <- sa_pop_1km2 %>%
st_make_grid(n=c(h*sqrt(s),v*sqrt(s)), what = "polygons") %>%
st_sf() %>%
mutate(id = 1:n()) %>%
st_intersection(sa_pop_1km2)
## Filter grid by area of intersection because there are small polygons created.
grid$area <- grid %>% st_geometry() %>% st_area() %>% as.numeric()
grid <- grid %>% filter(area > 1)
grid$area <- NULL
grid <- within(grid, cell_id <- paste(grd_id,"C",id, sep = ""))
grid$id <- NULL
grid$partial_p <- grid$tot_p/s
grid %>% st_transform(crs = sa_crs)
}
sa_grid <- grid(s = subdivisions) # Always use a square number to make an even division
sqldf::sqldf(
"DROP TABLE IF EXISTS generated.sa_pop_grid",
connection = connection
)
## Load data into data base
RPostgreSQL::dbWriteTable(
conn = connection,
name = c("generated","sa_pop_grid"),
value = sa_grid
)
[1] TRUE
library(tmap)
tmap_mode("view")
tmap mode set to interactive viewing
qtm(
shp = sa_grid,
fill = NULL,
borders = "red",
basemaps = "OpenStreetMap"
)
Plotting the grid at this step allows the analyst to decide on a better number of subdivisions, depending on the study area. I hope to automate this in a later effort.
This step only generates new columns on the new population grid table. It follows the same logic as PfB and therefore won't be included explicitly in this document.
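For orientation, the step adds score columns of the following flavor to the grid. This is an illustrative subset and the actual PfB column set is much larger; of the names below, only `overall_score` is known from the plotting code later in this notebook:

```r
sqldf::sqldf(
  "
  -- ADD SCORE COLUMNS TO THE POPULATION GRID (illustrative subset)
  ALTER TABLE generated.sa_pop_grid
    ADD COLUMN IF NOT EXISTS supermarkets_score DOUBLE PRECISION,
    ADD COLUMN IF NOT EXISTS parks_score DOUBLE PRECISION,
    ADD COLUMN IF NOT EXISTS overall_score DOUBLE PRECISION;
  ",
  connection = connection
)
```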
This is the core of the whole BNA analysis, where the actual network analysis is performed. This step might take some computation time. It is again the same as PfB and can be reviewed in more detail in the Rmd file. It uses pgRouting, and therefore the Dijkstra algorithm, to compute the driving distance under the configuration established in step 7. It does this for the two levels of traffic stress:
In this step, four procedures take place:
Where a new table sa_connected_pop_grid is created to summarize the connected cells by establishing them as source and target, including whether they are connected by the low- or high-stress network, and obtaining the minimum cost between cells.
The access computation in this step fills up the sa_pop_grid table created in step 8, according to the PfB methodology.
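The reachability computation behind sa_connected_pop_grid can be pictured as a pgRouting driving-distance call per source cell, restricted to one stress level at a time. A sketch, where `generated.net_link` and its columns are placeholders:

```r
sqldf::sqldf(
  "
  -- SKETCH: vertices reachable from one source vertex over the
  -- low-stress network, within the established biking distance
  SELECT *
  FROM pgr_drivingDistance(
    'SELECT link_id AS id,
            source_vert AS source,
            target_vert AS target,
            link_length AS cost
     FROM generated.net_link
     WHERE link_stress = 1',  -- low-stress edges only
    1234,                     -- placeholder source vertex id
    3000                      -- biking_distance in meters
  );
  ",
  connection = connection
)
```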
To compute access in this and the next step, a weighting procedure is used, following the PfB methodology, which can be accessed here.
A quick glance at the weights used, mainly for step 11:
| Scoring Category | Measure |
|---|---|
| People = 15 | Population = N/A |
| Opportunity = 20 | Employment = 35 |
| | K-12 Education = 35 |
| | Technical/vocational school = 10 |
| | Higher Education = 20 |
| Core Services = 20 | Doctor offices/clinics = 20 |
| | Dentist offices = 10 |
| | Hospitals = 20 |
| | Pharmacies = 10 |
| | Supermarkets = 25 |
| | Social services = 15 |
| Recreation = 15 | Parks = 40 |
| | Recreational trails = 35 |
| | Community centers = 25 |
| Retail = 15 | Retail shopping = N/A |
| Transit = 15 | Station/transit centers = N/A |
It is important to note that this reproduction of the BNA for Europe does not include employment data, since so far I have not located an open-data source providing this information for the whole of Europe. Therefore, the final results will show this category with 0 or NA values.
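The weighting arithmetic itself is just a weighted mean of the measure scores within each category. As a sanity check, Total Core Services can be recomputed from the weights above and the Cambridge scores in the results table further down:

```r
# WEIGHTED MEAN OF THE CORE SERVICES MEASURE SCORES (weights sum to 100)
core_weights <- c(doctors = 20, dentists = 10, hospitals = 20,
                  pharmacies = 10, supermarkets = 25, social = 15)
core_scores  <- c(doctors = 79.67, dentists = 76.24, hospitals = 68.47,
                  pharmacies = 74.61, supermarkets = 85.05, social = 82)
weighted.mean(core_scores, core_weights)
# [1] 78.2755  -- matching "Total Core Services" = 78.28 in the results
```

Note that when a destination type is absent, PfB redistributes its weight among the remaining measures; the simple mean above covers only the complete case.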
Which uses the OSM polygons and points generated by osm2pgsql. The destinations included are:
Where the access to the destinations established above is computed. Access to recreational trails and bike paths is also included.
Once again, I do not include the SQL code for this step; however, it can be accessed through the Rmd file.
During this step the overall access is computed for each population grid cell, meaning that we can already observe the spatial behavior of the BNA score within our study area.
For this step, a new table, sa_score_inputs, is generated in the database to store the preliminary results. The code can be accessed in the Rmd file.
The overall results include the final score for the whole city as well as the score per destination category. The total population and the stress network lengths are also calculated. The results can be observed in the following table.
| | Score/Value |
|---|---|
| Overall Score | 72.41 |
| Population | 141307 |
| Length of Low Stress Network (km) | 1604.9 |
| Length of High Stress Network (km) | 243.9 |
| **People** | |
| Total People | 76.81 |
| **Opportunity** | |
| Employment | 0 |
| K-12 Education | 80.97 |
| Technical/vocational school | 78.19 |
| Higher Education | 76.45 |
| Total Opportunity | 51.45 |
| **Core Services** | |
| Doctor offices/clinics | 79.67 |
| Dentist offices | 76.24 |
| Hospitals | 68.47 |
| Pharmacies | 74.61 |
| Supermarkets | 85.05 |
| Social services | 82 |
| Total Core Services | 78.28 |
| **Retail** | |
| Total Retail shopping | 79.26 |
| **Recreation** | |
| Parks | 83.87 |
| Recreational trails | 93.65 |
| Community centers | 71.69 |
| Total Recreation | 84.25 |
| **Transit** | |
| Total Transit | 69.42 |
We can plot the results for a quick view of the output, including the high- and low-stress networks, in an interactive way.
library(sf)
bna_score <- st_read(
dsn = connection,
layer = c("generated","sa_pop_grid")
)
stress_network <- st_read(
dsn = connection,
query = "SELECT ft_seg_stress, tf_seg_stress, geom FROM received.sa_ways"
)
bna_pal <- c("#FC7151","#DC7E6A","#C98875","#C08B83","#AD9396",
"#9C9A9F","#929EAC","#78AAC5","#6FADCB","#49BFE6")
bna_breaks <- c(0,6,12,18,24,30,36,42,48,54,100) # 11 breaks define 10 classes, matching the 10 palette colors
stress_network$ft_stress <- ifelse(stress_network$ft_seg_stress == 1,"low stress","high stress")
stress_network$tf_stress <- ifelse(stress_network$tf_seg_stress == 1,"low stress","high stress")
library(tmap)
tmap_mode("view")
tmap mode set to interactive viewing
int_map <-
tmap::tmap_leaflet(
tmap::tm_view(
basemaps = c(
"CartoDB.Positron",
"CartoDB.DarkMatter",
"OpenStreetMap.Mapnik"
)
) +
tmap::tm_shape(bna_score) +
tmap::tm_polygons(
col = "overall_score",
style = "fixed",
breaks = bna_breaks,
palette = bna_pal,
alpha = 0.8,
title = "BNA score",
border.col = NULL,
colorNA = NULL,
showNA = FALSE
) +
tmap::tm_shape(stress_network) +
tmap::tm_lines(
col = "ft_stress",
colorNA = NULL,
showNA = FALSE,
palette = c("firebrick1", "deepskyblue3"),
title.col = "Stress network"
) +
tmap::tm_shape(stress_network) +
tmap::tm_lines(
col = "tf_stress",
colorNA = NULL,
showNA = FALSE,
palette = c("firebrick1", "deepskyblue3"),
legend.col.show = FALSE
)
)
int_map
For this particular city, the total BNA computation time, including plots and the Overpass download but excluding the GEOSTAT data download, upload, and database processing, was 38.9 minutes. Adding the GEOSTAT processing to the workflow adds around 5 minutes. Of course, this is also subject to internet speed and computer performance. I ran my analysis on an Asus F541U (Intel Core i7, 8 GB RAM, 256 GB SSD) with Windows 10.
What can be observed from the whole analysis is that the resulting BNA score is highly influenced by the fact that the job/employment data was not included yet. However, this was an attempt to reproduce the score as closely as possible to the way PfB applies its methodology, simply to explore its reproducibility.
My next plan is to exclude this variable from the BNA score computation, and perhaps to include other variables that better suit the European context.
My final goal for the moment is to try to validate the scoring methodology for Europe. I picked a city in the UK because I know origin-destination data is available there that could be used as a validation method.