Built with R 4.3.1
This is an applied example of calculating and interpreting floating catchment areas. It walks through the data collection and preparation process, then calculates and analyzes different variants of floating catchment area ratios. See the introduction for more about floating catchment areas themselves.
In this case study, providers are doctors in the National Capital Region (Washington metropolitan area; Washington D.C., Maryland, Virginia area; DMV), and consumers are populations of Census block groups.
In addition to visualization in R, this example built an interactive data site using the community package to explore the results.
The first section walks through data collection and preprocessing. The result is the set of files that actually go into the floating catchment area calculations; these have already been made, so you can skip to the next section if you want to get right to the calculations.
All of the components in that section are also collected in a single script in the data site’s repository: uva-bi-sdad/dmv_healthcare/build.R.
The first step will be to create the basic directory structure of the site, which we will also use to organize our data collection and preprocessing.
This will create a dmv_healthcare folder in the parent of your current working directory, and open a site.R file:
library(community)
init_site("../dmv_healthcare", "dmv_healthcare")
# specify the main directory for final files
maindir <- "../dmv_healthcare/docs/data/"
# make a directory to store working data files in
oridir <- paste0(maindir, "original/")
dir.create(oridir, FALSE, TRUE)
Calculating any catchment area ratio requires at least these two pieces of information:

- Consumers: locations and values (here, block group coordinates and populations).
- Providers: locations and values (here, practice addresses and doctor counts).

We will be using population data from the U.S. Census Bureau’s American Community Survey (ACS) 5-year estimates. These are provided by state, so we will need to collect data for all three states (D.C., Maryland, and Virginia), then subset to our counties of interest.
We will also download cartographic boundary files for mapping, and origin-destination employment data for a commuter-based catchment area.
library(catchment)
# define counties that are part of the capital region
dmv_counties <- list(
dc = "District of Columbia",
md = c("Charles", "Frederick", "Montgomery", "Prince George's"),
va = c(
"Alexandria", "Arlington", "Fairfax", "Falls Church", "Loudoun", "Manassas",
"Manassas Park", "Prince William"
)
)
data <- list()
shapes <- list()
# download / load
for(state in c("dc", "md", "va")){
# shapes
counties <- download_census_shapes(oridir, state, "county", paste0(state, "_counties"))
tracts <- download_census_shapes(oridir, state, "tract", paste0(state, "_tracts"))
blockgroups <- download_census_shapes(oridir, state, "bg", paste0(state, "_blockgroups"))
## store subsets to combine later
counties <- counties[counties$NAME %in% dmv_counties[[state]],]
## Virginia has two entries named Fairfax (the independent city and the county)
if(state == "va") counties[counties$NAME == "Fairfax", "NAME"] <- c("Fairfax City", "Fairfax")
shapes[[state]] <- list(
counties = counties,
tracts = tracts[substr(tracts$GEOID, 1, 5) %in% counties$GEOID,],
blockgroups = blockgroups[substr(blockgroups$GEOID, 1, 5) %in% counties$GEOID,]
)
# population data
data[[state]] <- download_census_population(
oridir, state, 2019, include_margins = TRUE, include_commutes = TRUE,
counties = counties$GEOID, verbose = TRUE
)
}
## create and save combined shapes
library(sf)
library(rmapshaper)
for(level in names(shapes$dc)){
st_write(
ms_simplify(do.call(rbind, lapply(shapes, "[[", level)), keep_shapes = TRUE),
paste0(maindir, paste0(level, ".geojson"))
)
}
## create and save square commutes matrix
library(Matrix)
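# start from an all-zero sparse matrix with a row and column per block group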
commutes <- sparseMatrix(
{}, {}, x = 0,
dims = rowSums(vapply(data, function(d) dim(d$commutes), numeric(2))),
dimnames = rep(list(do.call(c, unname(lapply(data, function(d) colnames(d$commutes))))), 2)
)
for(d in data) commutes[rownames(d$commutes), colnames(d$commutes)] <- d$commutes
write.csv(
cbind(GEOID = rownames(commutes), as.data.frame(as.matrix(unname(commutes)))),
paste0(maindir, "commutes.csv"), row.names = FALSE
)
system2("bzip2", shQuote(paste0(maindir, "commutes.csv")))
## create and save combined population data file
data_combined <- do.call(rbind, lapply(names(data), function(state){
d <- data[[state]]$estimates
s <- shapes[[state]]$blockgroups
rownames(s) <- s$GEOID
total <- d$TOTAL.POPULATION_Total
total[total == 0] <- 1
data.frame(
GEOID = d$GEOID,
population = d$TOTAL.POPULATION_Total,
percent_female = d$SEX.BY.AGE_Female_Female / total * 100,
percent_white = d$RACE_Total_White.alone / total * 100,
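# sum the age columns for 50 and over (names contain 50, 55, ..., 85)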
percent_over_49 = rowSums(d[, grep("[5-8][05]", colnames(d))]) / total * 100,
st_coordinates(st_centroid(st_geometry(s[as.character(d$GEOID),])))
)
}))
write.csv(data_combined, paste0(maindir, "data.csv"), row.names = FALSE)
For doctor information, we will be using the Centers for Medicare & Medicaid Services’ Medicare Physician & Other Practitioners - by Provider dataset.
# set up the url to download from the API
providers_url <- paste0(
"https://data.cms.gov/data-api/v1/dataset/", # base url
"a399e5c1-1cd1-4cbe-957f-d2cc8fe5d897/data/", # dataset id, which specifies year
# filter: state %in% c("DC", "MD", "VA")
"?filter[region][condition][path]=Rndrng_Prvdr_State_Abrvtn",
"&filter[region][condition][operator]=IN",
"&filter[region][condition][value][1]=DC",
"&filter[region][condition][value][2]=MD",
"&filter[region][condition][value][3]=VA",
"&offset=" # pagination, as only 1000 rows are returned at a time
)
# retrieve the data in steps
providers <- list()
offset <- 0
while(length(raw <- jsonlite::read_json(paste0(providers_url, offset)))){
cat("\rdownloading file", offset / 1000 + 1, " ")
providers[[length(providers) + 1]] <- do.call(rbind, lapply(raw, unlist))
offset <- offset + 1000
}
providers <- as.data.frame(do.call(rbind, providers))
# get the ZIP codes within the focal counties
county_shapes <- read_sf(paste0(maindir, "counties.geojson"), as_tibble = FALSE)
geography_ref <- read.csv(
"https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt"
)
zips <- unique(unlist(lapply(names(dmv_counties), function(state){
GEOIDs <- county_shapes[county_shapes$NAME %in% dmv_counties[[state]], "GEOID", drop = TRUE]
formatC(geography_ref[geography_ref$GEOID %in% GEOIDs, "ZCTA5"], width = 5, flag = 0)
}), use.names = FALSE))
# focus only on medical doctors who aren't also dentists, within the selected counties
providers <- providers[
grepl("\\bm\\.?d(?:\\W|$)", providers$Rndrng_Prvdr_Crdntls, TRUE) &
!grepl("d\\.?d\\.?s", providers$Rndrng_Prvdr_Crdntls, TRUE) &
providers$Rndrng_Prvdr_Zip5 %in% zips,
]
# format address strings, and get the coordinates of each unique entry
address_parts <- c(
"Rndrng_Prvdr_St1", "Rndrng_Prvdr_City", "Rndrng_Prvdr_State_Abrvtn", "Rndrng_Prvdr_Zip5"
)
providers$Rndrng_Prvdr_St1 <- sub(
"([nesw])\\.([nesw])\\.*", "\\1\\2", providers$Rndrng_Prvdr_St1, TRUE
)
providers$Rndrng_Prvdr_St1 <- sub("\\s*#\\s.*$", "", providers$Rndrng_Prvdr_St1)
providers$Rndrng_Prvdr_St1 <- sub(
"\\s*([nesw]{2})\\s.*$", " \\1", providers$Rndrng_Prvdr_St1, TRUE
)
providers$address <- do.call(paste, c(unname(as.list(providers[, address_parts])), sep = ", "))
# collapse to locations based on address
vars <- c(
"address", "X", "Y", "Rndrng_Prvdr_Gndr",
grep("^(tot|drug|med|bene)_", colnames(providers), TRUE, value = TRUE)
)
vars <- vars[!vars %in% c("Drug_Sprsn_Ind", "Med_Sprsn_Ind")]
addresses <- unique(providers$address)
# geocode addresses; takes a while
library(parallel)
cl <- makeCluster(detectCores() - 2)
address_coords <- as.data.frame(do.call(rbind, parLapply(cl, addresses, function(a){
coords <- tidygeocoder::geo(a, progress_bar = FALSE, quiet = TRUE, method = "arcgis")
if(is.na(coords$long)) coords <- tidygeocoder::geo(a, progress_bar = FALSE, quiet = TRUE)
coords
})))
rownames(address_coords) <- address_coords$address
stopCluster(cl)
# add coordinates to providers data
providers[, c("Y", "X")] <- address_coords[providers$address, c("lat", "long")]
providers <- providers[!is.na(providers$X),]
providers$locid <- paste0(providers$X, ",", providers$Y)
provider_locations <- do.call(rbind, lapply(unique(providers$locid), function(l){
d <- providers[providers$locid == l, vars]
d[d == ""] <- NA
as.data.frame(list(
address = d[1, "address"],
X = d[1, "X"],
Y = d[1, "Y"],
doctors = nrow(d),
prop_women = mean(d$Rndrng_Prvdr_Gndr == "F"),
as.list(colMeans(matrix(
as.numeric(as.matrix(d[, -(1:4)])), nrow(d),
dimnames = list(NULL, vars[-(1:4)])
), na.rm = TRUE))
))
}))
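# convert NaNs (from means over all-missing columns) to NAs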
provider_locations[is.na(provider_locations)] <- NA
# identify zip codes that cross counties
zip_cross <- substr(unique(do.call(paste0,
geography_ref[geography_ref$ZCTA5 %in% zips, c("ZCTA5", "GEOID")]
)), 1, 5)
zip_cross <- zip_cross[duplicated(zip_cross)]
## check that those are actually in the focal counties
potential_ex <- provider_locations[
grepl(paste0("(?:", paste(zip_cross, collapse = "|"), ")$"), provider_locations$address) &
!grepl(
paste0("(?:", paste(unlist(dmv_counties), collapse = "|"), "),"),
provider_locations$address
),
]
potential_ex$county <- tidygeocoder::reverse_geo(potential_ex$Y, potential_ex$X, full_results = TRUE)$county
provider_locations <- provider_locations[
!provider_locations$address %in% potential_ex[
!is.na(potential_ex$county) & !grepl(
paste0("(?:", paste(unlist(dmv_counties), collapse = "|"), ")"), potential_ex$county
), "address"
],
]
# make unique IDs for each provider location
provider_locations$ID <- paste0("l", seq_len(nrow(provider_locations)))
# save provider locations dataset
write.csv(provider_locations, paste0(maindir, "providers.csv"), row.names = FALSE)
If you have access to an OSRM server, you can get travel times between each block group and provider location:
library(osrm)
options(osrm.server = Sys.getenv("OSRM_SERVER"))
traveltimes <- osrmTable(
src = data_combined[, c("GEOID", "X", "Y")], # id, longitude, latitude
dst = provider_locations[, c("ID", "X", "Y")]
)$duration
write.csv(
cbind(GEOID = rownames(traveltimes), as.data.frame(as.matrix(traveltimes))),
paste0(maindir, "traveltimes.csv"), row.names = FALSE
)
system2("bzip2", shQuote(paste0(maindir, "traveltimes.csv")))
Here, Sys.getenv("OSRM_SERVER") is pointing to a local server. Instructions to set up your own server are available on the Project-OSRM/osrm-backend repository.
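If you run an instance locally, pointing the osrm package at it might look like this (a sketch; the port and profile here are assumptions based on osrm-backend defaults, not part of the original setup):
# hypothetical: a local OSRM instance on osrm-routed's default port
options(osrm.server = "http://localhost:5000/", osrm.profile = "car")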
These files were created in the previous section:

- data.csv: block group locations and population characteristics
- blockgroups.geojson: block group shapes
- commutes.csv.bz2: block group to block group commuter counts
- providers.csv: provider locations and doctor counts
- traveltimes.csv.bz2: block group to provider location travel times

If you did not create these files, you can download and load them with this:
library(sf)
library(Matrix)
# could set this to a temporary directory if you'd like with
# maindir <- paste0(tempdir(), "/")
if (!exists("maindir")) maindir <- "../dmv_healthcare/docs/data/"
# define files with associated object names
files <- c(
data_combined = "data.csv",
blockgroup_shapes = "blockgroups.geojson",
commutes = "commutes.csv.bz2",
provider_locations = "providers.csv",
traveltimes = "traveltimes.csv.bz2"
)
# then download and read in those files as needed
for (o in names(files)) {
path <- paste0(maindir, files[[o]])
if (!file.exists(path)) {
download.file(
paste0(
"https://raw.githubusercontent.com/uva-bi-sdad/dmv_healthcare/main/docs/data/",
files[[o]]
), path
)
}
if (!exists(o)) {
if (grepl(".geojson", files[[o]], fixed = TRUE)) {
assign(o, st_transform(read_sf(path, as_tibble = FALSE), 4326))
} else if (o %in% c("commutes", "traveltimes")) {
assign(o, as(as.matrix(read.csv(
bzfile(path),
row.names = 1, check.names = FALSE
)), "dgCMatrix"))
} else {
assign(o, read.csv(path))
}
}
}
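As an optional sanity check, the loaded cost matrix should align with block groups on its rows and provider locations on its columns:
# rows of traveltimes are block group GEOIDs; columns are provider location IDs
all(rownames(traveltimes) %in% as.character(data_combined$GEOID))
all(colnames(traveltimes) %in% provider_locations$ID)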
Before getting to the calculations, we can prepare a base map to be added to:
library(leaflet)
# if you worked through the first section, you may need to load in the shapes:
if (!exists("blockgroup_shapes")) {
blockgroup_shapes <- st_transform(read_sf(
paste0(maindir, "blockgroups.geojson"),
as_tibble = FALSE
), 4326)
}
# make sure shapes are in the same order as data
rownames(blockgroup_shapes) <- blockgroup_shapes$GEOID
blockgroup_shapes <- blockgroup_shapes[as.character(data_combined$GEOID), ]
map <- leaflet(blockgroup_shapes, options = leafletOptions(attributionControl = FALSE)) |>
addProviderTiles("CartoDB.Positron") |>
addScaleBar("bottomleft") |>
addMapPane("lines", zIndex = 410) |>
addMapPane("points", zIndex = 411) |>
addPolygons(
fillColor = colorNumeric("RdYlBu", data_combined$population)(data_combined$population),
fillOpacity = 1, stroke = FALSE, group = "Population", label = data_combined$population
) |>
hideGroup("Population") |>
addLayersControl(
position = "topleft", overlayGroups = c("Doctors", "Population", "Access")
) |>
addCircles(
data = provider_locations, color = "#000", lng = ~X, lat = ~Y,
label = ~ paste0("ID: ", ID, ", Doctors: ", doctors),
group = "Doctors", options = pathOptions(pane = "points")
) |>
hideGroup("Doctors")
We can start with the most basic form of a floating catchment area ratio, the 2-step floating catchment area (2SFCA; Luo & Wang, 2003), which applies binary weights within a set range: every provider within the cutoff counts fully, and everything beyond it not at all.
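To restate Luo & Wang’s two steps, with $S_j$ a provider location’s supply, $P_k$ a block group’s population, $d_{kj}$ the travel cost between them, and $d_0$ the cutoff:

$$R_j = \frac{S_j}{\sum_{k:\, d_{kj} \le d_0} P_k}, \qquad A_i = \sum_{j:\, d_{ij} \le d_0} R_j$$

With our travel time matrix and a 60-minute cutoff, this is: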
library(catchment)
data_combined$doctors_2sfca <- catchment_ratio(
# this specifies consumers, providers, costs, and weights
data_combined, provider_locations, traveltimes, 60,
# this specifies where to find ids and values in the entered consumers and providers objects
consumers_value = "population", providers_id = "ID", providers_value = "doctors",
verbose = TRUE
)
#> ── Calculating a Floating Catchment Area Ratio ─────────────────────────────────
#> ℹ consumers value: `population` column
#> ℹ providers value: `doctors` column
#> ℹ consumers id: `GEOID` column
#> ℹ providers id: `ID` column
#> ℹ cost: `cost` matrix
#> ℹ setting non-NA 0s to 1e-06
#> ℹ weight: cost over 0 and under 60
#> ✔ calculated 2-step floating catchment area (resources per consumer)
# for better display, we will multiply scores by 1,000 for doctors per 1,000 people
## this can also be done with the return_type argument
data_combined$doctors_2sfca <- data_combined$doctors_2sfca * 1000
# now make a map of the results
pal <- colorBin("RdYlBu", data_combined$doctors_2sfca)
map |>
addControl("Doctors Per 1,000 People (2-Step Floating Catchment Area)", "topright") |>
showGroup("Doctors") |>
addLegend("bottomright", pal, data_combined$doctors_2sfca, opacity = 1) |>
addPolygons(
fillColor = pal(data_combined$doctors_2sfca), fillOpacity = 1, weight = 1, color = "#000",
highlightOptions = highlightOptions(color = "#fff"), group = "Access",
label = paste0(
"GEOID: ", data_combined$GEOID, ", Population: ", data_combined$population,
", Per 1k People: ", round(data_combined$doctors_2sfca, 4), ", In Region: ",
round(data_combined$population * data_combined$doctors_2sfca / 1000, 4)
)
)
These results may look odd in some cases, where regions with several doctors right on top of them have lower ratios than surrounding regions without any doctors. In most cases, this is simply because doctors are being distributed across all regions in range, and here that distribution is uniform over distances. Of course, it could also be that travel is particularly difficult in a region, making everything effectively more distant, or an issue with the routing engine may make it appear that way.
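To see that uniform distribution concretely, here is a minimal sketch with hypothetical consumers and a single provider (the names and numbers are illustrative only):
toy_consumers <- data.frame(id = c("a", "b"), pop = c(100, 900))
toy_providers <- data.frame(id = "p", doctors = 10)
toy_cost <- matrix(c(10, 50), 2, 1, dimnames = list(c("a", "b"), "p"))
catchment_ratio(
  toy_consumers, toy_providers, toy_cost, 60,
  consumers_id = "id", consumers_value = "pop",
  providers_id = "id", providers_value = "doctors"
)
# both consumers get 10 / (100 + 900) = 0.01 doctors per person,
# despite very different travel times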
The other variants of the 2- and 3-step floating catchment area all come down to differences in weight schemes, but before looking at those, we might compare different costs, using the travel times we pre-calculated versus simpler Euclidean distances:
data_combined$doctors_2sfca_euclidean <- catchment_ratio(
# to get Euclidean distances between coordinates, we can just leave cost unspecified
# but since these are on a different scale, we will need to set a different weight cutoff
data_combined, provider_locations,
weight = .54,
consumers_value = "population", providers_id = "ID", providers_value = "doctors",
# if the coordinate columns were not the default (X = Longitude, Y = Latitude),
# you could specify them here, such as with providers_location = c("long", "lat")
verbose = TRUE, return_type = 1000
)
#> ── Calculating a Floating Catchment Area Ratio ─────────────────────────────────
#> ℹ consumers value: `population` column
#> ℹ providers value: `doctors` column
#> ℹ consumers id: `GEOID` column
#> ℹ providers id: `ID` column
#> calculating cost from locations...
#> ℹ consumers location: `consumers` columns (X and Y)
#> ℹ providers location: `providers` columns (X and Y)
#> ℹ cost: calculated euclidean distances
#> ℹ weight: cost over 0 and under 0.54
#> ✔ calculated 2-step floating catchment area (resources per 1000 consumers)
# to compare, we might look at the difference between values calculated with travel times versus
# Euclidean distances
traveltime_vs_euclidean <- data_combined$doctors_2sfca - data_combined$doctors_2sfca_euclidean
pal <- colorBin("RdYlBu", traveltime_vs_euclidean)
map |>
addControl(paste(
"Difference Between 2-Step Floating Catchment Areas Calculated with Euclidean Distance",
"(lower) VS Travel Times (higher)"
), "topright") |>
addLegend("bottomright", pal, traveltime_vs_euclidean, opacity = .7) |>
addPolygons(
fillColor = pal(traveltime_vs_euclidean), fillOpacity = .7, weight = 1, color = "#000",
highlightOptions = highlightOptions(color = "#fff"), group = "Access",
label = paste("Travel Time - Euclidean:", traveltime_vs_euclidean)
)
Differences here reflect travel difficulty relative to uniform, straight-line distance, which highlights regions with particularly difficult terrain (within road networks, at least as estimated by the routing engine).
We can use the catchment_connections function to more directly inspect regions of particular difficulty:
# identify a few block groups of interest
IDs <- c("240178504003", "240178505001", "240178505002", "240178511002")
rownames(data_combined) <- data_combined$GEOID
# extract connections between each of them and providers in range
connections <- catchment_connections(
data_combined[IDs, ], provider_locations, traveltimes, 60,
to_id = "ID",
return_type = "sf"
)
map |>
setView(-77, 38.5, 11) |>
addTiles() |>
addPolygons(
fillColor = pal(traveltime_vs_euclidean), fillOpacity = .5, weight = 1,
color = "#000", highlightOptions = highlightOptions(color = "#fff"), group = "Access",
label = paste("Travel Time - Euclidean:", traveltime_vs_euclidean)
) |>
addPolylines(
data = connections, weight = 3, color = "#777", options = pathOptions(pane = "lines"),
highlightOptions = highlightOptions(color = "#fff", weight = 4),
label = ~ paste0("From: ", from, ", To: ", to, ", Weight: ", weight, ", Cost: ", cost)
) |>
addCircles(
data = data_combined[IDs, ], weight = 10, lng = ~X, lat = ~Y,
label = ~ paste0("GEOID: ", GEOID, ", Population: ", population),
options = pathOptions(pane = "points")
) |>
showGroup("Doctors")
This initially produced a weird-looking set of connections, but it turned out our routing server only had maps for Virginia. Now we can see that these consumer locations have more reasonable connections.
Enhanced 2-step floating catchment areas (E2SFCA; Luo & Qi, 2009) introduce some form of non-uniform weights. These were originally applied to travel-time bins, making for weights that decrease in steps away from the center of each catchment area:
step_weights <- list(c(60, .042), c(30, .377), c(20, .704), c(10, .962))
data_combined$doctors_e2sfca <- catchment_ratio(
data_combined, provider_locations, traveltimes, step_weights,
consumers_value = "population", providers_id = "ID", providers_value = "doctors",
verbose = TRUE, return_type = 1000
)
#> ── Calculating a Floating Catchment Area Ratio ─────────────────────────────────
#> ℹ consumers value: `population` column
#> ℹ providers value: `doctors` column
#> ℹ consumers id: `GEOID` column
#> ℹ providers id: `ID` column
#> ℹ cost: `cost` matrix
#> ℹ setting non-NA 0s to 1e-06
#> ℹ weight: cost over 0 and under steps of `weight`
#> ✔ calculated 2-step floating catchment area (resources per 1000 consumers)
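Each entry in step_weights pairs an upper travel-time bound with the weight applied under it, so a cost receives the weight of the narrowest bound containing it. As a rough illustration of that mapping (catchment applies it internally; this helper is hypothetical):
# illustrative only: the step lookup implied by step_weights
step_weight <- function(cost, steps = step_weights) {
  bounds <- vapply(steps, "[", numeric(2), 1:2) # rows: bound, weight
  w <- numeric(length(cost))
  for (i in order(bounds[1, ], decreasing = TRUE)) {
    w[cost > 0 & cost <= bounds[1, i]] <- bounds[2, i]
  }
  w
}
step_weight(c(5, 15, 25, 45, 70))
#> [1] 0.962 0.704 0.377 0.042 0.000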
binary_vs_step <- data_combined$doctors_e2sfca - data_combined$doctors_2sfca
pal <- colorBin("RdYlBu", binary_vs_step)
map |>
addControl(paste(
"Difference Between 2-Step Floating Catchment Areas Calculated with Binary",
"(lower) VS Step (higher) Weights"
), "topright") |>
addLegend("bottomright", pal, binary_vs_step, opacity = .7) |>
addPolygons(
fillColor = pal(binary_vs_step), fillOpacity = .7, weight = 1, color = "#000",
highlightOptions = highlightOptions(color = "#fff"), group = "Access",
label = paste("Step - Binary:", binary_vs_step)
)
Between the binary and step-weighted variants, there is generally some redistribution away from larger, more distant regions (in red) toward smaller regions nearer to clusters of doctors (in blue), though this seems to mostly happen in the D.C. area, and is nearly reversed in the northern Maryland regions.
Another enhanced 2-step floating catchment area can use continuous weights, rather than binned steps, which is sometimes called a kernel density 2-step floating catchment area (KD2SFCA; Dai, 2010). Here, we will apply a Gaussian decay function within catchment areas, then compare with the step-weighted scores:
data_combined$doctors_kd2sfca <- catchment_ratio(
data_combined, provider_locations, traveltimes, "gaussian",
scale = 19,
consumers_value = "population", providers_id = "ID", providers_value = "doctors",
verbose = TRUE, return_type = 1000
)
#> ── Calculating a Floating Catchment Area Ratio ─────────────────────────────────
#> ℹ consumers value: `population` column
#> ℹ providers value: `doctors` column
#> ℹ consumers id: `GEOID` column
#> ℹ providers id: `ID` column
#> ℹ cost: `cost` matrix
#> ℹ setting non-NA 0s to 1e-06
#> ℹ weight: gaussian weight function
#> ✔ calculated 2-step floating catchment area (resources per 1000 consumers)
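For intuition, Gaussian decay drops off smoothly with cost rather than in steps; a sketch of the general form (the package’s exact parameterization of scale may differ, so treat this as illustrative):
# illustrative only: the general shape of Gaussian decay
gaussian_weight <- function(cost, scale = 19) exp(-cost^2 / (2 * scale^2))
round(gaussian_weight(c(5, 15, 25, 45)), 3)
#> [1] 0.966 0.732 0.421 0.061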
step_vs_continuous <- data_combined$doctors_kd2sfca - data_combined$doctors_e2sfca
pal <- colorBin("RdYlBu", step_vs_continuous)
map |>
addControl(paste(
"Difference Between 2-Step Floating Catchment Areas Calculated with Step",
"(lower) VS Continuous (higher) Weights"
), "topright") |>
addLegend("bottomright", pal, step_vs_continuous, opacity = .7) |>
addPolygons(
fillColor = pal(step_vs_continuous), fillOpacity = .7, weight = 1, color = "#000",
highlightOptions = highlightOptions(color = "#fff"), group = "Access",
label = paste("Continuous - Step:", step_vs_continuous)
)
At least in this case, continuous weights seem to better reflect the absence of doctors in the larger regions at the extremities of the main metropolitan area (which have mostly negative values, showing they get lower scores with continuous weights). But there is also some redistribution from the most densely populated areas, perhaps due to the smoother drop-off with distance.
3-step floating catchment areas (3SFCA; Wan et al., 2012) incorporate normalized weights to try to account for the potential over-influence of regions in range of multiple providers:
data_combined$doctors_3sfca <- catchment_ratio(
data_combined, provider_locations, traveltimes, step_weights,
normalize_weight = TRUE,
consumers_value = "population", providers_id = "ID", providers_value = "doctors",
verbose = TRUE, return_type = 1000
)
#> ── Calculating a Floating Catchment Area Ratio ─────────────────────────────────
#> ℹ consumers value: `population` column
#> ℹ providers value: `doctors` column
#> ℹ consumers id: `GEOID` column
#> ℹ providers id: `ID` column
#> ℹ cost: `cost` matrix
#> ℹ setting non-NA 0s to 1e-06
#> ℹ weight: cost over 0 and under steps of `weight`
#> ℹ normalizing weight
#> ✔ calculated 3-step floating catchment area (resources per 1000 consumers)
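In Wan et al.’s formulation, each consumer’s weight on a given provider is divided by the sum of that consumer’s weights across all providers in range, $G_{ij} = W_{ij} / \sum_{k} W_{ik}$, so populations with many nearby options place proportionally less demand on any one provider.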
two_vs_three <- data_combined$doctors_3sfca - data_combined$doctors_e2sfca
pal <- colorBin("RdYlBu", two_vs_three)
map |>
addControl(
"Difference Between 2- (lower) and 3- (higher) Step Floating Catchment Areas", "topright"
) |>
showGroup("Doctors") |>
addLegend("bottomright", pal, two_vs_three, opacity = .7) |>
addPolygons(
fillColor = pal(two_vs_three), fillOpacity = .7, weight = 1, color = "#000",
highlightOptions = highlightOptions(color = "#fff"), group = "Access",
label = paste("three - enhanced two:", two_vs_three)
)
This has the expected general reduction in the main metropolitan area, but these providers are more selectively redistributed than you might see from simply increasing the size of catchment areas or homogenizing weights, which would work to make scores more uniform.
The final variant will incorporate the commuter data we collected before [along the same lines as the commuter-based 2-step floating catchment area; CB2SFCA; Fransen et al. (2015)], which will give portions of some population centers a wider range of options:
data_combined$doctors_cb3sfca <- catchment_ratio(
data_combined, provider_locations, traveltimes, step_weights,
normalize_weight = TRUE,
consumers_value = "population", providers_id = "ID", providers_value = "doctors",
consumers_commutes = commutes, return_type = 1000
)
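With a commutes matrix, each block group’s demand is apportioned between its home location and the workplace locations its residents commute to, so part of its access is evaluated from work rather than home (following Fransen et al.’s logic; see the package documentation for the exact apportionment).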
single_vs_multi <- data_combined$doctors_cb3sfca - data_combined$doctors_3sfca
pal <- colorBin("RdYlBu", single_vs_multi)
map |>
addControl(
paste(
"Difference Between 3-Step Floating Catchment Areas with Single (lower) VS",
"Multiple (higher) origins"
), "topright"
) |>
addLegend("bottomright", pal, single_vs_multi, opacity = .7) |>
addPolygons(
fillColor = pal(single_vs_multi), fillOpacity = .7, weight = 1, color = "#000",
highlightOptions = highlightOptions(color = "#fff"), group = "Access",
label = paste("Multiple - Single:", single_vs_multi)
)
One limitation that comes up with the commuter-based catchment area is the lack of data about home-to-work commutes that cross state lines, which is particularly relevant in the National Capital Region, where a large urban area spans three states. This means that state populations are artificially biased toward providers within their own state to the degree that they commute out-of-state far enough to reach other providers.
Consumers can still utilize providers in other states if they are close enough to the border, which highlights another limitation, one that affects all variants. Two aspects of catchment areas matter here. First, catchment areas form networks: the connection between one consumer and one provider affects the connections of every other provider connected to that consumer, and of every other consumer connected to that provider. In that sense, to fully account for distribution within a catchment area, you need to know about all connected areas. Second, catchment areas are models, which means they are overly simplified, but become more realistic with added information.

With these aspects in mind, consider what happens at the edges of our general region of interest. We are only including consumers and providers that fall within the counties that make up the National Capital Region, but just beyond these largely arbitrary boundaries there are consumers and providers that would likely enter into the network. Artificially cutting off consideration beyond these boundaries creates a sort of warping, as parts of the catchment areas around border consumers and providers are effectively cut off. Fully addressing this limitation might require calculating ratios globally, but enough practical benefit might be gained by including surrounding states, or even just the parts of those states that fall within a buffer around the focal region (sketched below). This question is further explored in the next case study.
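As a rough sketch of that buffer idea (the 30 km distance and the equal-area projection are arbitrary assumptions, not part of the original analysis):
# hypothetical: outline a buffer around the focal region; block groups from
# surrounding states that intersect it could be added before recalculating
library(sf)
focal <- st_union(st_geometry(blockgroup_shapes))
buffered <- st_buffer(st_transform(focal, 5070), 30000) # 30 km, in meters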