Distribute data from a source frame to a target frame.
Usage
redistribute(source, target = NULL, map = list(), source_id = "GEOID",
  target_id = source_id, weight = NULL, source_variable = NULL,
  source_value = NULL, aggregate = NULL, weight_agg_method = "auto",
  rescale = TRUE, drop_extra_sources = FALSE, default_value = NA,
  outFile = NULL, overwrite = FALSE, make_intersect_map = FALSE,
  fill_targets = FALSE, overlaps = "keep", use_all = TRUE,
  return_geometry = TRUE, return_map = FALSE, verbose = FALSE)
Arguments
- source
A matrix-like object you want to distribute from; usually this will be the real or more complete dataset, and is often at a lower resolution / higher level.
- target
A matrix-like object you want to distribute to: usually this will be the dataset you want but isn't available, and is often at a higher resolution / lower level (for disaggregation). Can also be a single number, representing the number of initial characters of source IDs to derive target IDs from (useful for aggregating up nested groups).
- map
A list with entries named with source IDs (or aligning with those IDs), containing vectors of associated target IDs (or indices of those IDs). Entries can also be numeric vectors with IDs as names, which will be used to weigh the relationship. If IDs are related by substrings (the first characters of target IDs are source IDs), then a map can be automatically generated from them. If source and target contain sf geometries, a map will be made with st_intersects (st_intersects(source, target)). If an intersects map is made, and source is being aggregated to target, and map entries contain multiple target IDs, those entries will be weighted by their proportion of overlap with the source area.
- source_id, target_id
Name of a column in source / target, or a vector containing IDs. For source, this will default to the first column. For target, columns will be searched through for one that appears to relate to the source IDs, falling back to the first column.
- weight
Name of a column, or a vector containing weights (or single value to apply to all cases), which apply to target when disaggregating, and source when aggregating. Defaults to unit weights (all weights are 1).
- source_variable, source_value
If source is tall (with variables spread across rows rather than columns), specifies names of columns in source containing variable names and values for conversion.
- aggregate
Logical; if specified, will determine whether to aggregate or disaggregate from source to target. Otherwise, this will be TRUE if there are more source observations than target observations.
- weight_agg_method
Means of aggregating weight, in the case that target IDs contain duplicates. Options are "sum", "average", or "auto" (default; which will sum if weight is integer-like, and average otherwise).
- rescale
Logical; if FALSE, will not adjust target values after redistribution such that they match source totals.
- drop_extra_sources
Logical; if TRUE, will remove any source rows that are not mapped to any target rows. Useful when inputting a source with regions outside of the target area, especially when rescale is TRUE.
- default_value
Value to set to any unmapped target ID.
- outFile
Path to a CSV file in which to save results.
- overwrite
Logical; if TRUE, will overwrite an existing outFile.
- make_intersect_map
Logical; if TRUE, will opt to calculate an intersect-based map rather than an ID-based map, if both seem possible. If specified as FALSE, will never calculate an intersect-based map.
- fill_targets
Logical; if TRUE, will make new target rows for any un-mapped source row.
- overlaps
If specified and not TRUE or "keep" (default), will assign target entities that are mapped to multiple source entities to a single source entity. The value determines how entities with the same weight should be assigned, between "first", "last", and "random".
- use_all
Logical; if TRUE (default), will redistribute map weights so they sum to 1. Otherwise, entities may be partially weighted.
- return_geometry
Logical; if FALSE, will not set the returned data.frame's geometry to that of target, if it exists.
- return_map
Logical; if TRUE, will only return the map, without performing the redistribution. Useful if you want to inspect an automatically created map, or use it in a later call.
- verbose
Logical; if TRUE, will show status messages.
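To make the map and weight arguments more concrete, the following base R sketch builds an ID-based map from ID prefixes and splits each source value across its targets in proportion to a weight column. It is only an illustration of the idea under assumed data, not the function's actual implementation; the data values and the helper name disaggregate are hypothetical.

```r
# Hypothetical data: two source entities and five nested target entities.
source <- data.frame(id = c("a", "b"), num = c(1, 2))
target <- data.frame(
  id = c("a1", "a2", "b1", "b2", "b3"),
  population = c(10, 30, 5, 5, 10)
)

# ID-based map: a list named by source ID, holding the associated target IDs.
# Assumes the first character of each target ID is its source ID.
map <- split(target$id, substr(target$id, 1, 1))

# Hypothetical helper: each target receives its weight's share of the
# source value, so the shares sum back to the source total.
disaggregate <- function(value, weights) value * weights / sum(weights)

target$num <- NA_real_
for (sid in names(map)) {
  rows <- target$id %in% map[[sid]]
  target$num[rows] <- disaggregate(
    source$num[source$id == sid], target$population[rows]
  )
}

# a1 and a2 receive 0.25 and 0.75; b1, b2, and b3 receive 0.5, 0.5, and 1.
target
```

Because each source value is split by shares that sum to 1, the target values within each source add back up to the source value, which is the property the rescale argument is described as preserving.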
Value
A data.frame with a row for each target ID (identified by the first column, id), and a column for each variable from source.
Examples
# minimal example
source <- data.frame(a = 1, b = 2)
target <- 1:5
(redistribute(source, target, verbose = TRUE))
#> ℹ source IDs: 1
#> ℹ target IDs: `target` vector
#> ℹ map: all target IDs for single source
#> ℹ weights: 1
#> ℹ redistributing 2 variables from 1 source to 5 targets:
#> • (numb; 2) a, b
#> ℹ disaggregating...
#> ✔ done disaggregating [12ms]
#>
#> ℹ checking totals
#> ✔ totals are aligned [7ms]
#>
#>   id   a   b
#> 1  1 0.2 0.4
#> 2  2 0.2 0.4
#> 3  3 0.2 0.4
#> 4  4 0.2 0.4
#> 5  5 0.2 0.4
# multi-entity example
source <- data.frame(id = c("a", "b"), cat = c("aaa", "bbb"), num = c(1, 2))
target <- data.frame(
  id = sample(paste0(c("a", "b"), rep(1:5, 2))),
  population = sample.int(1e5, 10)
)
(redistribute(source, target, verbose = TRUE))
#> ℹ source IDs: id column of `source`
#> ℹ target IDs: id column of `target`
#> ℹ map: first 1 character of target IDs
#> ℹ weights: 1
#> ℹ redistributing 2 variables from 2 sources to 10 targets:
#> • (numb; 1) num
#> • (char; 1) cat
#> ℹ disaggregating...
#> ✔ done disaggregating [6ms]
#>
#> ℹ re-converting categorical levels
#> ℹ checking totals
#> ✔ totals are aligned [8ms]
#>
#>    id cat num
#> 1  b2 bbb 0.4
#> 2  a4 aaa 0.2
#> 3  a5 aaa 0.2
#> 4  b4 bbb 0.4
#> 5  b5 bbb 0.4
#> 6  a1 aaa 0.2
#> 7  b3 bbb 0.4
#> 8  a3 aaa 0.2
#> 9  a2 aaa 0.2
#> 10 b1 bbb 0.4
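The examples above disaggregate from few sources to many targets. As a rough counterpart, the reverse direction (aggregating nested entities up to groups identified by the first characters of their IDs) can be sketched in base R. This is an illustration under assumed data rather than the function's own code; the names fine, coarse, and n_chars are hypothetical.

```r
# Hypothetical fine-grained data whose IDs nest within coarser groups.
fine <- data.frame(
  id = c("a1", "a2", "b1", "b2", "b3"),
  num = c(0.25, 0.75, 0.5, 0.5, 1)
)

# Number of leading ID characters that identify the coarser group.
n_chars <- 1
fine$group <- substr(fine$id, 1, n_chars)

# Sum values within each derived group.
coarse <- aggregate(num ~ group, data = fine, FUN = sum)

# Group "a" totals 1 and group "b" totals 2.
coarse
```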