Simulate a population of individuals within households, with complex relationships between demographic and location features.
Usage
generate_population(N = 1000, regions = NULL, capacities = NULL,
region_ids = NULL, attraction_loci = 3, random_regions = 0.1,
cost_loci = 2, size_loci = 5, similarity_metric = "euclidean",
n_neighbors = 50, neighbor_range = 0.5, n_races = 6,
n_building_types = 3, verbose = FALSE)
Arguments
- N
Number of initial individuals to generate. The final number of individuals will be larger.
- regions
A vector of region IDs, a matrix of coordinates, or an sf object with geometries from which coordinates can be derived. If neither regions nor capacities is specified, regions resembling housing units (a mix of single- and multi-family locations) will be generated.
- capacities
A vector with the maximum number of households for each entry in regions.
- region_ids
A vector of unique IDs for each entry in regions, or a column name in regions containing IDs.
- attraction_loci
Number of locations selected to be centers of attractiveness, which influence where households are located.
- random_regions
A number between 0 and 1 that determines the proportion of people who are randomly relocated when there is more capacity than households.
- cost_loci
Number of locations selected to be centers of cost, which influence the initial income associated with households.
- size_loci
Number of locations selected to be centers of size, which influence household sizes.
- similarity_metric
Name of a metric to use to calculate nearness between neighbors; see lma_simets.
- n_neighbors
Number of neighbors used to influence each new household's initial age and race.
- neighbor_range
Minimum similarity between people for them to be considered neighbors, between 0 and 1 (where 0 means unrestricted, and 1 means same region only).
- n_races
Number of different race groups to sample from.
- n_building_types
Number of different building types to sample from.
- verbose
Logical; if TRUE, status messages will be shown.
Value
A list with entries for params (initial settings), regions (region IDs, capacities, costs, building types, and coordinates, as shown in the example), and two data.frames:
households
- household: Household ID.
- region: Region ID.
- head_income: Income of the first household member.
- size: Number of individuals in the household.
- building_type: Categorical indicator of building type.
- renting: Binary indicator of renting status.
individuals
- household: Household ID.
- person: Person ID.
- neighbors: Number of neighbors whose values influenced the individual's variables.
- age: Age in years.
- sex: Categorical indicator of sex.
- race: Categorical indicator of race.
- income: Income of the individual.
Details
The population is generated in two steps:
First, households are generated and placed within regions. Placement within regions is determined by total distance from one or more regions selected as attraction loci. If coordinates are not provided, they are first randomly generated.
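As a rough illustration of distance-based placement, the minimal sketch below orders regions by their summed Euclidean distance to a set of attraction loci, so that the closest regions would be filled first. The helper order_by_attraction is hypothetical, and both the Euclidean distance and the fill order are assumptions, not the function's documented internals.

```r
# Hypothetical helper (not part of the package): order regions by their
# summed Euclidean distance to the selected attraction loci.
order_by_attraction <- function(coords, loci) {
  # coords: numeric matrix with one row per region (columns X and Y)
  # loci: row indices of coords that act as attraction loci
  total <- numeric(nrow(coords))
  for (l in loci) {
    # distance from every region to locus l, added to the running total
    total <- total + sqrt(rowSums(sweep(coords, 2, coords[l, ])^2))
  }
  # regions with the smallest total distance come first (ties keep input order)
  order(total)
}

coords <- cbind(X = c(0, 1, 5, 10), Y = c(0, 0, 0, 0))
order_by_attraction(coords, loci = 3)  # 3 2 1 4
```

In this sketch, a random set of loci matching attraction_loci = 3 could be drawn with sample.int(nrow(coords), 3).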
After households are placed, the income of each household's first member is generated based on cost loci. These incomes are then used to generate building types (with types increasingly associated with income), and then household sizes (based on size loci, income, and building type). Renting status is then generated based on income and building type: the chance of renting is 60% if income is below the mean income and 20% otherwise, multiplied by 0.8 if the building type is one selected as a renting type, or by 0.3 otherwise.
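The renting rule reduces to a small probability calculation. This minimal sketch assumes the mean is taken over all households passed in and that "selected renting type" is available as a logical vector; renting_probability is a hypothetical name.

```r
# Hypothetical helper: probability of renting under the rule described above.
renting_probability <- function(income, is_renting_type) {
  # 0.6 below the mean income, 0.2 otherwise
  base <- ifelse(income < mean(income), 0.6, 0.2)
  # scaled by 0.8 for a renting building type, 0.3 otherwise
  base * ifelse(is_renting_type, 0.8, 0.3)
}

p <- renting_probability(income = c(30000, 90000), is_renting_type = c(TRUE, FALSE))
p  # 0.48 0.06
# Bernoulli draw of renting status for each household
renting <- rbinom(length(p), 1, p)
```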
Second, individuals are generated for each household. To generate an individual, neighbors are first searched for, based on n_neighbors and neighbor_range.
Any neighbors are summarized: average age and income, and tabulated race.
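The search and summary steps can be sketched as follows. This is a minimal sketch, assuming similarities between the new household and existing people (for example, from the chosen similarity_metric) are already computed and passed in as a vector; find_neighbors and summarize_neighbors are hypothetical helpers.

```r
# Hypothetical helpers sketching the neighbor search and summary.
find_neighbors <- function(similarity, n_neighbors = 50, neighbor_range = 0.5) {
  # similarity: similarity (0 to 1) of each existing person to the new household
  candidates <- which(similarity >= neighbor_range)
  # keep the n_neighbors most similar candidates
  ranked <- candidates[order(similarity[candidates], decreasing = TRUE)]
  head(ranked, n_neighbors)
}

summarize_neighbors <- function(people, idx, n_races = 6) {
  list(
    age = mean(people$age[idx]),
    income = mean(people$income[idx]),
    # count of neighbors in each race group
    race = tabulate(people$race[idx], nbins = n_races)
  )
}

people <- data.frame(
  age = c(30, 40, 50, 60),
  income = c(10000, 20000, 30000, 40000),
  race = c(1, 2, 2, 6)
)
idx <- find_neighbors(c(0.9, 0.2, 0.7, 0.6), n_neighbors = 2)
idx  # 1 3
summarize_neighbors(people, idx)
```

With no neighbors, the means in this sketch come out as NaN, which later steps would need to treat as "no neighbors".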
These summaries then affect the first member of the household. Age is first drawn from a Beta distribution (with shapes of 1 and either 2 if renting or 1.5 otherwise), multiplied by 80 and added to 18, then adjusted toward a random value centered on the average neighbor age (a floored Gaussian with a standard deviation of 1). Race is then sampled: for each race group, a Binomial value is drawn with n_races trials and a success probability equal to the proportion of neighbors in that group times the group's base rate, and the group with the highest draw is selected.
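The initial age draw and the race sampling can be sketched in a few lines. This minimal sketch omits the adjustment toward the neighbor age, whose exact form is not documented; the fallback to base rates when there are no neighbors and the tie-breaking by first group are assumptions. draw_head_age and sample_race are hypothetical names.

```r
# Hypothetical helpers for the first member's initial age and race.
draw_head_age <- function(renting) {
  # Beta(1, 2) if renting, Beta(1, 1.5) otherwise, scaled to 18 to 98
  18 + 80 * rbeta(1, 1, if (renting) 2 else 1.5)
}

sample_race <- function(neighbor_race_counts, base_rates) {
  n_races <- length(base_rates)
  total <- sum(neighbor_race_counts)
  # proportion of neighbors in each group; with no neighbors, fall back
  # to the base rates alone (an assumption)
  props <- if (total > 0) neighbor_race_counts / total else rep(1, n_races)
  # one Binomial draw per group: n_races trials, success = proportion * base rate
  draws <- rbinom(n_races, n_races, props * base_rates)
  # pick the group with the highest draw (ties go to the first such group)
  which.max(draws)
}

set.seed(1)
draw_head_age(renting = TRUE)
sample_race(c(0, 5, 0), base_rates = c(0.5, 1, 0.5))  # always 2 here
```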
Neighbors also affect the income of the second member of the household if the first member's income is below the neighbor mean income (or below 40,000 when there are no neighbors). In this case, the second member's income is drawn from a Gaussian distribution centered on the first member's income, with a standard deviation of 10,000.
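A minimal sketch of the second member's income rule follows. Returning 0 when the condition is not met and flooring draws at 0 are assumptions (the former is consistent with the example output below); draw_second_income is a hypothetical name.

```r
# Hypothetical helper for the second member's income.
draw_second_income <- function(head_income, neighbor_income = NA) {
  # compare with the neighbor mean income, or 40,000 if there are no neighbors
  threshold <- if (is.na(neighbor_income)) 40000 else neighbor_income
  if (head_income < threshold) {
    # Gaussian centered on the first member's income (SD 10,000), floored at 0
    max(0, rnorm(1, head_income, 10000))
  } else {
    0  # assumed: no income otherwise
  }
}

draw_second_income(272700)  # 0: above the 40,000 fallback
draw_second_income(30000, neighbor_income = 50000)
```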
The second member's age is drawn from a floored Gaussian centered on the first member's age, with a standard deviation of 15 if the first member is over 40, or 5 otherwise, and trimmed to be between 18 and 90.
The second member's race has a 70% chance to be that of the first member, and a 30% chance to be sampled in the same way as the first member's.
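The second member's age and race rules can be sketched together. In this minimal sketch, draw_second_age and draw_second_race are hypothetical, and the resampling step is passed in as a function; the uniform stand-in used in the usage line is an assumption, since the described draw uses neighbors and base rates.

```r
# Hypothetical helpers (not part of the package) for the second member.
draw_second_age <- function(head_age) {
  # wider spread when the first member is over 40
  spread <- if (head_age > 40) 15 else 5
  age <- floor(rnorm(1, head_age, spread))
  # trim to the 18 to 90 range
  min(90, max(18, age))
}

draw_second_race <- function(head_race, resample_race) {
  # 70%: same race as the first member; 30%: sampled like the first member's
  if (runif(1) < 0.7) head_race else resample_race()
}

set.seed(2)
draw_second_age(45)
# stand-in resampler: uniform over 6 groups
draw_second_race(4, resample_race = function() sample(6, 1))
```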
Members after the second have no income, have an age drawn from a uniform distribution between 0 and the first member's age minus 15 (then rounded up), and have a race copied from either the first or the second member (50% chance each).
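For members after the second, the rules translate directly. This is a minimal sketch with a hypothetical helper, draw_later_member; it relies on the first member being at least 18, so the uniform upper bound stays positive.

```r
# Hypothetical helper for members after the second.
draw_later_member <- function(head_age, head_race, second_race) {
  list(
    income = 0,
    # uniform between 0 and (first member's age - 15), rounded up
    age = ceiling(runif(1, 0, head_age - 15)),
    # race copied from the first or second member with equal chance
    race = if (runif(1) < 0.5) head_race else second_race
  )
}

set.seed(3)
draw_later_member(head_age = 41, head_race = 4, second_race = 1)
```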
Sex is 0 or 1 with equal probability for every member except the second, whose sex has a 10% chance of matching the first member's and a 90% chance of being the opposite.
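The sex rule can be sketched as a single function of a member's position in the household. This minimal sketch assumes 0/1 coding as in the example output; draw_sex is a hypothetical name, and head_sex must be supplied for the second member.

```r
# Hypothetical helper for sex (coded 0/1).
draw_sex <- function(position, head_sex = NA) {
  if (position == 2) {
    # second member: 10% same as the first member, 90% opposite
    if (runif(1) < 0.1) head_sex else 1 - head_sex
  } else {
    # everyone else: 0 or 1 with equal probability
    rbinom(1, 1, 0.5)
  }
}

set.seed(4)
sapply(1:5, function(p) draw_sex(p, head_sex = 1))
```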
Examples
generate_population(2)
#> $params
#> $params$neighbors
#> [1] 50
#>
#> $params$range
#> [1] 0.5
#>
#> $params$races_rates
#> [1] 0.069106002 0.500000000 0.433570093 0.003143734 0.389064654 0.061654973
#>
#> $params$n_building_types
#> [1] 3
#>
#>
#> $regions
#> id capacity cost building_type X Y
#> 1 1 1 470671.3 2 95235 107644
#> 2 2 1 633410.2 2 108071 103390
#>
#> $households
#> household region head_income size building_type renting
#> 1 1 1 272700 15 2 0
#> 2 2 2 249513 7 2 0
#>
#> $individuals
#> household person neighbors age sex race income
#> 1 1 1 0 41 1 4 272700
#> 2 1 2 0 50 0 1 0
#> 3 1 3 0 8 1 4 0
#> 4 1 4 0 9 0 1 0
#> 5 1 5 0 25 0 4 0
#> 6 1 6 0 4 1 4 0
#> 7 1 7 0 8 1 4 0
#> 8 1 8 0 26 0 4 0
#> 9 1 9 0 11 0 4 0
#> 10 1 10 0 4 0 4 0
#> 11 1 11 0 12 1 4 0
#> 12 1 12 0 7 0 4 0
#> 13 1 13 0 8 1 4 0
#> 14 1 14 0 16 1 4 0
#> 15 1 15 0 6 0 4 0
#> 16 2 16 0 81 1 4 249513
#> 17 2 17 0 76 0 4 0
#> 18 2 18 0 31 1 4 0
#> 19 2 19 0 48 0 4 0
#> 20 2 20 0 28 1 4 0
#> 21 2 21 0 22 1 4 0
#> 22 2 22 0 25 0 4 0
#>