Simulate a population of individuals within households, with complex relationships between demographic and location features.

Usage

generate_population(N = 1000, regions = NULL, capacities = NULL,
  region_ids = NULL, attraction_loci = 3, random_regions = 0.1,
  cost_loci = 2, size_loci = 5, similarity_metric = "euclidean",
  n_neighbors = 50, neighbor_range = 0.5, n_races = 6,
  n_building_types = 3, verbose = FALSE)

Arguments

N

Number of initial individuals to generate. Final number of individuals will be larger.

regions

A vector of region IDs, a matrix of coordinates, or an sf object with geometries from which coordinates can be derived. If not specified (and capacities is not specified), regions similar to housing units (with a mix of single and multi-family locations) will be generated.

capacities

A vector with the maximum number of households for each entry in regions.

region_ids

A vector of unique IDs for each region, or a column name in regions containing IDs.

attraction_loci

Number of locations selected to be centers of attractiveness, which influence where households are located.

random_regions

A number between 0 and 1, which determines the proportion of people who are randomly relocated when there is more capacity than households.

cost_loci

Number of locations selected to be centers of cost, which influences the initial income associated with households.

size_loci

Number of locations selected to be centers of size, which influence household sizes.

similarity_metric

Name of a metric to use to calculate nearness between neighbors; see lma_simets.

n_neighbors

Number of neighbors used to influence each new household's initial age and race.

neighbor_range

Minimum similarity between people to be considered neighbors, between 0 and 1 (where 0 means unrestricted, and 1 means same region only).

n_races

Number of different race groups to sample from.

n_building_types

Number of different building types to sample from.

verbose

Logical; if TRUE, will show status messages.

Value

A list with entries for params (with initial settings), and two data.frames:

households

household       Household ID.
region          Region ID.
head_income     Income of the first household member.
size            Number of individuals in the household.
building_type   Categorical indicator of building type.
renting         Binary indicator of renting status.

individuals

household       Household ID.
person          Person ID.
neighbors       Number of neighbors that influenced this person's variables.
age             Age in years.
sex             Categorical indicator of sex.
race            Categorical indicator of race.
income          Income of the individual.

Details

The population is generated in two steps:

First, households are generated and placed within regions. Placement within regions is determined by total distance from one or more regions selected as attraction loci. If coordinates are not provided, these are first randomly generated.

After households are placed, household income (that of the first member) is generated based on cost loci. Income is then used to generate building types (where types are increasingly associated with income) and then household size (based on size loci, income, and building type). Renting status is then generated based on income and building type: the chance of renting is 60% if income is under the mean income and 20% otherwise, multiplied by 0.8 if the building type is of a selected renting type, or 0.3 otherwise.
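The renting rule can be illustrated with a minimal sketch. This is not the package's internal code: renting_probability and its renting_type argument are hypothetical names, and it assumes a single selected renting type and that the mean is taken over all household incomes.

```r
# Hypothetical helper illustrating the renting rule described above;
# not part of the package.
renting_probability <- function(income, building_type, renting_type) {
  # 60% base chance under the mean income, 20% otherwise
  base <- ifelse(income < mean(income), 0.6, 0.2)
  # scaled by 0.8 for the selected renting type, 0.3 otherwise
  multiplier <- ifelse(building_type == renting_type, 0.8, 0.3)
  base * multiplier
}

income <- c(20000, 30000, 100000, 90000)
building_type <- c(1, 2, 1, 2)
p <- renting_probability(income, building_type, renting_type = 1)

# one Bernoulli draw per household gives the binary renting indicator
renting <- rbinom(length(p), 1, p)
```

Under these assumptions, the chance of renting ranges from 0.06 (higher income, non-renting building type) to 0.48 (lower income, renting building type).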

Second, individuals are generated for each household. To generate an individual, first, neighbors are searched for, based on n_neighbors and neighbor_range. Any neighbors are summarized: average age and income, and tabulated race.

These then affect the first member of the household. Age is first drawn from a Beta distribution (with a first shape of 1, and a second shape of 2 if renting or 1.5 otherwise), multiplied by 80, and added to 18; it is then adjusted toward a random value centered on the average neighbor age (a floored Gaussian with a standard deviation of 1). Race is sampled by taking a Binomial draw for each race group, with n_races trials and a chance of success equal to the proportion of neighbors in that group multiplied by the group's base rate, and selecting the group with the highest result.
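As a rough sketch of the first member's draws (not the package's implementation), the following covers only the base age draw and the race selection. It reads the Beta shapes as 1 and either 2 (renting) or 1.5 (not renting), leaves out the neighbor-based age adjustment because its exact form is not specified here, and breaks ties between race groups by taking the first group, which is an assumption. The function names are hypothetical.

```r
set.seed(1)

# Base age before any neighbor adjustment: 18 + 80 * Beta(1, shape2).
# shape2 is assumed to be 2 for renters and 1.5 otherwise.
draw_base_age <- function(renting) {
  shape2 <- ifelse(renting == 1, 2, 1.5)
  18 + rbeta(length(renting), 1, shape2) * 80
}

# One Binomial draw per race group, with n_races trials and a success
# chance of (proportion of neighbors in the group * base rate);
# the group with the highest draw is selected (first group on ties).
sample_race <- function(neighbor_props, base_rates) {
  n_races <- length(base_rates)
  draws <- rbinom(n_races, n_races, neighbor_props * base_rates)
  which.max(draws)
}

ages <- draw_base_age(c(0, 1, 1, 0))
race <- sample_race(c(0.5, 0.3, 0.2), c(0.4, 0.4, 0.2))
```

Because a Beta draw lies between 0 and 1, the base age in this sketch always falls between 18 and 98.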

Neighbors also affect the income of the second member of the household if the first member's income is under the neighbor mean income (or under 40,000 given no neighbors); in this case, the second member's income is drawn from a Gaussian distribution centered on the first member's income, with a standard deviation of 10,000.
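A minimal sketch of this rule follows. The function name is hypothetical, and returning an income of 0 when the condition is not met is an assumption (it agrees with the example output below, but is not stated here); the sketch also does nothing about negative draws.

```r
# Hypothetical helper; not part of the package.
second_income <- function(first_income, neighbor_incomes = numeric(0)) {
  # threshold is the neighbor mean income, or 40,000 given no neighbors
  threshold <- if (length(neighbor_incomes)) mean(neighbor_incomes) else 40000
  if (first_income < threshold) {
    # Gaussian centered on the first member's income, sd of 10,000
    rnorm(1, mean = first_income, sd = 10000)
  } else {
    0 # assumed: no income otherwise
  }
}

high <- second_income(272700)                # no neighbors, over 40,000
low <- second_income(30000)                  # no neighbors, under 40,000
near <- second_income(50000, c(60000, 80000)) # under the neighbor mean
```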

The second member's age is based on that of the first member: a floored Gaussian centered on the first member's age, with a standard deviation of 15 if the first member's age is over 40, or 5 otherwise, trimmed to be between 18 and 90.
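The second member's age draw might look like the sketch below. The function name is hypothetical, and "trimmed" is assumed to mean that out-of-range values are clamped to the nearest bound rather than redrawn.

```r
# Hypothetical helper; not part of the package.
second_age <- function(first_age) {
  # wider spread when the first member is over 40
  spread <- if (first_age > 40) 15 else 5
  age <- floor(rnorm(1, mean = first_age, sd = spread))
  # assumed: clamp to the 18 to 90 range
  min(max(age, 18), 90)
}

ages_young <- replicate(1000, second_age(25))
ages_old <- replicate(1000, second_age(81))
```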

The second member's race has a 70% chance to be that of the first member, and a 30% chance to be sampled in the same way as the first member's.

Members after the second have no income, have age randomly selected from a uniform distribution between 0 and the first member's age minus 15 (which is then rounded up), and have race determined by either the first or second member (50% chance).
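For members after the second, a minimal sketch under the description above could be written as follows; the function name and the list it returns are hypothetical, not the package's internal structure.

```r
# Hypothetical helper; not part of the package.
other_member <- function(first_age, first_race, second_race) {
  list(
    # uniform between 0 and (first member's age - 15), rounded up
    age = ceiling(runif(1, min = 0, max = first_age - 15)),
    # race taken from the first or second member with equal chance
    race = if (runif(1) < 0.5) first_race else second_race,
    income = 0
  )
}

members <- replicate(500, other_member(41, 4, 1), simplify = FALSE)
member_ages <- vapply(members, function(m) m$age, numeric(1))
member_races <- vapply(members, function(m) m$race, numeric(1))
```

With a first member aged 41, ages in this sketch cannot exceed 26, which agrees with the first household in the example output below.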

Sex has a 50% chance to be 0 or 1 for all but the second member; their sex has a 10% chance to be the same as the first member's, and 90% chance to be opposite.
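The sex assignment can be sketched in a few lines; member_sex is a hypothetical name used only for illustration.

```r
# Hypothetical helper; not part of the package.
member_sex <- function(n_members) {
  # 50% chance of 0 or 1 for every member
  sex <- rbinom(n_members, 1, 0.5)
  if (n_members > 1) {
    # second member: 10% chance of the same sex as the first, 90% opposite
    sex[2] <- if (runif(1) < 0.1) sex[1] else 1 - sex[1]
  }
  sex
}

household_sex <- member_sex(5)
single_sex <- member_sex(1)
```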

Examples

generate_population(2)
#> $params
#> $params$neighbors
#> [1] 50
#> 
#> $params$range
#> [1] 0.5
#> 
#> $params$races_rates
#> [1] 0.069106002 0.500000000 0.433570093 0.003143734 0.389064654 0.061654973
#> 
#> $params$n_building_types
#> [1] 3
#> 
#> 
#> $regions
#>   id capacity     cost building_type      X      Y
#> 1  1        1 470671.3             2  95235 107644
#> 2  2        1 633410.2             2 108071 103390
#> 
#> $households
#>   household region head_income size building_type renting
#> 1         1      1      272700   15             2       0
#> 2         2      2      249513    7             2       0
#> 
#> $individuals
#>    household person neighbors age sex race income
#> 1          1      1         0  41   1    4 272700
#> 2          1      2         0  50   0    1      0
#> 3          1      3         0   8   1    4      0
#> 4          1      4         0   9   0    1      0
#> 5          1      5         0  25   0    4      0
#> 6          1      6         0   4   1    4      0
#> 7          1      7         0   8   1    4      0
#> 8          1      8         0  26   0    4      0
#> 9          1      9         0  11   0    4      0
#> 10         1     10         0   4   0    4      0
#> 11         1     11         0  12   1    4      0
#> 12         1     12         0   7   0    4      0
#> 13         1     13         0   8   1    4      0
#> 14         1     14         0  16   1    4      0
#> 15         1     15         0   6   0    4      0
#> 16         2     16         0  81   1    4 249513
#> 17         2     17         0  76   0    4      0
#> 18         2     18         0  31   1    4      0
#> 19         2     19         0  48   0    4      0
#> 20         2     20         0  28   1    4      0
#> 21         2     21         0  22   1    4      0
#> 22         2     22         0  25   0    4      0
#>