Extract variables and IDs from files in datacommons repositories

Usage

datacommons_map_files(dir, search_pattern = "\\.csv(?:\\.[gbx]z2?)?$",
  variable_location = "measure", id_location = "geoid",
  reader = read.csv, overwrite = FALSE, verbose = TRUE)

Arguments

dir

Directory of the data commons projects.

search_pattern

A regular expression string to be passed to list.files.
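
As a quick illustration of what the default search_pattern selects, the sketch below applies it to a few made-up file names with grepl. The file names are hypothetical, and grepl stands in here for the matching that list.files performs.

```r
# Default pattern from the Usage section: .csv files, optionally compressed.
pattern <- "\\.csv(?:\\.[gbx]z2?)?$"

# Hypothetical file names, for illustration only.
files <- c(
  "data.csv", "data.csv.gz", "data.csv.bz2",
  "data.csv.xz", "data.csv.zip", "notes.txt"
)

# TRUE where the pattern matches the end of the file name.
matched <- grepl(pattern, files)
files[matched]
```

Plain and gz-, bz2-, or xz-compressed CSV files match; other extensions, such as .zip, do not.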

variable_location

The name of a column containing variable names in each dataset, or a function to retrieve variable names (e.g., colnames).

id_location

The name of a column containing IDs in each dataset, or a function to retrieve IDs (e.g., rownames).
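
The two ways of specifying variable_location and id_location correspond roughly to long and wide datasets. The sketch below uses toy data frames (not from any real repository) to show what each form would retrieve. It assumes that a function value is applied to the object returned by reader, which this page does not state explicitly.

```r
# Long format: one row per ID-variable pair, matching the default
# column names "geoid" and "measure".
long <- data.frame(
  geoid = c("01001", "01001", "01003"),
  measure = c("population", "income", "population"),
  value = c(10, 20, 30),
  stringsAsFactors = FALSE
)
long_variables <- unique(long[["measure"]])
long_ids <- unique(long[["geoid"]])

# Wide format: IDs stored as row names and variables as column names,
# where functions such as colnames and rownames would apply.
wide <- data.frame(
  population = c(10, 30),
  income = c(20, NA),
  row.names = c("01001", "01003")
)
wide_variables <- colnames(wide)
wide_ids <- rownames(wide)
```

Both layouts yield the same variables and IDs here; only the way they are retrieved differs.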

reader

A function capable of handling a connection in its first argument, which returns a matrix-like object.
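
A custom reader can be useful when the default read.csv would alter the IDs, for example by reading them as numbers and dropping leading zeros. The sketch below defines a hypothetical reader, read_csv_chr_ids, that keeps a geoid column as character; the function name and the sample data are assumptions for illustration.

```r
# Hypothetical reader: accepts a connection (or path) as its first
# argument and returns a data.frame with geoid kept as character.
read_csv_chr_ids <- function(con, ...) {
  read.csv(con, colClasses = c(geoid = "character"), ...)
}

# Try it on an in-memory connection with made-up data.
con <- textConnection("geoid,measure,value\n01001,population,10")
example <- read_csv_chr_ids(con)
example$geoid

# It could then be supplied as the reader argument, e.g.:
# datacommons_map_files(".", reader = read_csv_chr_ids)
```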

overwrite

Logical; if TRUE, creates a new map even if one exists.

verbose

Logical; if FALSE, does not print status messages.

Value

An invisible list that includes a data.frame of the mapped variables and a list of the mapped IDs. The data.frame has the columns variable (variable name), repo (the repository containing the file), dir_name (variable name with a prefix from the parent directories), full_name (variable name with a prefix from the last part of the file's name, after a year or year range), and file (path to the file). The list of mapped IDs has an entry for each ID, and each entry contains repos (repositories in which the ID appears) and files (files in which the ID appears).

Examples

if (FALSE) {
# from a data commons project directory
map <- datacommons_map_files(".")
}