Extract variables and IDs from files in datacommons repositories
Usage
datacommons_map_files(dir, search_pattern = "\\.csv(?:\\.[gbx]z2?)?$",
variable_location = "measure", id_location = "geoid",
reader = read.csv, overwrite = FALSE, verbose = TRUE)
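The default search_pattern in the usage above can be checked against a few file names before mapping a directory. This is a minimal sketch: the file names are hypothetical, and grepl() is used only as a stand-in for the matching that list.files performs.

```r
# Default pattern copied from the usage signature.
search_pattern <- "\\.csv(?:\\.[gbx]z2?)?$"

# Hypothetical file names, for illustration only.
files <- c(
  "data.csv", "data.csv.gz", "data.csv.bz2",
  "data.csv.xz", "data.csv.zip", "data.tsv"
)

# TRUE where the name ends in .csv, optionally followed by a
# .gz, .bz2, or .xz style compression suffix.
matched <- grepl(search_pattern, files)
files[matched]
```

Under this pattern, plain and compressed CSV files are kept, while names such as data.csv.zip and data.tsv are skipped.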
Arguments
- dir
Directory of the data commons projects.
- search_pattern
A regular expression string to be passed to list.files.
- variable_location
The name of a column containing variable names in each dataset, or a function to retrieve variable names (e.g., colnames).
- id_location
The name of a column containing IDs in each dataset, or a function to retrieve IDs (e.g., rownames).
- reader
A function capable of handling a connection in its first argument, which returns a matrix-like object.
- overwrite
Logical; if TRUE, creates a new map even if one exists.
- verbose
Logical; if FALSE, does not print status messages.
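The reader, variable_location, and id_location arguments can each be a function, so a short sketch of what such functions might look like may help. Everything named here (read_ids_as_text, get_variables, get_ids, and the example data) is hypothetical. How datacommons_map_files itself calls these functions is not shown on this page, so the sketch only exercises them directly on a temporary file.

```r
# Write a small gzip-compressed CSV in long format to a temporary file.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(
  data.frame(
    geoid = c("01001", "01001", "01003"),
    measure = c("population", "income", "population"),
    value = c(100, 50000, 200)
  ),
  con,
  row.names = FALSE
)
close(con)

# Hypothetical reader: takes a connection as its first argument and
# returns a data.frame. Reading geoid as character keeps leading zeros.
read_ids_as_text <- function(con, ...) {
  read.csv(
    con,
    colClasses = c(geoid = "character"),
    stringsAsFactors = FALSE,
    ...
  )
}

# Hypothetical location functions, as alternatives to column names.
get_variables <- function(d) unique(d$measure)
get_ids <- function(d) unique(d$geoid)

# read.csv opens and closes the unopened connection itself.
d <- read_ids_as_text(gzfile(path))
vars <- get_variables(d)
ids <- get_ids(d)
unlink(path)
```

Assuming the function applies them to each dataset as described above, these could be supplied as reader = read_ids_as_text, variable_location = get_variables, and id_location = get_ids.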
Value
An invisible list, including a data.frame of the mapped variables and a list of the mapped IDs.
The data.frame has variable (variable name), repo (the repository containing the file), dir_name (variable name with a prefix from the parent directories), full_name (variable name with a prefix from the last part of the file's name, after a year or year range), and file (path to the file) columns.
The list of mapped IDs has an entry for each ID, each of which has entries for repos (repositories in which the ID appears) and files (files in which the ID appears).