For Developers

GitHub materials
Developer workflow
Standards
Writing measure_info
- Keys
- Citations
- Source
- Statement
- Layer
How to set up environmental variables

GitHub materials

Git Cheat Sheet: Branch, Write, Push, Merge

Developer workflow

Developer Workflow

Standards

Repo organization

We have 11 data repositories, divided thematically by general data topic:
- sdc.broadband
- sdc.business_climate
- sdc.demographics
- sdc.education
- sdc.environment
- sdc.financial_well_being
- sdc.food
- sdc.health
- sdc.housing
- sdc.transportation
- sdc.public_safety
Within a repository, data is organized in thematic topic folders.
Within the most specific topic folder, we have (as necessary):
- code
  - all code used to replicate a distribution dataset
  - Use the following naming conventions:
    - ingest files contain the steps to acquire the data and write to /original
    - prepare files contain the data manipulation steps and write to /working or /distribution
- data
  - All original, intermediary, and final datasets in appropriate folders
  - See dataset table naming guidelines
- docs
  - Supporting documentation for data or methods (e.g. literature or technical reports)
Within /code, /data, and /docs, we have (as necessary):
- original
- working
- distribution
  - /distribution datasets need to be compressed
  - /distribution datasets need to follow the column name standards
  - /distribution datasets need a measure_info file
All /distribution datasets need /distribution code
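
The layout above can be sketched in R. This is only a minimal sketch under stated assumptions: the helper `create_topic_skeleton()`, the repository name `sdc.example`, and the topic folder `example_topic` are hypothetical, and the sketch creates every folder even though the guidance says to create them only as necessary.

```r
# Hypothetical helper: create code/, data/, and docs/ under a topic folder,
# each with original/, working/, and distribution/ subfolders.
create_topic_skeleton <- function(repo_root, topic) {
  top_level <- c("code", "data", "docs")
  stages <- c("original", "working", "distribution")
  # Build every top_level/stage combination, e.g. "data/working"
  combos <- expand.grid(stage = stages, top = top_level, stringsAsFactors = FALSE)
  paths <- file.path(repo_root, topic, combos$top, combos$stage)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Assumed location, for illustration only: a throwaway folder in tempdir()
repo_root <- file.path(tempdir(), "sdc.example")
created <- create_topic_skeleton(repo_root, "example_topic")
print(created)
```

In a real repository you would likely create only the folders a topic actually needs.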

Column names

Table naming guidance

The naming convention for data tables is as follows:

<coverage_area>_<resolution>_<data_source>_<time_period>_<title>

For example, a table created from ACS 5-year data on health insurance could be named: va_bg_acs5_2015_adults_health_insured_by_sex
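
As an illustration of the convention, the name can be assembled from its five components. This is a minimal sketch; `build_table_name()` is a hypothetical helper, not an existing SDC function, and lowercasing the components is an assumption based on the example above.

```r
# Hypothetical helper: join the five name components with underscores.
build_table_name <- function(coverage_area, resolution, data_source, time_period, title) {
  parts <- c(coverage_area, resolution, data_source, time_period, title)
  # Lowercase each component (an assumption) and replace whitespace with underscores
  parts <- gsub("[[:space:]]+", "_", tolower(trimws(as.character(parts))))
  paste(parts, collapse = "_")
}

table_name <- build_table_name("va", "bg", "acs5", 2015, "adults health insured by sex")
print(table_name)
```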

Abbreviation Standards
- Coverage Area (2 characters for a country or state/province; add the 3-character FIPS code for a sub-state/province area)
  - us, United States
  - va, Virginia
  - va013, Virginia, Arlington County
- Resolutions (2 characters)
  - bl, census block
  - bg, census block group
  - tr, census tract
  - nb, neighborhood
  - ct, county
  - hd, health district
  - co, country
  - pl, place locations
  - pr, person data
  - bz, business data
- Data Sources (up to 5 characters; this list will continually grow)
  - acs5, American Community Survey 5-Year Data
  - lodes, LEHD Origin-Destination Employment Statistics
  - pseo, Post-Secondary Employment Outcomes
  - qwi, Quarterly Workforce Indicators
  - mcig, Mastercard Inclusive Growth Score
  - hifld, Homeland Infrastructure Foundation-Level Data
  - ookla, Ookla for Good
  - webmd, WebMD
  - sdad, (items that we have calculated)
  - abc, census address block counts
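
A table name can also be checked against these abbreviation lists. The sketch below is illustrative only: `check_table_name()` is a hypothetical helper, and the coverage area pattern (two letters, optionally followed by three digits) is an assumption inferred from the examples `us`, `va`, and `va013`. The time period and title are not checked because no format is specified for them here.

```r
# Abbreviations copied from the lists above (the data source list will grow)
resolutions <- c("bl", "bg", "tr", "nb", "ct", "hd", "co", "pl", "pr", "bz")
data_sources <- c("acs5", "lodes", "pseo", "qwi", "mcig", "hifld",
                  "ookla", "webmd", "sdad", "abc")

# Hypothetical helper: split a table name on underscores and report whether
# the first three components follow the abbreviation standards.
check_table_name <- function(name) {
  parts <- strsplit(name, "_", fixed = TRUE)[[1]]
  if (length(parts) < 5) {
    stop("expected at least 5 underscore-separated components: ", name)
  }
  c(
    # Assumed pattern: 2 letters, optionally followed by a 3-digit FIPS code
    coverage_area = grepl("^[a-z]{2}([0-9]{3})?$", parts[1]),
    resolution    = parts[2] %in% resolutions,
    data_source   = parts[3] %in% data_sources
  )
}

print(check_table_name("va_bg_acs5_2015_adults_health_insured_by_sex"))
print(check_table_name("va013_xx_acs5_2015_adults_health_insured_by_sex"))
```

The second call uses a made-up resolution code (`xx`) to show a failing check.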

Measure naming guidance

Measures should be named to balance human and machine-readability.
Generally, the format for measures should be topic_method.
Underscores should be used to separate words in a measure.
Measures should be renamed to follow SDC style guidelines after we have manipulated them.
The living list of abbreviations is UNDER CONSTRUCTION.
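
One way to check the underscore rule is a simple pattern match. This is a minimal sketch: `is_valid_measure_name()` and the example names are hypothetical, and requiring lowercase letters and digits is an assumption that the guidance above does not state explicitly.

```r
# Hypothetical check: words of lowercase letters/digits separated by single underscores.
is_valid_measure_name <- function(x) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)*$", x)
}

# Illustrative names only; they are not taken from an SDC dataset
candidates <- c("health_insured_percent", "Health Insured", "health__insured")
print(is_valid_measure_name(candidates))
```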

Writing measure_info

When writing measure_info, I would suggest starting with a copy of an exemplar measure_info or a closely related measure_info (e.g. describing data from the same source).
You can edit measure_info in RStudio, your preferred code editor, or the GitHub GUI (really, wherever you like).
It is important to avoid syntax mistakes in your measure_info:
- Use an editor that understands JSON syntax
- Use a JSON linter library (e.g. jsonlite::validate())
- Use an online JSON linter
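
As a sketch of the linter option, `jsonlite::validate()` can flag syntax mistakes before a file is committed. The JSON strings and key names below are hypothetical and do not reflect the actual measure_info schema; `check_json()` is an illustrative wrapper.

```r
library(jsonlite)

# Hypothetical measure_info content; the key names are illustrative only
good_json <- '{"example_measure": {"short_name": "Example", "unit": "percent"}}'
# Same content with a trailing comma, which is not valid JSON
bad_json <- '{"example_measure": {"short_name": "Example", "unit": "percent",}}'

# Illustrative wrapper: TRUE if the text is valid JSON, otherwise report the error
check_json <- function(txt) {
  result <- jsonlite::validate(txt)
  if (!isTRUE(result)) {
    # For invalid input, the parser message is attached as the "err" attribute
    message("Invalid JSON: ", attr(result, "err"))
  }
  isTRUE(result)
}

print(check_json(good_json))
print(check_json(bad_json))
```

To check a file, read it into a single string first, for example with `paste(readLines(path), collapse = "\n")`, where `path` points to your measure_info file.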

Keys

Citations

Source

Statement

Layer

How to set up environmental variables

In your home directory, create a file named “.Renviron”. Write the names of your secrets and their values to this file, like this:

# Environmental variables can be in quotes or not in quotes #
CENSUS_API_KEY="secret"
db_usr="secret"
db_pwd="secret"
DATAVERSE_KEY="secret"
DATAVERSE_SERVER="secret"
OSRM_SERVER="secret"
BEA_API_KEY="secret"
my_secret="secret"

R reads this file when your session starts. To retrieve an environmental variable, run this command in R:

Sys.getenv("my_secret")

## [1] "secret"

In action, you might use environmental variables like this

options(osrm.server = Sys.getenv("OSRM_SERVER"))
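
Because `Sys.getenv()` returns an empty string for a variable that is not set, it can help to fail early with a clear message. The helper below is a hypothetical sketch, not part of any SDC package; `Sys.setenv()` is used here only to make the example self-contained.

```r
# Hypothetical helper: return an environment variable or stop with a clear message.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name,
         "' is not set; add it to ~/.Renviron and restart R.")
  }
  value
}

# Set a throwaway value for this session only, to demonstrate the helper
Sys.setenv(my_secret = "secret")
print(get_secret("my_secret"))
```

With such a helper, a call like `options(osrm.server = get_secret("OSRM_SERVER"))` would stop immediately if the variable is missing, rather than passing an empty string along.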

You can also install your Census API key through tidycensus:

library(tidycensus)

census_api_key("111111abc", install = TRUE, overwrite = TRUE)

## Your original .Renviron will be backed up and stored in your R HOME directory if needed.

## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY"). 
## To use now, restart R or run `readRenviron("~/.Renviron")`

## [1] "111111abc"

# First time, reload your environment so you can use the key without restarting R.
readRenviron("~/.Renviron")
# You can check it with:
Sys.getenv("CENSUS_API_KEY")

## [1] "111111abc"

Environmental variables are not only useful time savers; they also prevent us from committing secrets to our public repositories!