GitHub materials

Git Cheat Sheet Branch Write Push Merge

Developer workflow

Standards

Repo organization

  • We have 11 data repositories, organized by general data topic:
    • sdc.broadband
    • sdc.business_climate
    • sdc.demographics
    • sdc.education
    • sdc.environment
    • sdc.financial_well_being
    • sdc.food
    • sdc.health
    • sdc.housing
    • sdc.transportation
    • sdc.public_safety
  • Within a repository, data is organized in thematic topic folders
  • Within the most specific topic folder, we have (as necessary)
    • code
      • all code used to replicate a distribution dataset
      • Use the following naming conventions
        • ingest files contain the steps to acquire the data and write to /original
        • prepare files contain the data manipulation steps and write to /working or /distribution
    • data
    • docs
      • Supporting documentation for data or methods (e.g. literature or technical reports)
  • Within /code, /data, /docs, we have (as necessary)
    • original
    • working
    • distribution
  • Every dataset in /distribution needs corresponding code in /distribution
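
As a rough illustration of the layout described above, the following sketch creates the folder skeleton for one topic folder. The function name `scaffold_topic` is hypothetical, and creating every combination of folders is an assumption made for brevity; the guidance above only calls for the folders that are actually needed.

```r
# Hypothetical helper (not part of any SDC package): create code/, data/ and
# docs/ under one topic folder, each with original/, working/ and
# distribution/ subfolders.
scaffold_topic <- function(topic_dir) {
  kinds  <- c("code", "data", "docs")
  stages <- c("original", "working", "distribution")

  # All 3 x 3 combinations of folder kind and processing stage
  grid  <- expand.grid(kind = kinds, stage = stages, stringsAsFactors = FALSE)
  paths <- file.path(topic_dir, grid$kind, grid$stage)

  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Example: build the skeleton in a temporary directory
scaffold_topic(file.path(tempdir(), "example_topic"))
```

In practice you would delete or skip the folders a topic does not need.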

Column names

Table naming guidance

The naming convention for data tables is as follows:

<coverage_area>_<resolution>_<data_source>_<time_period>_<title>

For example, a table created from ACS 5-year data on health insurance could be named: va_bg_acs5_2015_adults_health_insured_by_sex

  • Abbreviation Standards
    • Coverage Area (2 characters for state/province or country, followed by 3 FIPS characters for sub-state/province areas)
      • us, United States
      • va, Virginia
      • va013, Virginia, Arlington County
    • Resolutions (2 characters)
      • bl, census block
      • bg, census block group
      • tr, census tract
      • nb, neighborhood
      • ct, county
      • hd, health district
      • co, country
      • pl, place locations
      • pr, person data
      • bz, business data
    • Data Sources (up to 5 characters; this list will continually grow)
      • acs5, American Community Survey 5-Year Data
      • lodes, LEHD Origin-Destination Employment Statistics
      • pseo, Post-Secondary Employment Outcomes
      • qwi, Quarterly Workforce Indicators
      • mcig, Mastercard Inclusive Growth Score
      • hifld, Homeland Infrastructure Foundation-Level Data
      • ookla, Ookla for Good
      • webmd, WebMD
      • sdad, (items that we have calculated)
      • abc, census address block counts
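
The convention above can be sketched as a small helper that assembles a table name and checks the abbreviation lengths. This is only an illustration: `make_table_name` is a hypothetical function, the title clean-up rule (lowercase, underscores between words) is inferred from the example name, and treating the 3 FIPS characters as digits is an assumption. The format of the time period is not specified above, so it is left unchecked.

```r
# Hypothetical helper: build a table name of the form
# <coverage_area>_<resolution>_<data_source>_<time_period>_<title>
make_table_name <- function(coverage_area, resolution, data_source,
                            time_period, title) {
  # Normalize the title: lowercase, underscores between words, no stray
  # underscores at either end
  title <- tolower(title)
  title <- gsub("[^a-z0-9]+", "_", title)
  title <- gsub("^_+|_+$", "", title)

  stopifnot(
    grepl("^[a-z]{2}([0-9]{3})?$", coverage_area), # e.g. "va" or "va013"
    grepl("^[a-z]{2}$", resolution),               # e.g. "bg"
    grepl("^[a-z0-9]{1,5}$", data_source)          # e.g. "acs5"
  )

  paste(coverage_area, resolution, data_source, time_period, title, sep = "_")
}

make_table_name("va", "bg", "acs5", 2015, "adults health insured by sex")
## [1] "va_bg_acs5_2015_adults_health_insured_by_sex"
```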

Measure naming guidance

  • Measures should be named to balance human and machine-readability.
  • Generally, the format for measures should be topic_method.
  • Underscores should be used to separate words in a measure.
  • Measures should be renamed to SDC style guidelines after we have manipulated them.
  • The living list of abbreviations is UNDER CONSTRUCTION.
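
A quick way to screen measure names against the guidance above might look like the following. The function `is_measure_name` is hypothetical, and requiring lowercase letters is an assumption; the guidance itself only asks for underscore-separated words in a topic_method pattern.

```r
# Hypothetical check: at least two underscore-separated words (topic_method),
# each made of lowercase letters and digits. Lowercase is an assumption.
is_measure_name <- function(x) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)+$", x)
}

# Returns one TRUE/FALSE per candidate name
is_measure_name(c("health_insured_percent", "HealthInsured", "income"))
```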

Writing measure_info

Keys

Citations

Source

Statement

Layer

How to set up environment variables

In your home directory, create a file named “.Renviron”. In this file, write the name and value of each secret, one per line, like this:

# Values can be quoted or unquoted
CENSUS_API_KEY="secret"
db_usr="secret"
db_pwd="secret"
DATAVERSE_KEY="secret"
DATAVERSE_SERVER="secret"
OSRM_SERVER="secret"
BEA_API_KEY="secret"
my_secret="secret"

R reads this file when a session starts. To retrieve an environment variable, run this command in R:

Sys.getenv("my_secret")
## [1] "secret"

In practice, you might use an environment variable like this:

options(osrm.server = Sys.getenv("OSRM_SERVER"))
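
Because Sys.getenv() returns an empty string for a variable that is not set, a missing secret can go unnoticed until a later call fails. One possible guard, sketched here with a hypothetical helper named `get_secret`, is to stop early with a clear message:

```r
# Hypothetical helper: fetch an environment variable, or stop with a clear
# error if it is unset or empty.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name,
         "' is not set; add it to ~/.Renviron", call. = FALSE)
  }
  value
}

# Example usage with a throwaway variable set only for this session
Sys.setenv(EXAMPLE_SECRET = "abc")
get_secret("EXAMPLE_SECRET")
```

With a helper like this, the line above could be written as options(osrm.server = get_secret("OSRM_SERVER")).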

You can also install your Census API key with tidycensus:

library(tidycensus)

census_api_key("111111abc", install = TRUE, overwrite = TRUE)
## Your original .Renviron will be backed up and stored in your R HOME directory if needed.
## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY"). 
## To use now, restart R or run `readRenviron("~/.Renviron")`
## [1] "111111abc"
# First time, reload your environment so you can use the key without restarting R.
readRenviron("~/.Renviron")
# You can check it with:
Sys.getenv("CENSUS_API_KEY")
## [1] "111111abc"

Environment variables are not only useful time savers; they also keep us from committing secrets to our public repositories!