• Separate into three tables (Location, Characteristics, and Selling).

Location

Cleaning

  • “Parcel.ID”

    • Finding missing parcel ID's on county real estate website
    • In order to find missing parcel id’s the addresses given were manually entered into a James City County database. Once the address was found, the parcel id was copy and pasted into code that entered the entry into the data set. Each entry, due to internet speed and website bandwidth, took about 90 seconds to enter. With large numbers of data entries, in this case around 700, this quickly becomes a very lengthy process. For this reason, we entered in 450 as a preliminary to get a better data set, but have the capability to complete in the future, if desired.

    • "st_direction" "Street.Name" "Street.Suffix"

      • Originally while there was values assigned under each variable, there were numerous occasions where the rules were not followed. In order to properly cleaned, these rules needed to be followed. So these were combined then coded to pull out the suffix and direction properly. This way there are no "st", "ln", "rd", "way", "N", "S", "E", or "W" in the "Street.Name" Category.
  • Multiply geo coordinates by .000001 to make them into correct format
  • Remove duplicated row (22010704)
  • For the 18 missing geocodes, looked up address on google maps and filled it in between 2009 to 2013 (4 of these were not geo-codable due to bad address)

Transformation

  • Paste together part of address to create new address variable
  • Merge coordinates to census tract to get each property's census tract.
  • Kept only those for time period of interest
  • Remove those that are not in James City County (e.g Williamsburg City)
    • 21 were outside the JCC PUMA
    • 56 were outside of JCC

Restructuring

  • Aggregated up to address

 

Housing

Cleaning

  • Remove duplicated row (22010704)
  • Reformat Year.Built years from YY to YYYY. Replaced 0's with correct year.
  • Filled in missing type values
  • Checked and corrected Total.Bedrooms, total_baths and total_bedrooms for values that were outliers and out of the typical range.

Transformation

  • Recode type to include condo ownership
  • Recode year built
  • Recode number of bedrooms
  • Recode number of bathrooms

Selling

Cleaning

  • Remove duplicated row (22010704)
  • Make NA original list price of 3

Transformation

  • Put into constant 2013 dollars original and sold price

Attachments: