Codebook created: the data came with no documentation, so a codebook had to be created before proceeding.

Each variable was profiled for quality (completeness, validity, consistency, and uniqueness). The results are documented in the codebook.
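A minimal sketch of how per-variable completeness and uniqueness could be computed. It assumes the rows are loaded as a list of dicts (for example with `csv.DictReader`) and that an empty string counts as missing; `profile_variable` and the toy rows are hypothetical, not part of the original codebook.

```python
from collections import Counter

def profile_variable(rows, name, missing=("",)):
    """Completeness (% non-missing) and distinct-value counts for one column.

    Assumes `rows` is a list of dicts (e.g. from csv.DictReader) and that any
    value in `missing` (by default only the empty string) counts as missing.
    """
    values = [row.get(name) for row in rows]
    present = [v for v in values if v is not None and v.strip() not in missing]
    completeness = 100.0 * len(present) / len(values) if values else 0.0
    counts = Counter(present)
    # Number of distinct values that occur more than once in the column.
    duplicated = sum(1 for c in counts.values() if c > 1)
    return {
        "completeness_pct": round(completeness, 1),
        "n_unique": len(counts),
        "n_duplicated_values": duplicated,
    }

# Toy rows for illustration only.
rows = [
    {"MRIS.ListingID": "A1", "City": "ARLINGTON"},
    {"MRIS.ListingID": "A2", "City": "ROSSLYN"},
    {"MRIS.ListingID": "A3", "City": ""},
]
print(profile_variable(rows, "City"))
# {'completeness_pct': 66.7, 'n_unique': 2, 'n_duplicated_values': 0}
```

Validity and consistency depend on variable-specific rules, so they are not covered by this generic function.
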

Overall Data Description:

  • Number of observations: 16,276
  • 10 complete years of data (2010:2015) 
  • Unique identifier: none per housing unit (MRIS.ListingID is unique per row, i.e., per listing)

Profiling summary

  • Duplications: 1 duplicated entry (MRIS.ListingID AR7815551 and AR7825065)
    • All variables are identical except MRIS.ListingID
  • Each row is a sale transaction of a housing unit. The same housing unit can appear more than once, but there is no unique identifier per housing unit.
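The duplicate check above (rows identical in every variable except the listing ID) can be sketched as follows. The function name and toy rows are hypothetical. Because a resold unit differs in its sale fields (e.g., CloseDate), legitimate repeat sales are not grouped by this check.

```python
from collections import defaultdict

def duplicates_ignoring(rows, id_field="MRIS.ListingID"):
    """Group rows that are identical in every field except `id_field`.

    Returns a list of ID groups; each group holds 2+ IDs whose rows
    match on all other fields.
    """
    groups = defaultdict(list)
    for row in rows:
        # Key on all other fields, sorted by name so column order does not matter.
        key = tuple(sorted((k, v) for k, v in row.items() if k != id_field))
        groups[key].append(row[id_field])
    return [ids for ids in groups.values() if len(ids) > 1]

# Toy rows for illustration only.
rows = [
    {"MRIS.ListingID": "X1", "FullStreetAddress": "1 Main St", "ClosePrice": "500000"},
    {"MRIS.ListingID": "X2", "FullStreetAddress": "1 Main St", "ClosePrice": "500000"},
    {"MRIS.ListingID": "X3", "FullStreetAddress": "2 Oak St", "ClosePrice": "650000"},
]
print(duplicates_ignoring(rows))  # [['X1', 'X2']]
```
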

The tables below contain the results of the data profiling of key variables. For more details and profiling of all variables, see the codebook.


Quality: Location

  • “MRIS.ListingID”
    • Uniqueness: no duplications
  • “FullStreetAddress”
    • Completeness: 100%
    • Validity: 11 have no street number
    • Consistency: street direction (e.g., N) is not always listed
  • “City”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: levels "ARLINGTON", "MC LEAN", "MCLEAN", "ROSSLYN"
    • Consistency: 100%

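A sketch of the two location checks above: counting addresses without a street number and listing the distinct levels of a categorical variable such as “City”. The assumption that a street number means "the address starts with a digit" is ours, not the codebook's; the function names are hypothetical.

```python
import re

# Assumption: an address "has a street number" if it starts with a digit.
STREET_NUMBER = re.compile(r"^\s*\d")

def count_missing_street_number(addresses):
    """Count addresses (blank ones included) that do not begin with a street number."""
    return sum(1 for a in addresses if not STREET_NUMBER.match(a or ""))

def distinct_levels(values):
    """Sorted distinct values of a categorical variable (e.g. City)."""
    return sorted(set(values))

print(count_missing_street_number(["1200 N Main St", "Main St", "15 Oak Ave"]))  # 1
print(distinct_levels(["ROSSLYN", "ARLINGTON", "ROSSLYN"]))  # ['ARLINGTON', 'ROSSLYN']
```
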
Quality: Housing

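  • “MRIS.ListingID”
    • Uniqueness: no duplications
  • “RBI_HomeType”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: levels "Attached: Condo/Coop", "Attached: TH", "Detached: All"
    • Consistency: 100%
  • “Type”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: levels "Attach/Row Hse", "Detached", "Duplex", "Garden 1-4 Floors", "Hi-Rise 9+ Floors", "Mid-Rise 5-8 Floors", "Multi-Family", "Other", "Patio Home", "Penthouse", "Quad", "Semi-Detached", "Townhouse"
    • Consistency: 100%
  • “Ownership”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: levels "Condo", "Coop", "Fee Simple", "Ground Rent", "Rental Apartment"
    • Consistency: 100%
  • “YearBuilt”
    • Completeness: 100%
    • Validity: 2 units sold in 2009 but with a YearBuilt of 2010
    • Consistency: 100%
  • “Beds”
    • Completeness: 100%
    • Validity: questionable values (e.g., a 30-bedroom townhouse)
    • Consistency: 100%
  • “BathsTotal”
    • Completeness: 100%
    • Validity: questionable values (e.g., 22 baths)
    • Consistency: 100%
  • “Heating”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: free-text description (508 unique values)
    • Consistency: 100%
  • “HeatingFuel”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: free-text description (57 unique values)
    • Consistency: 100%
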
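The housing validity issues above (YearBuilt later than the sale year, implausible bed and bath counts) can be flagged per row as sketched below. The thresholds `MAX_BEDS` and `MAX_BATHS` are hypothetical, and the sketch assumes CloseDate is an ISO `YYYY-MM-DD` string; the actual format should be taken from the codebook. Flags mark rows for review rather than proven errors, since a unit can legitimately sell before construction is finished.

```python
from datetime import date

# Hypothetical plausibility limits; not from the codebook, tune as needed.
MAX_BEDS = 10
MAX_BATHS = 10

def validity_flags(row):
    """Return validity issues for one sale record (row as dict of strings).

    Assumes CloseDate is ISO formatted (YYYY-MM-DD) and that YearBuilt,
    Beds and BathsTotal are numeric strings.
    """
    flags = []
    sale_year = date.fromisoformat(row["CloseDate"]).year
    if int(row["YearBuilt"]) > sale_year:
        flags.append("YearBuilt after sale year")
    if float(row["Beds"]) > MAX_BEDS:
        flags.append("questionable Beds")
    if float(row["BathsTotal"]) > MAX_BATHS:
        flags.append("questionable BathsTotal")
    return flags

# Toy row for illustration only.
row = {"CloseDate": "2009-06-30", "YearBuilt": "2010", "Beds": "30", "BathsTotal": "2.5"}
print(validity_flags(row))  # ['YearBuilt after sale year', 'questionable Beds']
```
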
Quality: Selling

  • “MRIS.ListingID”
    • Uniqueness: no duplications
  • “CloseDate”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: 100%
    • Consistency: 100%
  • “ClosePrice”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: 100%
    • Consistency: 100%
  • “ListingTracsactionType”
    • Completeness: 100%
    • Validity: 100%
    • Uniqueness: levels "Foreclosure", "Foreclosure,Other/Undisclosed", "Foreclosure,Potential Short Sale", "Foreclosure,REO/Bank Owned", "Other/Undisclosed", "Potential Short Sale", "REO/Bank Owned", "Standard"
    • Consistency: 100%

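A sketch of row-level validity checks for the selling variables. The allowed levels are copied from the “ListingTracsactionType” row above. The ISO date format and the plain numeric ClosePrice (no "$" or thousands separators) are assumptions; `selling_issues` is a hypothetical name.

```python
from datetime import date

# Levels as listed in the profiling table for "ListingTracsactionType".
ALLOWED_TYPES = {
    "Foreclosure", "Foreclosure,Other/Undisclosed", "Foreclosure,Potential Short Sale",
    "Foreclosure,REO/Bank Owned", "Other/Undisclosed", "Potential Short Sale",
    "REO/Bank Owned", "Standard",
}

def selling_issues(row):
    """Return validity issues for the selling variables of one row (dict)."""
    issues = []
    try:
        date.fromisoformat(row["CloseDate"])  # assumes ISO YYYY-MM-DD
    except (KeyError, TypeError, ValueError):
        issues.append("invalid CloseDate")
    try:
        if float(row["ClosePrice"]) <= 0:  # assumes a plain numeric string
            issues.append("non-positive ClosePrice")
    except (KeyError, TypeError, ValueError):
        issues.append("invalid ClosePrice")
    if row.get("ListingTracsactionType") not in ALLOWED_TYPES:
        issues.append("unexpected ListingTracsactionType")
    return issues

print(selling_issues({"CloseDate": "2012-03-15", "ClosePrice": "450000",
                      "ListingTracsactionType": "Standard"}))  # []
```
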
Quality: Tax

  • “MRIS.ListingID”
    • Uniqueness: no duplications
  • “ListingTaxID”
    • Completeness: 2% of values are placeholders (TBD, None, Unknown, etc.)
    • Validity: 100%
    • Uniqueness: 100%
    • Consistency: 100%
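The placeholder share for “ListingTaxID” can be measured as sketched below. Only "TBD", "None", and "Unknown" come from the profiling note; its "etc." is not enumerated, so the token set is an assumption to extend. Counting blanks as placeholders is also an assumption.

```python
# Tokens from the profiling note, compared case-insensitively; the empty
# string (blank cell) is included as an assumption. Extend as needed.
PLACEHOLDERS = {"", "TBD", "NONE", "UNKNOWN"}

def placeholder_share(values):
    """Percentage of values that are blank or a placeholder token."""
    if not values:
        return 0.0
    flagged = sum(1 for v in values if (v or "").strip().upper() in PLACEHOLDERS)
    return 100.0 * flagged / len(values)

print(placeholder_share(["123-45", "TBD", "none", "67-89"]))  # 50.0
```
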

Attachments:

MRIS Data Dictionary.docx