VT Census Case Studies : Arlington County Real Estate Assessments

Codebooks

  • The API data came with no documentation, so a codebook had to be created before proceeding.
    • See API codebook here: AC Real Estate Assessment Codebook
    • Each variable was profiled for quality (completeness, validity, consistency, and uniqueness). The results are documented in the codebook.
  • Following the profiling of the API data, data CDs were received from the county (one for each year from 2009 to 2013).
    • See the yearly data here: Data Dictionary County Real Estate.
    • Using the variables chosen from the API data, similar tables were constructed from the database to ensure direct comparability. The key variables were profiled for quality (completeness, validity, consistency, and uniqueness); see below.
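
The profiling described above can be sketched for a single column as follows. This is only a minimal illustration, not the code used for the study: the function name `profile_column`, the set of NA codes, and the sample values are assumptions. It covers completeness, validity, and uniqueness; consistency checks depend on the variable and are not shown.

```python
from collections import Counter

# Assumed NA codes: blank strings (after stripping spaces) and the literal "NA".
NA_CODES = {"", "NA"}

def profile_column(values, is_valid=lambda v: True):
    """Return completeness, validity, and number of levels for one column."""
    total = len(values)
    # Values that are neither None nor an NA code count as present.
    present = [v for v in values
               if v is not None and str(v).strip() not in NA_CODES]
    valid = [v for v in present if is_valid(v)]
    levels = Counter(present)
    return {
        "completeness": len(present) / total if total else 0.0,
        "validity": len(valid) / len(present) if present else 0.0,
        "n_levels": len(levels),
    }

# Made-up year-built values; 0 and 1000 are treated as invalid entries.
years = [1950, 0, 1000, None, 2004, 1950]
report = profile_column(years, is_valid=lambda y: y not in (0, 1000))
print(report)
```

Passing the validity rule as a function keeps the same routine usable for year, ZIP code, or count variables.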

Data Description

The tables below describe the data tables used for this study. The counts were computed on the original data, before any data preparation. Data marked "API" or 2015 came from the API and is current as of the date it was pulled (July 1, 2015).

Property: Basic information, such as primary address and owners, about all parcels (identified by RealEstatePropertyCode)
Year N Unit of Analysis
2009 64,827 Active Properties
2010 65,201 Active Properties
2011 65,242 Active Properties
2012 65,364 Active Properties
2013 65,443 Active Properties
2015 68,949 Properties (Active or Inactive)

Improvement Dwelling: Information, such as year built and heating type, about each dwelling on a parcel (identified by RealEstatePropertyCode) 

Year N Unit of Analysis
2009 62,081 Dwelling
2010 62,091 Dwelling
2011 62,265 Dwelling
2012 62,400 Dwelling
2013 62,478 Dwelling
2015 60,730 Dwelling

** Note: Each property (defined by RPC - RealEstatePropertyCode) can have one or more dwellings (typically 1). Dwellings are uniquely identified by the DwellingKey or a combination of RPC and ExtensionNbr.
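
A minimal sketch of checking the key described in this note, assuming each table is loaded as a list of dictionaries. The helper name `duplicated_keys` and the sample rows are made up for illustration.

```python
from collections import Counter

def duplicated_keys(rows, key_fields):
    """Return the key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Made-up rows: one property with two dwellings, one with a single dwelling.
dwellings = [
    {"realEstatePropertyCode": "01001001", "extensionNbr": "R01"},
    {"realEstatePropertyCode": "01001001", "extensionNbr": "R02"},
    {"realEstatePropertyCode": "01001002", "extensionNbr": "R01"},
]

# RPC alone repeats, but RPC plus ExtensionNbr is unique.
by_rpc = duplicated_keys(dwellings, ["realEstatePropertyCode"])
by_pair = duplicated_keys(dwellings, ["realEstatePropertyCode", "extensionNbr"])
print(by_rpc)
print(by_pair)
```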

Improvement Interior: Information about the interior of a dwelling, such as number of bedrooms or bathrooms, on a parcel (identified by RealEstatePropertyCode)  

Year N Unit of Analysis
2009 62,081 Dwelling
2010 62,091 Dwelling
2011 62,265 Dwelling
2012 62,400 Dwelling
2013 62,478 Dwelling
2015 134,698 Each floor of a dwelling **

** Note: For 2015, the key is FloorKey or a combination of RealEstatePropertyCode, DwellingKey, ExtensionNbr, and FloorNbr (FloorNbr can have values like B for a basement and 2.5 for a second floor with a sloped ceiling).
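
Because the 2015 table has one row per floor while the 2009 to 2013 tables have one row per dwelling, one way to make them comparable is to roll the floor rows up to the dwelling level. The sketch below assumes this approach; it is not confirmed as the method used in the study, and the function name and sample rows are hypothetical. FloorNbr is kept as text because of values like B and 2.5.

```python
from collections import defaultdict

def bedrooms_per_dwelling(floor_rows):
    """Sum bedroomCnt over floors for each (RPC, extensionNbr) dwelling."""
    totals = defaultdict(int)
    for row in floor_rows:
        key = (row["realEstatePropertyCode"], row["extensionNbr"])
        totals[key] += row["bedroomCnt"]
    return dict(totals)

# Made-up floor-level rows for two dwellings.
floors = [
    {"realEstatePropertyCode": "01001001", "extensionNbr": "R01",
     "floorNbr": "B", "bedroomCnt": 1},
    {"realEstatePropertyCode": "01001001", "extensionNbr": "R01",
     "floorNbr": "1", "bedroomCnt": 0},
    {"realEstatePropertyCode": "01001001", "extensionNbr": "R01",
     "floorNbr": "2.5", "bedroomCnt": 3},
    {"realEstatePropertyCode": "01001002", "extensionNbr": "R01",
     "floorNbr": "1", "bedroomCnt": 2},
]

totals = bedrooms_per_dwelling(floors)
print(totals)
```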

Assessment History (FROM DATA CDs) 
Year N Unit of Analysis
2009 324,966 Assessment history up to 2013 for a parcel (one record per year)
2010 317,535 Assessment history up to 2013 for a parcel (one record per year)
2011 321,547 Assessment history up to 2013 for a parcel (one record per year)
2012 323,734 Assessment history up to 2013 for a parcel (one record per year)
2013 324,966 Assessment history up to 2013 for a parcel (one record per year)
API Data **
Table N Years Unit of Observation
Assessment 410,570 2009 - 2015 Assessment event for a parcel
Assessment Payment History 1,635,898 2009 - 2015 Tax event for a parcel
Sales History 42,552 2009 - 2015 Transaction where ownership changed for a parcel

** Earlier years are available from the API, but only the years in the period of interest were profiled here.



Data Quality

The tables below contain the results of the data profiling of key variables for each of the tables described above.

Note: "Real.Estate.Property.Code" is the property code (parcel) for the property assessed. "Master.Real.Estate.Property.Code" is specific to condos and is primarily used for GIS.

Quality: Property 2009
Duplications No duplicated rows
"realEstatePropertyCode"    No duplications
"masterRealEstatePropertyCode” No duplications
Variables Completeness Validity Uniqueness Consistency

"propertyClassTypeDsc”

100% 100% 56 levels 100%

"propertyStreetNbr”

100% 4% listed as 0   100%

"propertyStreetNbrSuffixCode”

  Levels coded with spaces

Levels: " F " " G " " H " " I " " J " " K " "A " "B " "BK " "C " "D " "E " "F " "G " "H " "I " "J " "K " "L " "S "

NA coding: " ", " "

"propertyStreetDirectionPrefixCode"   Levels coded with spaces

Levels: "N " "S "

NA Coding: " "

"propertyStreetName”

100% Levels coded with spaces   Inconsistent coding of ordinal suffixes, e.g., TH vs. th

"propertyStreetTypeCode”

100% Levels coded with spaces

Levels: "AVE " "BLVD" "CIR " "CT " "DR " "HWY " "LN " "PIKE" "PKWY" "PL " "RD " "ST " "TER "

100%

"propertyStreetDirectionSuffixCode"

 

Levels coded with spaces

Levels: "N " "S "

NA coding: NA, “ ”

"propertyUnitNbr"    

Levels coded with spaces

Some unit numbers are preceded by four zeros ("0000")

6,371 levels NA coding: " ", " "

"propertyCityName”

100% 100% 100% 100%

"propertyZipCode”

100%

100%

Levels: "22101" "22201" "22202" "22203" "22204" "22205" "22206" "22207" "22209" "22211" "22213" 100%

"propertyYearBuilt”

5% missing

1% are invalid entries: 0 and 1000

  100%

"numberOfUnitsCnt”

95% Invalid "0" (4%)  

100%
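
The space-padded levels and blank NA codes reported in the property tables could be normalized before comparing years. The sketch below is one possible cleanup, with hypothetical helper names. Upper-casing street names so that "th" and "TH" match is an assumption, not a rule taken from the county data.

```python
def clean_code(value):
    """Strip padding spaces; return None for blank or NA codes."""
    if value is None:
        return None
    text = str(value).strip()
    return text if text and text != "NA" else None

def clean_street_name(value):
    """Strip padding and upper-case so 'th' and 'TH' suffixes match."""
    text = clean_code(value)
    return text.upper() if text is not None else None

# Sample levels in the padded style shown in the tables.
raw_suffixes = [" F ", "F ", "BK ", " ", "A "]
cleaned = [clean_code(s) for s in raw_suffixes]
print(cleaned)
print(clean_street_name("12th "))
```

After cleaning, " F " and "F " collapse to the same level, which is why the number of distinct levels should be recounted on the cleaned values.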

Quality: Property 2010
Duplications No duplicated rows
"realEstatePropertyCode"    No duplications
"masterRealEstatePropertyCode” No duplications
Variables Completeness Validity Uniqueness Consistency

"propertyClassTypeDsc”

100% 100% 56 levels 100%

"propertyStreetNbr”

100% 4% listed as 0   100%

"propertyStreetNbrSuffixCode”

  Levels coded with spaces

Levels: " F " " G " " H " " I " " J " " K " "A " "B " "BK " "C " "D " "E " "F " "G " "H " "I " "J " "K " "L " "S "

NA coding: " ", " "

"propertyStreetDirectionPrefixCode"   Levels coded with spaces

Levels: "N " "S "

NA Coding: " "

"propertyStreetName”

100% Levels coded with spaces   Inconsistent coding of ordinal suffixes, e.g., TH vs. th

"propertyStreetTypeCode”

100% Levels coded with spaces

Levels: "AVE " "BLVD" "CIR " "CT " "DR " "HWY " "LN " "PIKE" "PKWY" "PL " "RD " "ST " "TER "

100%

"propertyStreetDirectionSuffixCode"

 

Levels coded with spaces

Levels: "N " "S "

NA coding: NA, “ ”

"propertyUnitNbr"    

Levels coded with spaces

Some unit numbers are preceded by four zeros ("0000")

6,412 levels NA coding: " ", " "

"propertyCityName”

100% 100% 100% 100%

"propertyZipCode”

100%

Some are outside of Arlington

Levels: "22101" "22201" "22202" "22203" "22204" "22205" "22206" "22207" "22208" "22209" "22211" "22213" 100%

"propertyYearBuilt”

5% missing

1% are invalid entries: 0 and 1000

  100%

"numberOfUnitsCnt”

95% Invalid "0" (4%)  

100%

Quality: Property 2011
Duplications No duplicated rows
"realEstatePropertyCode"    No duplications
"masterRealEstatePropertyCode” No duplications
Variables Completeness Validity Uniqueness Consistency

"propertyClassTypeDsc”

100% 100% 56 levels 100%

"propertyStreetNbr”

5 missing 4% listed as 0   100%

"propertyStreetNbrSuffixCode”

  Levels coded with spaces

Levels: " F " " G " " H " " I " " J " " K " "A " "B " "BK " "C " "D " "E " "F " "G " "H " "I " "J " "K " "L " "S "

NA coding: " ", " "

"propertyStreetDirectionPrefixCode"   Levels coded with spaces

Levels: "N " "S "

NA Coding: " "

"propertyStreetName”

5 missing Levels coded with spaces   Inconsistent coding of ordinal suffixes, e.g., TH vs. th

"propertyStreetTypeCode”

5 missing Levels coded with spaces

Levels: "AVE " "BLVD" "CIR " "CT " "DR " "HWY " "LN " "PIKE" "PKWY" "PL " "RD " "ST " "TER "

100%

"propertyStreetDirectionSuffixCode"

 

Levels coded with spaces

Levels: "N " "S "

NA coding: NA, “ ”

"propertyUnitNbr"    

Levels coded with spaces

Some unit numbers are preceded by four zeros ("0000")

6,419 levels NA coding: " ", " "

"propertyCityName”

100% 100% 100% 100%

"propertyZipCode”

5 missing

Some are outside of Arlington

Levels: "22101" "22201" "22202" "22203" "22204" "22205" "22206" "22207" "22208" "22209" "22211" "22213" 100%

"propertyYearBuilt”

5% missing

1% are invalid entries: 0 and 1000

  100%

"numberOfUnitsCnt”

95% Invalid "0" (4%)  

100%

Quality: Property 2012
Duplications No duplicated rows
"realEstatePropertyCode"    No duplications
"masterRealEstatePropertyCode” No duplications
Variables Completeness Validity Uniqueness Consistency

"propertyClassTypeDsc”

100% 100% 56 levels 100%

"propertyStreetNbr”

100% 4% listed as 0   100%

"propertyStreetNbrSuffixCode”

  Levels coded with spaces

Levels: " F " " G " " H " " I " " J " " K " "A " "B " "BK " "C " "D " "E " "F " "G " "H " "I " "J " "K " "L " "S "

NA coding: " ", " "

"propertyStreetDirectionPrefixCode"   Levels coded with spaces

Levels: "N " "S "

NA Coding: " "

"propertyStreetName”

100% Levels coded with spaces   Inconsistent coding of ordinal suffixes, e.g., TH vs. th

"propertyStreetTypeCode”

100% Levels coded with spaces

Levels: "AVE " "BLVD" "CIR " "CT " "DR " "HWY " "LN " "PIKE" "PKWY" "PL " "RD " "ST " "TER "

100%

"propertyStreetDirectionSuffixCode"

 

Levels coded with spaces

Levels: "N " "S "

NA coding: NA, “ ”

"propertyUnitNbr"    

Levels coded with spaces

Some unit numbers are preceded by four zeros ("0000")

6,487 levels NA coding: " ", " "

"propertyCityName”

100% 100% 100% 100%

"propertyZipCode”

100%

Some are outside of Arlington

Levels: "20003" "22003" "22004" "22101" "22201" "22202" "22203" "22204" "22205" "22206" "22207" "22208" "22209" "22211" "22213" 100%

"propertyYearBuilt”

5% missing

1% are invalid entries: 0 and 1000

  100%

"numberOfUnitsCnt”

95% Invalid "0" (4%)  

100%

Quality: Property 2013
Duplications No duplicated rows
"realEstatePropertyCode"    No duplications
"masterRealEstatePropertyCode” No duplications
Variables Completeness Validity Uniqueness Consistency

"propertyClassTypeDsc”

100% 100% 55 levels 100%

"propertyStreetNbr”

5 missing 4% listed as 0   100%

"propertyStreetNbrSuffixCode”

  Levels coded with spaces

Levels: " F " " G " " H " " I " " J " " K " "A " "B " "BK " "C " "D " "E " "F " "G " "H " "I " "J " "K " "L " "S "

NA coding: " ", " ", NA

"propertyStreetDirectionPrefixCode"   Levels coded with spaces

Levels: "N " "S "

NA Coding: " "

"propertyStreetName”

5 missing Levels coded with spaces   Inconsistent coding of ordinal suffixes, e.g., TH vs. th

"propertyStreetTypeCode”

5 missing Levels coded with spaces

Levels: "AVE " "BLVD" "CIR " "CT " "DR " "HWY " "LN " "PIKE" "PKWY" "PL " "RD " "ST " "TER "

100%

"propertyStreetDirectionSuffixCode"

 

Levels coded with spaces

Levels: "N " "S "

NA coding: NA, “ ”

"propertyUnitNbr"    

Levels coded with spaces

Some unit numbers are preceded by four zeros ("0000")

6,490 levels NA coding: " ", " "

"propertyCityName”

5 missing 100% 100% 100%

"propertyZipCode”

5 missing

Some are outside of Arlington

Levels with spaces

Levels: "20003 " "20011 " "22003 " "22004 " "22101 " "22201 " "22202 " "22203 " "22204 " "22205 " "22206 " "22207 " "22208 " "22209 " "22211 " "22213 " "2224 " "222O2 " 100%

"propertyYearBuilt”

5% missing

1% are invalid entries: 0 and 1000

  100%

"numberOfUnitsCnt”

99% missing

Invalid "0" (9)

Invalid greater than 2,000 (9)

 

100%

Quality: Property 2015 ***
Duplications No duplicated rows
"realEstatePropertyCode"    No duplications
"masterRealEstatePropertyCode” No duplications
Variables Completeness Validity Uniqueness Consistency

"reasPropertyStatusCode”

100% 100%

Levels: "A", "T"

Active/Inactive

100%

"propertyClassTypeDsc”

100% 100% 58 levels 100%

"propertyStreetNbr”

6% are missing 100%   100%

"propertyStreetNbrSuffixCode”

  100%

Levels: "A"  "B"  "BK" "C"  "D"  "E"  "F"  "G"  "H"  "I"  "J"  "K"  "L"  "N"  "S"

100%
"propertyStreetDirectionPrefixCode"   100%

Levels: “N”, “S”

100%

"propertyStreetName”

753 missing 100%   Inconsistent coding: 12th, 12TH, 19th, 19TH, 20th, 20TH, 8th, 8TH

"propertyStreetTypeCode”

753 missing 100%

Levels: "AVE", "BLVD", "CIR", "CT", "DR", "HWY", "LN", "PIKE", "PKWY", "PL", "RD", "ST", "TER"

100%

"propertyStreetDirectionSuffixCode"

  100%

Levels: “N ”, “S ”

NA coding: NA, “ ”

"propertyUnitNbr"     100% 6,501 levels 100%

"propertyCityName”

753 missing "Clarendon" 100% in Arlington Multiple spellings of Arlington

"propertyZipCode”

754 missing

Zip codes out of Arlington or invalid

Levels: "12201" "20003" "20009" "20011" "22003" "22004" "2207" "22101" "222004" "22201" "22202" "22203" "22204" "22205" "22206" "22207" "22208" "22209" "2221" "22211" "22213" "2224" "222O2" "970000" 100%

"propertyYearBuilt”

9% missing

Invalid years: “0”, “1000”

  100%

"numberOfUnitsCnt”

  Invalid "0" (9)  

100%

*** No time stamp
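
The ZIP code problems listed for 2013 and 2015 (wrong length, a letter O in place of a zero, codes outside the county) can be separated with a simple classifier. This is a hedged sketch: the function name is hypothetical, and the reference set below is simply the list of levels reported for the 2009 table, used as a stand-in that should be replaced with an authoritative list of county ZIP codes.

```python
import re

# A well-formed ZIP code here means exactly five digits.
ZIP_PATTERN = re.compile(r"\d{5}")

# Assumption: the levels reported for 2009, used as a stand-in reference set.
REFERENCE_ZIPS = {
    "22101", "22201", "22202", "22203", "22204", "22205",
    "22206", "22207", "22209", "22211", "22213",
}

def classify_zip(value, reference=REFERENCE_ZIPS):
    """Classify a raw ZIP value as missing, malformed, or in/out of reference."""
    if value is None or not str(value).strip():
        return "missing"
    text = str(value).strip()
    if not ZIP_PATTERN.fullmatch(text):
        return "malformed"
    return "in_reference" if text in reference else "outside_reference"

# Sample values taken from the levels listed in the tables above.
samples = ["22201 ", "222O2 ", "2224 ", "20003 ", " ", "970000"]
labels = [classify_zip(z) for z in samples]
print(labels)
```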

Improvement Dwelling 2009
Duplications No duplicated rows
"realEstatePropertyCode”

35 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"heatingTypeDsc”

43% missing 100% 23 different levels 100%

"dwellingYearBuiltDate"

100%

5% listed as 0 or 1000

  100%
Improvement Dwelling 2010
Duplications No duplicated rows
"realEstatePropertyCode”

25 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"heatingTypeDsc”

43% missing 100% 22 different levels 100%

"dwellingYearBuiltDate"

100%

4% listed as 0 or 1000

  100%
Improvement Dwelling 2011
Duplications No duplicated rows
"realEstatePropertyCode”

47 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"heatingTypeDsc”

43% missing 100% 23 different levels 100%

"dwellingYearBuiltDate"

100%

4% listed as 0 or 1000

  100%
Improvement Dwelling 2012
Duplications No duplicated rows
"realEstatePropertyCode”

57 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"heatingTypeDsc”

43% missing 100% 23 different levels 100%

"dwellingYearBuiltDate"

100%

4% listed as 0 or 1000

  100%
Improvement Dwelling 2013
Duplications No duplicated rows
"realEstatePropertyCode”

69 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"heatingTypeDsc”

43% missing 100% 24 different levels 100%

"dwellingYearBuiltDate"

100%

4% listed as 0 or 1000

  100%
Improvement Dwelling 2015 ***
Duplications 66 are duplicated rows

"realEstatePropertyCode”

167 are duplications

160 properties appear more than once

"dwellingKey"

66 are duplications

Variables Completeness Validity Uniqueness Consistency

"heatingTypeDsc”

13 missing 100% 24 different levels 100%

"dwellingYearBuiltDate"

100%

67 listed as 0 or 1000

  100%
"extensionNbr" 100%

 

Levels: "R01" "R02" "R03" "R04" "R05" "R06" "R07" "R10" 100%

*** No time stamp
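
The duplication tables report two different counts for a key, such as "167 are duplications" and "160 properties appear more than once". One plausible reading, assumed here and not confirmed by the source, is that the first counts rows beyond the first occurrence of each key and the second counts distinct keys that repeat. The function name and sample keys are made up.

```python
from collections import Counter

def duplication_summary(keys):
    """Return (rows beyond the first occurrence, distinct keys that repeat)."""
    counts = Counter(keys)
    extra_rows = sum(n - 1 for n in counts.values())
    repeated_keys = sum(1 for n in counts.values() if n > 1)
    return extra_rows, repeated_keys

# "A" appears three times and "B" twice: 3 extra rows, 2 repeated keys.
keys = ["A", "A", "A", "B", "B", "C"]
summary = duplication_summary(keys)
print(summary)
```

The two numbers differ whenever some key appears three or more times, which matches the pattern of 167 versus 160.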

Improvement Interior 2009
Duplications

No duplicated rows

"realEstatePropertyCode”

25 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"bedroomCnt”

100% 100%   100%

"FullBathroomCnt”

100% 100%   100%

"HalfBathroomCnt"  

100% 100%   100%
"Extension" 100% 100% Levels: "R01" "R02" "R03" "R04" "R05" "R06" "R07" "R10" "VAC" 100%
Improvement Interior 2010
Duplications

No duplicated rows

"realEstatePropertyCode”

35 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"bedroomCnt”

100% 100%   100%

"FullBathroomCnt”

100% 100%   100%

"HalfBathroomCnt"  

100% 100%   100%
"Extension" 100% 100% Levels: "R01" "R02" "R03" "R04" "R05" "R06" "R07" "R10" "VAC" 100%
Improvement Interior 2011
Duplications

No duplicated rows

"realEstatePropertyCode”

47 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"bedroomCnt”

100% 100%   100%

"FullBathroomCnt”

100% 100%   100%

"HalfBathroomCnt"  

100% 100%   100%
"Extension" 100% 100% Levels: "R01" "R02" "R03" "R04" "R05" "R06" "R07" "R10" "VAC" 100%
Improvement Interior 2012
Duplications

No duplicated rows

"realEstatePropertyCode”

57 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"bedroomCnt”

100% 100%   100%

"FullBathroomCnt”

100% 100%   100%

"HalfBathroomCnt"  

100% 100%   100%
"Extension" 100% 100% Levels: "R01" "R02" "R03" "R04" "R05" "R06" "R07" "R10" "VAC" 100%
Improvement Interior 2013
Duplications

No duplicated rows

"realEstatePropertyCode”

69 properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"bedroomCnt”

100% 100%   100%

"FullBathroomCnt”

100% 100%   100%

"HalfBathroomCnt"  

100% 100%   100%
"Extension" 100% 100% Levels: "R01" "R02" "R03" "R04" "R05" "R06" "R07" "R10" "VAC" 100%
Improvement Interior 2015 ***
Duplications

No duplicated rows

"realEstatePropertyCode”

80% are duplicated

64% of properties appear more than once

"floorKey"

None are duplicated

"dwellingKey"

55% are duplicated

64% of dwellings appear more than once

Variables Completeness Validity Uniqueness Consistency

"bedroomCnt”

100% 100%   100%

"twoFixtureBathroomCnt”

100% 100%   100%

"threeFixtureBathroomCnt"  

100% 100%   100%

"fourFixtureBathroomCnt”

100% 100%   100%

"fiveFixtureBathroomCnt”

100% 100%   100%

"extensionNbr"

100% 100% Levels: "R01" "R02" "R03" "R04" "R05" "R06" "R07" "R10" 100%

*** No time stamp

Assessments 2009 (Data CD)
Duplications None are duplicated
"Parcel_ID"

205 appear once; the rest appear multiple times

Variables Completeness Validity Uniqueness Consistency

"Year_Assessed"

100% 100%

 

 

100%

"Impr"

100% 100% 5% listed as 0 (no structure on land) 100%

"Land”

100% 100% 1% listed as 0 100%

"Totl"

100% 220 are listed as 0   100%
Assessments 2010 (Data CD)
Duplications None are duplicated
"Parcel_ID"

220 appear once; the rest appear multiple times

Variables Completeness Validity Uniqueness Consistency

"Year_Assessed"

100% 100%

 

 

100%

"Impr"

100% 100% 5% listed as 0 (no structure on land) 100%

"Land”

100% 100% 1% listed as 0 100%

"Totl"

100% 212 are listed as 0  

100%

Assessments 2011 (Data CD)
Duplications None are duplicated
"Parcel_ID"

117 appear once; the rest appear multiple times

Variables Completeness Validity Uniqueness Consistency

"Year_Assessed"

100% 100%

 

 

100%

"Impr"

100% 100% 5% listed as 0 (no structure on land) 100%

"Land”

100% 100% 1% listed as 0 100%

"Totl"

100% 167 are listed as 0  

100%

Assessments 2012 (Data CD)
Duplications None are duplicated
"Parcel_ID"

236 appear once; the rest appear multiple times

Variables Completeness Validity Uniqueness Consistency

"Year_Assessed"

100% 100%

 

 

100%

"Impr"

100% 100% 4% listed as 0 (no structure on land) 100%

"Land”

100% 100% 1% listed as 0 100%

"Totl"

100% 173 are listed as 0  

100%

Assessments 2013 (Data CD)
Duplications None are duplicated
"Parcel_ID"

205 appear once; the rest appear multiple times

Variables Completeness Validity Uniqueness Consistency

"Year_Assessed"

100% 100%

 

 

100%

"Impr"

100% 100% 4% listed as 0 (no structure on land) 100%

"Land”

100% 100% 1% listed as 0 100%

"Totl"

100% 220 are listed as 0  

100%

Assessments (API)
Duplications 21% are duplicated rows
"Assessment.Key” 21% are duplicated
"Real.Estate.Property.Code”

83% are duplications

99% of the properties appear more than once

Variables Completeness Validity Uniqueness Consistency

"Assessment.Date"

100%

YYYY-01-01 has a high frequency in every year compared to other days

  100%

"Improvement.Value.Amount"

100% 100% 4% listed as 0 (no structure on land) 100%

"Land.Value.Amount”

100% 100% 1% listed as 0 100%

"Total.Value.Amount"

100% 331 are listed as 0   100%
"Assessment.Change.Reason.Type.Dsc" 199 are missing 100% Levels: "01- Annual", "02- Permit", "03- Board of Equalization", "04- Court Order", "05- Review", "09- New Construction", "16- Tax to Exempt", "18- Exempt to Tax" 100%

Assessment Payment History (API)
Duplications

354 are duplicated rows

"realEstatePropertyCode”

96% are duplicated

100% of properties appear more than once

Variables Completeness Validity Uniqueness Consistency

“taxDueDate”

YYYY-06-15 and YYYY-10-05 have high frequencies in every year compared to other days 100%   100%
“displayLabelTypeName” 100% 100% "Adjustment", "Deferral", "Levy",  "Payment", "Relief" 100%
“totalAmt” 100% 100% 1 equals 0 100%
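
The date patterns noted for the assessment date (YYYY-01-01) and the tax due date (YYYY-06-15 and YYYY-10-05) can be found by counting records per month and day, ignoring the year. The sketch below assumes the dates have already been parsed into `datetime.date` values; the function name and sample dates are made up.

```python
from collections import Counter
from datetime import date

def month_day_counts(dates):
    """Count how often each (month, day) pair occurs, ignoring the year."""
    return Counter((d.month, d.day) for d in dates)

# Made-up assessment dates: January 1 dominates across years.
dates = [
    date(2009, 1, 1), date(2010, 1, 1), date(2011, 1, 1),
    date(2011, 6, 30), date(2012, 3, 15),
]
counts = month_day_counts(dates)
print(counts.most_common(1))
```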

Sales History (API)
Duplications

473 (1%) of rows are duplicated

"realEstatePropertyCode"

36% are duplicated

36% of properties are listed more than once

Variables Completeness Validity Uniqueness Consistency

“salesTypeDsc”

41% missing 100% 26 levels 100%

“saleAmt”

100% 100% 47% listed as 0 100%

“saleDate”

100%

100%   100%

Across the Years Checks

Address

  • 872 properties have at least one different address across 2009, 2010, 2011, 2013, and 2015
  • 191 properties have at least one different address across 2009, 2010, 2011, and 2013

Number of Units

  • 206 properties change unit counts across the years
  • 9 have differences in the 20,000+ range due to errors in year 2013 data

Heating Type

  • 1,189 properties have at least one different heating type listed between 2009 and 2013.

Year Built

  • No year built is later than the year of the data in which it appears (e.g., no Year Built of 2012 in the 2009 data).
  • 830 properties have at least one year built that differs from the rest across 2009, 2010, 2011, 2012, 2013, and 2015.
  • 403 properties have at least one year built that differs from the rest across 2009, 2010, 2011, 2012, and 2013.
  • A large difference may indicate new construction where the original building was demolished.

 

Note: These figures exclude differences where the minimum value was 0 or 1,000.
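
A cross-year check of this kind can be sketched as below, assuming each year's table has been reduced to a mapping from property code to year built. The function name and sample data are hypothetical. Dropping 0 and 1000 before comparing is an approximation of the exclusion in the note, not a confirmed reproduction of it.

```python
# Placeholder values that the profiling tables flag as invalid years.
PLACEHOLDERS = {0, 1000}

def inconsistent_properties(values_by_year):
    """Return property codes with more than one distinct value across years.

    values_by_year maps year -> {property code: value}. None and
    placeholder values are ignored before comparing.
    """
    seen = {}
    for table in values_by_year.values():
        for rpc, value in table.items():
            if value is None or value in PLACEHOLDERS:
                continue
            seen.setdefault(rpc, set()).add(value)
    return sorted(rpc for rpc, vals in seen.items() if len(vals) > 1)

# Made-up data: the first property changes from 1950 to 2012 (a possible
# rebuild); the second only differs because of a placeholder 0.
year_built = {
    2009: {"01001001": 1950, "01001002": 0, "01001003": 1987},
    2010: {"01001001": 1950, "01001002": 1999, "01001003": 1987},
    2013: {"01001001": 2012, "01001002": 1999, "01001003": 1987},
}
changed = inconsistent_properties(year_built)
print(changed)
```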

Attachments:

Arlington County Real Estate Data Dictionary.docx (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
DataDictionary-2015.pdf (application/pdf)
image2015-9-18 15:51:17.png (image/png)
image2015-9-18 16:14:19.png (image/png)
image2015-9-19 16:10:3.png (image/png)
YearBuiltDiff.png (image/png)
UnitDifference.png (image/png)