VT Census Case Studies: CoreLogic

Brief Overall Description of the Dataset:

CoreLogic obtains (and analyzes) property data from tax assessors' and county recorders' offices across the nation. The data includes tax, market value, building, and deed information. By working directly with the sources of the data, CoreLogic is able to match, clean, standardize, and refine the data to ensure quality and ease of use.

Screening

  • Is the data collected opinion-based?
  • Is the data collection recurring (must be collected at least annually)?
  • Is there data available for 2013?
  • For Housing: Is the data collected at the property or housing unit level?
  • Can we access the data by August 15th?

Purpose

  • What is the purpose of the organization collecting the data?

CoreLogic’s purpose is to collect and analyze property data in order to sell it to interested parties.

  • Why is it collected and how does the organization use it?

The information is collected in order to sell it to interested parties.

  • Who else uses the data?

Businesses and researchers (e.g., Georgetown and UC Merced).

  • Who do they sell the data to?

“For multiple users, primarily business”

Examples given: CoreLogic (crime data), the Federal Reserve.

 

Method

  • What is the data collection method? 

Data is collected from “third parties,” which include “industry and government entities” such as county assessors, county tax collectors, county courthouses, county recorders, state revenue and taxation agencies, U.S. bankruptcy courts, multiple listing services, real estate brokers and agents, appraisers, servicers, and FEMA.

There are two databases: TAX (obtained from the assessor once a year for each parcel) and Transaction (obtained from the assessor or by collecting documents).

  •   What is the type of data collected? 

CoreLogic collects administrative data. See: https://www.corelogic.com/downloadable-docs/capital-markets-data-sources.pdf

  • If designed, who created the questions? Not applicable

  • What is the raw source of the collected data (prior to any aggregation)? 

City/county records


Description

  • What is the general topic of the data (1-2 words)?

Property and “risk assessment” (the data covers both properties and people).

  • What are the earliest and latest dates for which data is available?

On the academic portal: 2014.

Historical tax roll: around 2005. CoreLogic used to overwrite data when new data came in, so only about a decade of history is retained.

  • Is data collected and available periodically?

Yes. Data is provided yearly by some counties and quarterly by others.

  • How soon after a reference period ends can a data source be prepared and provided? 

Unknown

 

Selectivity

  • What is the universe (e.g., population) that the data represents?

This information was collected from the “United States, Australia, New Zealand and United Kingdom. In the United States alone, we cover 99% of all properties, which equates to 147 million properties and 99.9% of the population.” Beyond property data, CoreLogic also holds a large volume of other data types.


Accessibility

  • How is the data accessed? 

Current data is available through an online portal for researchers.

Historical data is available by bulk download via FTP.
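
As a rough illustration of that access path, the sketch below pulls one historical file over FTP with Python's standard ftplib. The host name, directory, file name, and credentials are placeholders (the real values would come from the license agreement and SOW), not CoreLogic's actual endpoints.

```python
# Hypothetical bulk pull of one historical file over FTP.
# Host, paths, and credentials are placeholders, not CoreLogic's real endpoints.
from ftplib import FTP

HOST = "ftp.data-delivery.example.com"      # placeholder host
REMOTE_DIR = "/historical/tax"              # placeholder directory
FILE_NAME = "tax_roll_2013.zip"             # placeholder file name

with FTP(HOST) as ftp:
    ftp.login(user="licensed_user", passwd="secret")   # credentials per the SOW
    ftp.cwd(REMOTE_DIR)
    with open(FILE_NAME, "wb") as out:
        # Stream the remote file to disk in binary mode.
        ftp.retrbinary(f"RETR {FILE_NAME}", out.write)
```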

  • Is it open data?

No

  • Any legal, regulatory, or administrative restrictions on accessing the data source?

A license agreement and statement of work (SOW) are required; the SOW contains the data-restriction information.

  • Cost? - One time or annual or project based payment?

For historical data: $2,000 a year per state. For access to VA data on the portal: $7,128.

Does this dataset appear to meet our needs for the Census study? Yes

Full Inventory

Description

  • What is the general contents of the data source?

“Mortgage- and asset-backed securities information, Property tax data, Geospatial parcel data, Flood and disaster risk data, Traditional and nontraditional credit information, and Criminal background records”; “property characteristics, tax records, comparable sales, property valuations, neighborhood analysis, flood information, and other proprietary and supplemental data”

  • Features
    • What is the temporal nature of the data: longitudinal, time-series, or one time point?

Longitudinal

    • Geospatial? If Yes, at what level?

There are geospatial datasets. They have a service called “ParcelPoint” that “is a geocoding and spatial analytics engine that converts address or location information into geographic coordinates.”
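
ParcelPoint's own interface is not documented in this write-up, so as a generic stand-in, the sketch below shows the same address-to-coordinate idea using the open geopy library with the Nominatim geocoder. It is only an illustration of what a geocoding engine does, not CoreLogic's API.

```python
# Generic geocoding illustration (geopy + Nominatim), not CoreLogic's ParcelPoint API.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="vt-census-case-study")   # identify the client to the service
location = geolocator.geocode("133 State Street, Montpelier, VT")  # example address
if location is not None:
    # The address has been converted into geographic coordinates.
    print(location.latitude, location.longitude)
```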

 

Metadata

  • Is there information available to assess the transparency and soundness of the methods to gather the data for our purposes?

There is information about how CoreLogic runs its data through quality checks at http://www.corelogic.com/about-us/data.aspx#container-Quality. The company also obtains its data from reputable sources.

  • Is there a description of each variable in the source along with their valid values?

The data dictionary lists each variable name along with its definition: http://www.corelogic.com/research/hpi/march-2015-corelogic-hpi-national-historic-data.pdf

  • Are there unique IDs for unique elements that can be used for linking data?

Linking is usually done through the standardized assessor parcel number, which is unique within a county.
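
A minimal sketch of that kind of linkage is below, assuming a pandas workflow: because parcel numbers are only unique within a county, the join key combines a county identifier with the assessor parcel number. The column names and sample values are made up for illustration, not CoreLogic's actual schema.

```python
# Link TAX and Transaction records on (county FIPS, assessor parcel number).
# Column names and values are illustrative only.
import pandas as pd

tax = pd.DataFrame({
    "fips": ["50023", "50023"],
    "apn": ["12-034-567", "12-034-568"],
    "assessed_value": [210000, 185000],
})
transactions = pd.DataFrame({
    "fips": ["50023"],
    "apn": ["12-034-567"],
    "sale_price": [235000],
})

# Left join keeps every parcel and attaches sale data where a match exists.
linked = tax.merge(transactions, on=["fips", "apn"], how="left")
print(linked)
```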

  • Is there a data dictionary or codebook?

Yes (emailed)

 

Selectivity

  •  What unit is represented at the record level of the data source? 

Property. However, criminal-record data would be at the individual level.

  • Does this universe match the stated intentions for the data collection? If not, what has been included or excluded and why?

Unknown

  • What is the sampling technique used (if applicable)? 

N/A

  • What was the coverage?

None stated

 

Stability/Coherence

  • Were there any changes to the universe of data being captured (including geographical areas covered) and if so what were they?

Unknown

  • Were there any changes in the data capture method and if so what were they?

Unknown

  • Were there any changes in the sources of data and if so what were they? 

Unknown

 

Accuracy

  • Any known sources of error?

Completeness of the data comes down to the county level and changes daily as new data is acquired. “If not there, it’s not to be had.”

  • Describe any quality control checks performed by the data’s owner.

CoreLogic reports performing the following quality control checks: “All data checked at aggregate level for inconsistent behaviors and greater-than-expected changes from prior reporting. All data run through automated edit checks to catch anomalies. Irregular data verified against outside sources—or removed.” (Link: http://www.corelogic.com/about-us/data.aspx#container-Quality)
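
To make that description concrete, the sketch below mimics those two kinds of checks (an aggregate comparison against the prior reporting period and a record-level edit check) on toy data. The threshold and column names are assumptions for illustration, not CoreLogic's actual rules.

```python
# Illustrative versions of the reported checks; thresholds and columns are assumptions.
import pandas as pd

current = pd.DataFrame({"county": ["50023", "50027"],
                        "median_value": [240000, 265000]})
prior = pd.DataFrame({"county": ["50023", "50027"],
                      "median_value": [230000, 180000]})

# Aggregate-level check: flag greater-than-expected changes from prior reporting.
merged = current.merge(prior, on="county", suffixes=("_cur", "_prior"))
merged["pct_change"] = (merged["median_value_cur"] - merged["median_value_prior"]) \
                       / merged["median_value_prior"]
flagged = merged[merged["pct_change"].abs() > 0.25]   # 25% threshold is an assumption

# Automated edit check: catch simple anomalies such as non-positive values.
anomalies = current[current["median_value"] <= 0]

print(flagged)
print(anomalies)
```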

 

Accessibility

  • Any records or fields collected but not included in the data source (such as for confidentiality reasons)?

Unknown

  • Is there a subset of variables and/or data that must be obtained through a separate process? If yes, are there separate legal, regulatory, or administrative restrictions on accessing the data source? Cost? - One time or annual or project based payment?

Unknown

 

Privacy and security

  • Was consent given by participant? If so, how was consent given?

Consent would have been given when a participant signed a mortgage or other contract. However, the consent status of some of the information is not completely clear.

  • Are there legal limitations or restrictions on the use of the data? 

CoreLogic does “restrict an individual’s and/or company’s use” of the data, which is “limited to your own transactions.”

  • What confidentiality policies does the source have? 

See above; the same use restriction (“limited to your own transactions”) applies.

 

Research

  • What research has been done with this dataset?

Many companies have used CoreLogic data, mainly “property” and “loan” companies.

  • Include any links to research if provided
  • List any other data use notes provided by the supplier.

The terms of agreement can be found here: https://www.corelogic.com/legal.aspx#terms3