Brief Overall Description of the Dataset:
“The study began in 1968 with a nationally representative sample of over 18,000 individuals living in 5,000 families in the United States. Information on these individuals and their descendants has been collected continuously, including data covering employment, income, wealth, expenditures, health, marriage, childbearing, child development, philanthropy, education, and numerous other topics. The PSID is directed by faculty at the University of Michigan, and the data are available on this website without cost to researchers and analysts.”
Link: http://psidonline.isr.umich.edu/default.aspx
Date Inventory Completed: 5/26/2015
Screening
- Is the data collected opinion-based?
- Is the data collection recurring (must be collected at least annually)? (They did begin collecting every two years for a little while, but they do hope to have 2012 and 2013 data "available soon.")
- Is there data available for 2013? There is data collected, but it is not available quite yet. However, my contact said it should be "available in a month or two."
- For Housing: Is the data collected at the property or housing unit level? (Done by family unit and individual)
- Can we access the data by August 15th? If there are no issues, we should be able to get the data by then.
Purpose
What is the purpose of the organization collecting the data?
The PSID study is conducted by the University of Michigan, and the purpose of the University of Michigan is to do research and educate students.
Why is it collected and how does the organization use it?
The University of Michigan collects the data because it is conducting a long-term longitudinal study of a sample of people in the United States; it wants to use this data to learn about the people of the United States, and it makes much of this data available for “researchers and policy makers” to do their own analyses.
Who else uses the data?
Researchers and Policy-makers
Who do they sell the data to?
They do not sell the data but, if you want to obtain certain restricted data, you have to pay a processing fee.
Method
What is the data collection method?
Survey
What is the type of data collected?
Designed Collection
If designed, who created the questions?
University of Michigan
What is the raw source of the collected data (prior to any aggregation)?
The source of the data would have been the surveys filled out by the surveyors at U of Michigan
Description
What is the general topic of the data (1-2 words)?
Various Housing Statistics
What are the earliest and latest dates for which data is available?
1968-2011 (but I am waiting for an email reply about whether 2012 or 2013 data might be available)
Timeliness
Is data collected and available periodically?
Yes
How soon after a reference period ends can a data source be prepared and provided?
Unknown
Selectivity
What is the universe (e.g., population) that the data represents?
The United States; it is “a nationally representative sample of over 18,000 individuals living in 5,000 families in the United States”
Accessibility
How is the data accessed?
Excel
Is it open data?
Yes, except for some restricted data variables
Any legal, regulatory, or administrative restrictions on accessing the data source?
Yes, we would have to go through a process to get the restricted variables: http://simba.isr.umich.edu/restricted/ProcessReq.aspx
My contact also told me: “Once I have your materials, I will send your request to the RDC Review Committee for approval. If approved, we would need a signed contract from you and a $750 administrative fee for use of the data, at which time I could send the data.”
Another link that would be appropriate for us (since we will be looking at Census Tract Data) would be this one: http://simba.isr.umich.edu/restricted/docs/ResearchPlan/special_info_request_block_block_group_data.pdf.
Cost? - One time or annual or project based payment?
The cost would be $750 “administrative fee”
Does this dataset appear to meet our needs for the Census study? YES
Full Inventory
Description
What is the general contents of the data source (brief list of the type of variables included)?
“Data covering employment, income, wealth, expenditures, health, marriage, childbearing, child development, philanthropy, education, and numerous other topics.” One way of looking at some of the variables can be found here: http://simba.isr.umich.edu/VS/i.aspx.
Features
What is the temporal nature of the data: longitudinal, time-series, or one time point?
Longitudinal
Geospatial? If Yes, at what level?
Unknown because we need to apply to access the geocodes.
Metadata
Is there information available to assess the transparency and soundness of the methods to gather the data for our purposes?
Because the organization has a detailed “user manual” along with detailed codebooks, it seems fair to say that the collection methods are transparent, and the data is sound.
Is there a description of each variable in the source along with their valid values?
Yes, you can pull up a description of each variable.
Are there unique IDs for unique elements that can be used for linking data?
There are unique IDs, but the unique IDs for families “change from year to year.” However, with regard to “split off families”: “When sample members in any family move out and establish their own household, we interview them (these families are called "split-offs", in the first year they are formed). These new "split-off" families have the same 1968 ID as the family they moved out of, and keep that same 1968 ID each year. All families with the same 1968 ID contain at least one of the original members from the 1968 family or their lineal descendants born after 1968.”
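As a rough illustration of what the 1968 ID makes possible, the sketch below groups family records from different years by their 1968 ID, so that a family and its split-offs end up in the same group even though the yearly family ID changes. The records and the field names (year, interview_id, id_1968) are made up for illustration; they are not actual PSID variable names or data.

```python
from collections import defaultdict

# Hypothetical family-level records. Field names and values are
# illustrative assumptions, not actual PSID variables or data.
records = [
    {"year": 2009, "interview_id": 101, "id_1968": 7},
    {"year": 2011, "interview_id": 245, "id_1968": 7},   # same family, new yearly ID
    {"year": 2011, "interview_id": 246, "id_1968": 7},   # split-off from family 7
    {"year": 2009, "interview_id": 102, "id_1968": 12},
    {"year": 2011, "interview_id": 310, "id_1968": 12},
]

def group_by_1968_id(rows):
    """Collect (year, yearly family ID) pairs under each 1968 ID."""
    lineages = defaultdict(list)
    for row in rows:
        lineages[row["id_1968"]].append((row["year"], row["interview_id"]))
    # Sort each group by year, then by yearly ID, for stable output.
    return {key: sorted(pairs) for key, pairs in lineages.items()}

lineages = group_by_1968_id(records)
for id_1968, pairs in sorted(lineages.items()):
    print(id_1968, pairs)
```

Because the yearly family ID alone cannot be followed across years, any linking we do over time would probably need to key on the 1968 ID (or on individual-level IDs) in this way.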
Is there a data dictionary or codebook? If so, put the link here and add to folder.
When you pull data, there is a data dictionary/code book that is uniquely made for the data you pulled. I have put a representative data set with some housing statistics on Google Drive, along with the code book for this data that I pulled. You can also look at the surveys themselves for each year and the complete codebooks for each year here: http://psidonline.isr.umich.edu/Guide/documents.aspx
Selectivity
What unit is represented at the record level of the data source?
You can opt for individual or family unit data.
Does this universe match the stated intentions for the data collection? If not, what has been included or excluded and why?
Unknown
What is the sampling technique used (if applicable)?
The sample has been operating under “steady-state sample design” but there have been “additions” to the sample over time. More information about the sampling technique can be found here: http://psidonline.isr.umich.edu/data/Documentation/UserGuide2011.pdf
What was the coverage?
Unknown
Stability/Coherence
Were there any changes to the universe of data being captured (including geographical areas covered) and if so what were they?
There were some changes, but they have all been noted here: http://psidonline.isr.umich.edu/data/Documentation/UserGuide2011.pdf. The main ones were the “immigrant refreshers” (where they added some immigrants into the sample) and “sample reduction” (where they took some people out of the sample).
Were there any changes in the data capture method and if so what were they?
Yes, there were changes. Again, complete details about the study can be found here: http://psidonline.isr.umich.edu/data/Documentation/UserGuide2011.pdf. From looking at the data itself, one example of changes to the capture method is that some questions are not available for certain years, some questions were added for certain years, etc.
Were there any changes in the sources of data and if so what were they?
The sample group has changed somewhat over time, so that means the ‘source’ of data has changed some over time.
Accuracy
Any known sources of error?
Any sources of error have been noted in the manual linked above.
Describe any quality control checks performed by the data’s owner.
There has been some “cross referencing” done to check values (in the case of some “divorces”), and again you can find all of the quality control checks in the “User Manual.”
Accessibility
Any records or fields collected but not included in the data source (such as for confidentiality reasons)? YES
We do have to get special permission for these data sets: Various data including geographic identification codes, the mortality data file and links to the National Death Index, links to the school identifiers from the National Center for Education Statistics data, Hurricane Katrina Supplement data, indicators for assisted housing status, and Medicare claims data, are made available to researchers only under special contract with the University of Michigan.
We also will never be able to get permission for these data sets: No individually identifying information (e.g., name, address) will ever be provided as the basis for research, even under contract.
Is there a subset of variables and/or data that must be obtained through a separate process? If yes, are there separate legal, regulatory, or administrative restrictions on accessing the data source? Cost? - One time or annual or project based payment?
Yes, see above for the data sets we have to get special permission for. You have to go through a process that was detailed above. Again, the link to this process is here: http://simba.isr.umich.edu/restricted/ProcessReq.aspx. The administrative fee is $750.
Privacy and security
Was consent given by participant? If so, how was consent given?
Yes, consent was given; unsure exactly how.
Are there legal limitations or restrictions on the use of the data?
Yes, for example, the “links to the school identifiers from the National Center for Education Statistics” would be under FERPA. Another example would be the “Medicare Claims Data,” which would be under HIPAA. These are just two examples. However, much of the data is openly available, so we would not have to worry about legal limitations or restrictions for it. More information about further limitations and restrictions can be found here: http://simba.isr.umich.edu/restricted/docs/contractrestricteddata/contract_psid_restricted_data.pdf (For example, “Restricted data” cannot be seen by people besides those “permitted” to see it.)
What confidentiality policies does the source have?
Those policies again can be found here: http://simba.isr.umich.edu/restricted/docs/contractrestricteddata/contract_psid_restricted_data.pdf. For example, a major point in the document is that you cannot “try to identify” the people in the survey.
Research
What research has been done with this dataset?
There has been a huge amount of research done with this data. In fact, “Over 3,000 peer-reviewed publications have been based on the PSID.”
Include any links to research if provided:
Research, assessments of, and papers written about the data can be found here: http://psidonline.isr.umich.edu/Publications/
List any other data use notes provided by the supplier.
I think most, if not all, of these notes would be in the confidentiality policies.
Gaps/Concerns
- Feasibility - can all jurisdiction levels provide the data (if applicable)?
- Data ownership - a lack of clarity in legal guidance stemming from a lack of clarity about who owns digital data?
- Data collection authority - what data is reasonably private and what constitutes unwarranted intrusion?
- Describe any other notes you have or any gaps/concerns you see with this dataset: