Brief Overall Description of the Dataset:
TripAdvisor claims to be the largest travel site in the world, with more than 60 million members and over 170 million reviews and opinions of hotels, restaurants, attractions and other travel-related businesses. TripAdvisor claims that “by offering a better experience, users will be more successful in finding what they are looking for, resulting of course in more advertising revenue in the end. They are using several Big Data techniques to achieve this, ranging from large-scale real-time analytics, predictive analytics, data mining and statistical modeling. All focussed on delivering better personalized recommendations for the visitor.”
Link: http://www.tripadvisor.com/
Date Inventory Completed: 6/9/2015
Screening
- Is the data collected opinion-based?
- Is the data collection recurring (must be collected at least annually)?
- Is there data available for 2013?
- Is the data collected at the property or housing unit level?
- Can we access the data by August 15th?
Purpose
What is the purpose of the organization collecting the data?
“TripAdvisor offers advice from millions of travelers and a wide variety of travel choices and planning features with seamless links to booking tools that check hundreds of websites to find the best hotel prices.”
Why is it collected and how does the organization use it?
The objective of TripAdvisor is to help users find better content more quickly.
Who else uses the data?
Researchers, travelers, businesses
Who do they sell the data to?
Data is open
Method
What is the data collection method?
User-generated content
What is the type of data collected?
Digital
If designed, who created the questions?
What is the raw source of the collected data (prior to any aggregation)?
Location information and reviews
Description
What is the general topic of the data (1-2 words)?
Amenities, hotels, reviews of locations
What are the earliest and latest dates for which data is available?
Not stated, but up to present
Is data collected and available periodically?
Yes, continuously
How soon after a reference period ends can a data source be prepared and provided?
TripAdvisor has stated that reviews are not posted to the website instantly.
Selectivity
What is the universe (e.g., population) that the data represents?
TripAdvisor claims to be the largest travel site in the world, with more than 60 million members and over 170 million reviews and opinions of hotels, restaurants, attractions and other travel-related businesses.
Those locations where users have generated content.
Accessibility
How is the data accessed?
TripAdvisor provides free, up-to-date listings, ratings, and review content to qualified travel websites and apps through its Content API.
Is it open data?
Yes
Any legal, regulatory, or administrative restrictions on accessing the data source?
TripAdvisor Content API is for consumer-facing travel websites and apps only.
- Cost? - One time or annual or project based payment?
None
Does this dataset appear to meet our needs for the Census study? Yes
Full Inventory
Description
- Features
- What is the temporal nature of the data: longitudinal, time-series, or one time point?
Longitudinal
- Geospatial? If Yes, at what level?
TripAdvisor does not provide any support for mapping individual properties. However, Content API users can add parameters to their API call to assist with their own mapping process, or use the recommended /location_mapper API call.
API calls made with a TripAdvisor geo location ID or latitude & longitude will return the following data, if available:
Up to 10 restaurants, attractions & accommodations (hereafter referred to as Points of Interest, or POIs)
Business Details for each POI including latitude and longitude
Results can be filtered by category/subcategory/cuisine/attraction-type
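The two lookup modes described above (by geo location ID or by latitude and longitude, with an optional category filter and at most 10 POIs returned) can be sketched as follows. This is a minimal illustration only: the base URL, the path layout, and the parameter names `key` and `category` are assumptions made for this example and are not taken from the Content API documentation, which should be consulted for the real request format.

```python
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only (not the real API host).
BASE_URL = "https://api.example.com/content"
# The inventory notes that a call returns up to 10 POIs.
MAX_POIS = 10


def build_poi_request(api_key, location_id=None, lat=None, lon=None, category=None):
    """Build a request URL from either a geo location ID or a lat/long pair.

    The parameter names "key" and "category" are assumed, not documented.
    """
    has_id = location_id is not None
    has_coords = lat is not None and lon is not None
    # Exactly one lookup mode must be chosen.
    if has_id == has_coords:
        raise ValueError("supply either location_id or both lat and lon")
    target = str(location_id) if has_id else f"{lat:.6f},{lon:.6f}"
    params = {"key": api_key}
    if category is not None:
        params["category"] = category
    return f"{BASE_URL}/{target}?{urlencode(params)}"


def cap_results(pois):
    """Keep at most MAX_POIS entries, mirroring the documented response limit."""
    return list(pois)[:MAX_POIS]
```

Because each call returns no more than 10 POIs, covering a study area would likely require many calls over a grid of coordinates or a list of location IDs. How that interacts with any rate limits is not stated in the source.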
What is the scope of the records? (e.g. program participants, geographic areas?)
Locations of points of interest
Metadata
- Is there information available to assess the transparency and soundness of the methods to gather the data for our purposes?
No
- Is there a description of each variable in the source along with their valid values?
Yes, in API description
- Are there unique IDs for unique elements that can be used for linking data?
None stated
- Is there a data dictionary or codebook?
https://developer-tripadvisor.com/content-api/documentation/
API calls made with a TripAdvisor ID will return the following, if available:
Location ID, name, address, latitude & longitude
Read reviews link, write-a-review link
Overall rating, ranking, subratings, awards, the number of reviews the rating is based on, rating bubbles image
Price level symbol, accommodation category/subcategory, attraction type, restaurant cuisine(s)
Three most recent review snippets of 200 characters in length, including author display name, author location, review rating, title, publish date, a link to read the full review.
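For planning purposes, the fields listed above could be held in a record structure like the one sketched below. The JSON key names (`location_id`, `num_reviews`, `published_date`, and so on) are hypothetical stand-ins chosen for this example; the real response schema is defined in the API documentation linked above. The sketch assumes publish dates are ISO 8601 strings, so that sorting them as text orders them by date.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# The inventory notes that review snippets are 200 characters long.
SNIPPET_LIMIT = 200
# Only the three most recent snippets are returned.
MAX_SNIPPETS = 3


@dataclass
class ReviewSnippet:
    author: str
    rating: Optional[float]
    title: str
    text: str
    published: str


@dataclass
class PoiRecord:
    location_id: str
    name: str
    latitude: Optional[float]
    longitude: Optional[float]
    rating: Optional[float]
    num_reviews: Optional[int]
    reviews: List[ReviewSnippet] = field(default_factory=list)


def _to_float(value: Any) -> Optional[float]:
    return None if value in (None, "") else float(value)


def _to_int(value: Any) -> Optional[int]:
    return None if value in (None, "") else int(value)


def parse_poi(payload: Dict[str, Any]) -> PoiRecord:
    """Convert one response object (hypothetical key names) into a PoiRecord."""
    raw_reviews = payload.get("reviews") or []
    # Assumes ISO 8601 date strings, which sort chronologically as text.
    newest_first = sorted(
        raw_reviews, key=lambda r: r.get("published_date", ""), reverse=True
    )
    snippets = [
        ReviewSnippet(
            author=r.get("user", ""),
            rating=_to_float(r.get("rating")),
            title=r.get("title", ""),
            text=r.get("text", "")[:SNIPPET_LIMIT],
            published=r.get("published_date", ""),
        )
        for r in newest_first[:MAX_SNIPPETS]
    ]
    return PoiRecord(
        location_id=str(payload["location_id"]),
        name=payload.get("name", ""),
        latitude=_to_float(payload.get("latitude")),
        longitude=_to_float(payload.get("longitude")),
        rating=_to_float(payload.get("rating")),
        num_reviews=_to_int(payload.get("num_reviews")),
        reviews=snippets,
    )
```

Since every item is returned only "if available", the sketch treats all fields other than the location ID as optional.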
Selectivity
- What unit is represented at the record level of the data source?
Location (amenity/hotel/etc)
Does this universe match the stated intentions for the data collection? If not, what has been included or excluded and why?
Unknown
What is the sampling technique used (if applicable)?
None
What was the coverage?
None stated
Stability/Coherence
- Were there any changes to the universe of data being captured (including geographical areas covered) and if so what were they?
None stated
- Were there any changes in the data capture method and if so what were they?
None stated
- Were there any changes in the sources of data and if so what were they?
None stated
Accuracy
- Any known sources of error?
Fraudulent reviews. Fake reviews harm entrepreneurs, are useless to visitors, and in the long run will negatively impact TripAdvisor.
- Describe any quality control checks performed by the data’s owner.
“Extensive algorithmic protection in place to prevent fraud.”
Data are subject to a verification process which considers the IP address and email address of the author, and tries to detect any suspicious patterns or obscene or abusive language. The website also allows the community of users to report suspicious content, which is then assessed by a team of quality assurance specialists. TripAdvisor also alerts the owner or manager of a TripAdvisor-listed establishment whenever a review is posted on their listing.
Accessibility
- Any records or fields collected but not included in the data source (such as for confidentiality reasons)?
None stated
- Is there a subset of variables and/or data that must be obtained through a separate process (e.g., state-level data openly available, but one must apply to get census tract)? If yes, are there separate legal, regulatory, or administrative restrictions on accessing the data source? Cost? - One time or annual or project based payment?
None stated
Privacy and security
- Was consent given by participant? If so, how was consent given?
None stated (user-generated content; permission does not need to be given by the owner to leave a review).
- Are there legal limitations or restrictions on the use of the data?
https://developer-tripadvisor.com/content-api/terms-and-conditions/
- What confidentiality policies does the source have?
None stated
Research
- What research has been done with this dataset?
Pricing optimization, what special offers will work, where to advertise
- Include any links to research if provided:
- List any other data use notes provided by the supplier.
Gaps/Concerns
- Feasibility - can all jurisdiction levels provide the data (if applicable)?
- Data ownership - a lack of clarity in legal guidance stemming from a lack of clarity with who owns digital data?
- Data collection authority - what data is reasonably private and what constitutes unwarranted intrusion?
- Describe any other notes you have or any gaps/concerns you see with this dataset: