VT Census Case Studies: Yelp

Brief Overall Description of the Dataset:

Yelp is a multinational corporation headquartered in San Francisco, California. It publishes crowd-sourced reviews about local businesses and also offers online reservation and food delivery services. Yelp had an average of approximately 142 million monthly unique visitors in Q1 2015, and Yelpers have written over 77 million local reviews.

In addition to an API for developers, Yelp provides data in two forms: the Yelp academic dataset and the dataset challenge. The academic dataset contains information on Business (type, id, name, location, stars, review count, category, open, URL), Review (type, business id, user id, stars, text, date, votes) and User (type, id, name, review count, average stars, votes). This data is currently available for the 250 closest businesses for 30 universities, for students and academics to explore and research. The challenge dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses, along with rich attribute data (such as hours of operation, ambience, parking availability) for these businesses, social network information about the users, and aggregated check-ins over time for all these users, covering 10 select cities internationally.

Link: http://www.yelp.com

Date Inventory Completed: 6/9/2015

Screening

  • Is the data collected opinion-based?
  • Is the data collection recurring (must be collected at least annually)?
  • Is there data available for 2013?
  • Is the data collected at the property or housing unit level?
  • Can we access the data by August 15th?

Purpose

  • What is the purpose of the organization collecting the data?

“Yelp was founded in 2004 to help people find great local businesses like dentists, hair stylists and mechanics.”

  • Why is it collected and how does the organization use it?

Data is collected by Yelp to provide insight on local businesses and amenities.

  • Who else uses the data?

Businesses, citizens, researchers

  • Who do they sell the data to?

No one; datasets are available to researchers. Businesses have free access but limited ability to change data.

 

Method

  • What is the data collection method? 

Online reviews

  • What is the type of data collected? 

Digital

  • If designed, who created the questions?

  • What is the raw source of the collected data (prior to any aggregation)? 

Location, reviews


Description

  • What is the general topic of the data (1-2 words)?

Amenities and reviews

  • What are the earliest and latest dates for which data is available?

Not stated

  • Is data collected and available periodically?

Yes, continuously 

  • How soon after a reference period ends can a data source be prepared and provided? 

Instantaneously


Selectivity

  • What is the universe (e.g., population) that the data represents?

Amenities (stores, restaurants, businesses) in the US and select international cities, provided that an individual (owner or consumer) has created a page for that site.


Accessibility

  • How is the data accessed? 

API; data download for the academic and challenge datasets (each file is composed of a single object type, one JSON object per line).
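
Because each downloaded file holds one JSON object per line, it can be read line by line rather than parsed as a single document. The following is a minimal sketch under that assumption only; the helper names (read_json_lines, count_by_key), the file name and the exact key names such as "stars" are hypothetical and are not taken from Yelp's documentation.

```python
import json
from collections import Counter
from typing import Iterator


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_key(path: str, key: str) -> Counter:
    """Count how many records share each value of `key` (hypothetical key name)."""
    return Counter(record.get(key) for record in read_json_lines(path))
```

For example, count_by_key("reviews.json", "stars") would tally records by star rating, assuming a local file of that name whose objects carry a "stars" key. Reading lazily keeps memory use low, which may matter for a file on the scale of the 1.6M-review challenge dataset.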

  • Is it open data?

Partially

  • Any legal, regulatory, or administrative restrictions on accessing the data source?

Yelp's official API is restrictive: it returns only snippets of the three most recent reviews for a business, and its terms prohibit using the API for data aggregation or analysis of the returned reviews. https://www.yelp.com/developers/documentation/v2/business

Cannot be scraped.

For the other two datasets, one needs an active Yelp account and access to the Yelp API, and must agree to the dataset access agreement.

Terms of Service: http://www.yelp.com/static?p=tos 

  • Cost? - One time or annual or project based payment?

None

Does this dataset appear to meet our needs for the Census study? No

Explanation

The areas of interest are not covered, and the terms of service prohibit web scraping or using the API for research.