
What Can Boston Restaurant Inspectors Learn from Yelp Reviews?

Harvard-based startup Driven Data is hosting an open competition in which participants use restaurant data to develop algorithms that will predict potential health code violations.

This story was originally published by Data-Smart City Solutions.


Cities around the country are finding new ways to use their administrative data. But useful data also exists outside city government, and cities can gain significant insights by forming strategic data partnerships and tapping into the knowledge of the online crowd.

This is the premise behind the new project from startup Driven Data, which asks: What can Boston restaurant inspectors learn from Yelp reviews? In partnership with Yelp, Driven Data is hosting an open competition in which participants use Yelp restaurant data and Boston restaurant inspection data going back to 2006 to develop algorithms that predict potential health code violations and help Boston officials target their inspections more effectively.

Driven Data is a startup, based at Harvard’s iLab, that works with public and social sector organizations to identify challenges that can be addressed using machine learning. Once Driven Data and its clients have framed the problem, Driven Data hosts an open competition on its online platform where users compete to develop the algorithm that makes the best predictions. Hundreds of participants submit algorithms, and Driven Data tests the entries against new data. The code that makes the most accurate predictions wins.
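The scoring step can be pictured with a small sketch. The article does not say which accuracy measure Driven Data uses, so the sketch assumes root-mean-squared error on predicted violation counts; the team names, the `rank_submissions` helper, and the numbers are all hypothetical. What it does show is the core idea: every entry is scored against the same held-out data, and the lowest error ranks first.

```python
import math
from typing import Dict, List, Tuple

def rmse(predicted: List[float], actual: List[float]) -> float:
    """Root-mean-squared error (an assumed metric) between predictions and outcomes."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predictions must be non-empty and aligned with held-out outcomes")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def rank_submissions(submissions: Dict[str, List[float]],
                     held_out: List[float]) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every entry on the same held-out data, lowest error first."""
    scores = [(name, rmse(preds, held_out)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1])

# Hypothetical example: violation counts observed at four later inspections.
held_out = [0, 2, 1, 3]
submissions = {
    "team_a": [0, 2, 1, 3],
    "team_b": [1, 1, 1, 1],
    "team_c": [0, 3, 1, 2],
}
ranking = rank_submissions(submissions, held_out)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
# team_a: 0.000, team_c: 0.707, team_b: 1.225
```

Because the held-out outcomes are unseen by competitors, a low score reflects genuine predictive power rather than memorization of the training data.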

Although founded less than a year ago, Driven Data has already proven its ability to help organizations improve their operations through crowdsourced data science. In its inaugural project, Driven Data worked with a local nonprofit that helps schools make better use of their funding, with the goal of standardizing and analyzing school budgets more quickly. With the algorithm developed through Driven Data’s online competition, the organization will save 75% of the time it spends analyzing school budgets, freeing its highly specialized staff to devote more time to developing solutions and reaching a broader set of schools.

With one successful competition behind it, Driven Data is now collaborating with Yelp on a competition to help Boston health inspectors make data-driven decisions about where and when to inspect restaurants. Today, Boston uses its limited number of health inspectors to conduct annual inspections of all food establishments as well as spot-checks, sending inspectors to locations randomly selected from a list of restaurants. While inspectors have extensive personal experience that might guide their inspections, they lack a system that allows them to learn from past patterns or use new data to identify likely sites of code violations. With the algorithms developed by Driven Data’s participants, inspectors will be able to choose sites more effectively, based on historical patterns and the latest customer reviews.
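The difference between random spot-checks and model-guided site selection can be sketched in a few lines. This is an illustration under stated assumptions, not Boston’s actual process: the risk scores, the restaurant names, and the helpers `random_spot_checks`, `risk_ranked_checks`, and `hits` are all hypothetical. With a fixed budget of `k` inspections, the targeted approach simply visits the `k` restaurants with the highest predicted risk.

```python
import random
from typing import Dict, List, Set

def random_spot_checks(restaurants: List[str], k: int, seed: int = 0) -> List[str]:
    """Current practice as described: draw k locations at random from the list."""
    return random.Random(seed).sample(restaurants, k)

def risk_ranked_checks(risk: Dict[str, float], k: int) -> List[str]:
    """Hypothetical alternative: visit the k restaurants with the highest predicted risk."""
    return sorted(risk, key=lambda name: risk[name], reverse=True)[:k]

def hits(selected: List[str], violated: Set[str]) -> int:
    """Count how many of the selected inspections would have found a violation."""
    return sum(1 for name in selected if name in violated)

# Hypothetical risk scores from some trained model, plus what inspectors would find.
risk = {"A": 0.9, "B": 0.1, "C": 0.7, "D": 0.2, "E": 0.05}
violated = {"A", "C"}

targeted = risk_ranked_checks(risk, k=2)
spot = random_spot_checks(sorted(risk), k=2)
print("targeted:", targeted, "hits:", hits(targeted, violated))
print("random:  ", spot, "hits:", hits(spot, violated))
```

The inspection budget stays the same in both cases; only the choice of which sites to visit changes, which is where better predictions pay off.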

Yelp is providing data for the competition, including restaurant reviews, user information, check-ins, and business metadata going back to 2004. These reviews are matched to Boston’s restaurant inspection list and to historical inspection outcomes from October 2006 to the present, which are available through the city’s open data portal. Competitors will use the patterns that connect online reviews with past violations to predict where new violations are most likely.
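One way to picture the matching step is to pair each inspection with the reviews that were posted for the same restaurant before the inspection took place, so a model learns only from information that would have been available at the time. The sketch below assumes a shared `restaurant_id` key between Yelp and city records; that field, the `Review` and `Inspection` record layouts, the `reviews_before` helper, and the sample data are hypothetical, not the competition’s actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

@dataclass
class Review:
    restaurant_id: str  # hypothetical key shared by Yelp and city records
    posted: date
    text: str

@dataclass
class Inspection:
    restaurant_id: str  # same hypothetical key
    inspected: date
    violations: int

def reviews_before(inspections: List[Inspection],
                   reviews: List[Review]) -> List[Tuple[Inspection, List[str]]]:
    """Pair each inspection with the texts of reviews posted strictly before it."""
    by_restaurant: Dict[str, List[Review]] = {}
    for review in reviews:
        by_restaurant.setdefault(review.restaurant_id, []).append(review)
    rows = []
    for insp in inspections:
        prior = [r.text for r in by_restaurant.get(insp.restaurant_id, [])
                 if r.posted < insp.inspected]
        rows.append((insp, prior))
    return rows

reviews = [
    Review("r1", date(2014, 1, 5), "great tacos"),
    Review("r1", date(2014, 3, 1), "saw a mouse by the counter"),
    Review("r2", date(2014, 2, 1), "clean and friendly"),
]
inspections = [
    Inspection("r1", date(2014, 2, 1), violations=1),
    Inspection("r1", date(2014, 4, 1), violations=3),
]
rows = reviews_before(inspections, reviews)
for insp, texts in rows:
    print(insp.inspected, insp.violations, texts)
```

Filtering on the review date matters: a review written after an inspection could describe what the inspector found, which would make a model look more accurate than it really is.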

The competition, launched on April 27th, is open to the public and runs for 8 weeks. Once the competition closes on June 23rd, Driven Data will test predictions against new data for the following 6 weeks. Algorithms will be judged by how well they predict violations compared with the normal inspections, and the top three performers will be awarded a total of $5,000 in prizes. Based on past competitions and the size of the current community, Driven Data expects between 200 and 500 competitors.

While the competition draws on Boston data, the algorithms have the potential to improve predictions in any city, based on any type of review. Because the challenge centers on the algorithms’ ability to process text, the same approach can be applied to Yelp reviews in other cities or to reviews from any other online aggregator. All entries are required to be open source, so the code will be available to the City of Boston, Yelp, or any other government or company that wants to refine and implement the algorithms to improve local operations.
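To make the idea of learning from review text concrete, here is a minimal text classifier. The article does not say which techniques competitors used; multinomial Naive Bayes with add-one smoothing is swapped in here only as a simple, well-known baseline. The labels (1 meaning a violation was found at the next inspection), the `NaiveBayes` class, and the training reviews are hypothetical. Because the model only looks at words, nothing in it is specific to Boston or to Yelp.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; a hypothetical baseline."""

    def fit(self, docs: List[str], labels: List[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[int, Counter] = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc: str) -> Dict[int, float]:
        """Log prior plus smoothed log likelihood of each known word, per class."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc: str) -> int:
        scores = self.log_scores(doc)
        return max(scores, key=lambda c: scores[c])

# Hypothetical training reviews: 1 = violation at next inspection, 0 = none.
docs = [
    "dirty floors and a cockroach",
    "saw a mouse near the kitchen",
    "sticky tables, dirty bathroom",
    "delicious pasta and friendly staff",
    "clean, bright and friendly",
    "great service and tasty food",
]
labels = [1, 1, 1, 0, 0, 0]
model = NaiveBayes().fit(docs, labels)
print(model.predict("dirty kitchen with a mouse"))     # → 1
print(model.predict("friendly staff and tasty food"))  # → 0
```

Real entries would train on far more reviews and likely richer features, but the structure is the same: text in, a violation risk out.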

Driven Data is interested in helping cities broaden their understanding of the data available to them and expand their community of data scientists. By collaborating with corporate partners, cities can develop productive data partnership models and use the collective insights of residents and other partners to drive smarter government. By rewarding community participation in data-driven decision-making, cities can engage their citizens in building a better, more effective government.