Can you scrape Yelp?

What does Yelp say?

Can you scrape Yelp? A cursory look at the Google search results for this query might lead you to believe that you cannot. The first result that pops up is Yelp’s support page, which deters scraping enthusiasts with a straightforward “No, Yelp does not allow any scraping of the site….”

Well, I would have been satisfied with this answer if it actually addressed the question we’re asking here. Whether Yelp “allows” us to scrape its site doesn’t settle whether it can be done. In our view, accessing publicly available information is something every internet user should be able to do.

Can it be done in practice? The answer is a straightforward “Yes, with some persistence you can.” Go ahead and paste the following command into your terminal and see what response you get.

curl https://www.yelp.com/biz/crowne-plaza-hy36-midtown-manhattan-new-york

You will most likely see a large blob of HTML on your terminal’s standard output. If so, you have in fact just scraped Yelp. Well, kind of.

Why (and why not) scrape Yelp?

Here are a few good reasons to extract data from Yelp. They range from academic research to improving your customers’ experience.

  • To track the performance and customer reviews of businesses and gain insights into different business categories.
  • To analyze reviews with machine learning models in order to understand customers’ language and gain insights into various aspects of their experience.
  • To verify the authenticity and estimate the performance of a business in an automated manner. This could be very useful to the credit and insurance industries.
  • To add Yelp reviews to your internal dashboards for monitoring and marketing purposes.

Having listed some good reasons to scrape Yelp, even though Yelp does not “allow” it, we should also say that the following are BAD reasons to scrape Yelp, and we do not endorse them.

  • Blackhat SEO: copying Yelp reviews and publishing them on your own ghost website without attribution.
  • Hitting Yelp’s servers at an unreasonably high rate. To avoid this, keep your requests within a sensible rate limit.
  • Artificially inflating advertisement impressions where you have a conflict of interest.

How to scrape Yelp?

For developers

If you’re a developer with Python experience, I would recommend jumping right into BeautifulSoup. If you prefer JavaScript, cheerio is worth a look. These libraries make it easy to select the elements you are looking for using HTML document selectors such as classes or other attributes.
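As a minimal sketch of selecting elements by class, the snippet below parses a small, made-up HTML fragment. It uses Python’s built-in html.parser instead of BeautifulSoup, purely so that it runs without installing anything. The class name review-text, the sample markup, and the helper names are assumptions for illustration and do not reflect Yelp’s real markup.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration; Yelp's real HTML differs and changes often.
SAMPLE_HTML = """
<div class="review"><p class="review-text">Great location, friendly staff.</p></div>
<div class="review"><p class="review-text highlighted">Rooms were small but clean.</p></div>
<p class="footer">Not a review</p>
"""

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element that carries a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.results = []
        self._depth = 0  # greater than 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1  # nested tag inside a match
        elif self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_by_class(html, target_class):
    """Return the text of each element whose class list contains target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


reviews = extract_by_class(SAMPLE_HTML, "review-text")
print(reviews)
```

With BeautifulSoup installed, the same selection is usually much shorter, for example BeautifulSoup(html, "html.parser").select(".review-text"), which is why the library is worth reaching for on real pages.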

Your scraper will request the HTML page in an automated way and select the elements you’re looking for. You can then run a loop over the URLs you wish to scrape and store the data in the format you desire. This should work at a small scale, but before long Yelp will notice that there’s something fishy about your requests, and you will see a page telling you that you’re no longer allowed to visit Yelp.

What must you do then? Using a tool designed for this is the quickest way, unless you’re experienced with rotating web proxies and the stack required to bypass modern anti-bot systems, in which case you would probably not be reading this post in the first place.
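For reference, the small-scale loop described above might look like the following sketch. The function names (fetch_html, scrape_all), the User-Agent string, and the two-second delay are assumptions chosen for illustration. It records only the length of each page rather than parsed fields, and it makes no attempt to get around blocking.

```python
import csv
import io
import time
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (performs a real HTTP request)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-scraper/0.1"}  # assumed identifier
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_all(urls, fetch=fetch_html, delay_seconds=2.0, out=None):
    """Fetch each URL in turn, pausing between requests, and write one CSV row per URL."""
    out = out if out is not None else io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "status", "html_length"])
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause between requests to keep the rate low
        try:
            html = fetch(url)
            writer.writerow([url, "ok", len(html)])
        except Exception as error:  # for example, an HTTP error once the site blocks you
            writer.writerow([url, f"error: {error}", 0])
    return out
```

Passing the fetch function as a parameter is a design choice that lets you swap in a parser-aware fetcher, or a fake one for testing, without touching the loop. To write to disk instead of memory, pass an open file as out, for example open("pages.csv", "w", newline="").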

For non-developers

Using an existing tool that does this for you is the most cost-effective solution to this problem. It’s not uncommon for experienced developers to use a tool as well, rather than invest their own time in writing and maintaining a parser for Yelp.

This is where we step in. Unwrangle’s Yelp Search API and Yelp Reviews API let you query search results and business reviews from Yelp with a simple GET request. The inputs are similar to what you would enter when browsing Yelp manually: the search endpoint requires a keyword, a location and a page number, and the reviews endpoint requires the listing URL and a page number.
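As a rough illustration of what such a GET request involves, the sketch below assembles a query URL from a keyword, a location and a page number. The base URL and the parameter names (keyword, location, page, api_key) are placeholders invented for this example, not Unwrangle’s documented interface; check the API documentation for the actual endpoint and parameters.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; not a real API address.
EXAMPLE_SEARCH_ENDPOINT = "https://api.example.com/yelp/search"


def build_search_url(keyword, location, page=1, api_key="YOUR_API_KEY"):
    """Return the full GET URL for one page of search results (hypothetical parameters)."""
    query = urlencode(
        {"keyword": keyword, "location": location, "page": page, "api_key": api_key}
    )
    return f"{EXAMPLE_SEARCH_ENDPOINT}?{query}"


url = build_search_url("hotels", "Las Vegas, NV", page=2)
print(url)
```

urlencode takes care of escaping spaces and punctuation in the keyword and location, so free-text input can be passed through unchanged.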

If you are not used to working with APIs, you can also use our self-service application to get all reviews for any business on Yelp without writing any code. Sign up here to get a free trial and start scraping Yelp in seconds.

As an example, here’s a link to a dataset with more than 20,000 reviews from popular hotels in Las Vegas that we created with our own API.

Our API lets you scrape millions of reviews from Yelp and other websites without worrying about anti-bot systems, so that you can focus on finding insights in your customers’ feedback. We’d love to hear from you; write to us at support@unwrangle.com.