Ever left a restaurant wanting to write a review, but decided it wasn't worth the trouble to tap out all those words on your phone? You just want to give the place your n stars and a few words of praise or condemnation. If only you could press a button to generate a plausible review. If this project happens, you will.

We'll use the Yelp API to grab as many reviews of certain types of restaurants as the terms of service allow (I assume "Use any robot, spider, site search/retrieval application, or other automated device, process or means to access, retrieve, scrape, or index any portion of the Site or any Site Content;" doesn't apply to API users -- otherwise it wouldn't be much of an API).

We'll investigate libraries like spaCy for doing natural language processing (in Python).

We'll dive into the research on Markov chains for natural language generation.
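As a rough illustration of the idea, here is a minimal sketch of a first-order, word-level Markov chain text generator. It is not the project's code: the function names, the `<s>`/`</s>` boundary markers, and the three toy sentences are all made up for the example, and a real run would feed in actual review text.

```python
import random
from collections import defaultdict

# Hypothetical sentence-boundary markers, chosen for this sketch only.
START, END = "<s>", "</s>"

def build_chain(sentences):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng, max_words=30):
    """Walk the chain from START, picking each next word at random."""
    word, output = START, []
    while len(output) < max_words:
        # Repeated followers appear several times in the list, so
        # rng.choice samples them in proportion to how often they occurred.
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)

# Toy sentences invented for the example, not Yelp data.
reviews = [
    "the tacos were great",
    "the service was slow",
    "the tacos were cold",
]
chain = build_chain(reviews)
print(generate(chain, random.Random(0)))
```

Each generated word depends only on the word before it, which is why output from a small corpus tends to be locally plausible but globally odd.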

Finally, we'll put this functionality on the web, using Flask and SQLAlchemy in front of a PostgreSQL database. The idea is to generate a sample review and let the user easily tweak it and copy it to the clipboard, not to post reviews to Yelp automatically.

Looking for mad skills in:

python nltk webapps flask sqlalchemy

This project is part of:

Hack Week 15

    Comments

    • ericp
      almost 2 years ago by ericp

      Mixed results on the hack.

      On one hand, I learned how to use Python's nltk library to do natural language processing on the reviews, to the point that I could lex each word. nltk has parsing abilities as well, but I decided to see how far I could get with just lexing and statistical analysis. Also, the Flask and SQLAlchemy libraries were straightforward Python analogs of Ruby's Sinatra and the ActiveRecord/Sequel ORMs, so nothing new there. And the Yelp API was straightforward to work with.

      The downside was that the Yelp API only exposed the first 10 words of three reviews for each business. If we assume the average business has 100 reviews of about 200 words each, that is about 30 words out of roughly 20,000 per business, which wasn't going to give me the data I needed. However, each review in the resource returned by https://api.yelp.com/v3/businesses/BUSINESSID/reviews also contained a URL, and following that URL gave me the full text of 19 reviews.
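For reference, a request to the reviews resource named above could be built roughly like this. This is only a sketch: the helper name and the placeholder key are hypothetical, and the `Authorization: Bearer` header is an assumption about how the API authenticates, not something stated in this write-up. The request is constructed but deliberately not sent.

```python
import urllib.request

API_ROOT = "https://api.yelp.com/v3"

def reviews_request(business_id, api_key):
    """Build (but do not send) a request for one business's reviews."""
    url = f"{API_ROOT}/businesses/{business_id}/reviews"
    # Assumption: the API key is passed as a bearer token.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers)

# Placeholder values; substitute a real business ID and API key.
req = reviews_request("BUSINESSID", "YOUR_API_KEY")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` and decoding the JSON body is left out, since the exact response layout is not described here.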

      And apparently following that URL violated Yelp's general terms of service, but not the developers' ToS, and I was cut off after pulling down reviews for 350 restaurants. At least I randomized my selection procedure, so I ended up with a smattering of Mexican, Chinese, Japanese and American-style restaurants.

      The best generated sentence might have been one of the first: "I travel the bone tender." I also liked "My wife had the chipotle pancakes." But most of the sentences were grammatically incorrect, or made no sense, or both. I did try tweaking the Markov generator to use a mix of single-word and double-word prefixes, but given the lack of data, I ended the hack and went back to work.
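One way to mix single-word and double-word prefixes is to prefer the two-word prefix and back off to the one-word prefix when the longer one offers too few choices. The sketch below takes that approach as an assumption; it is not necessarily how the hack did it. All names, the `min_options` threshold, and the toy sentences are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sentence-boundary markers, chosen for this sketch only.
START, END = "<s>", "</s>"

def build_tables(sentences):
    """Collect followers for one-word and two-word prefixes."""
    one = defaultdict(list)  # word -> followers
    two = defaultdict(list)  # (word, word) -> followers
    for sentence in sentences:
        words = [START, START] + sentence.split() + [END]
        for i in range(2, len(words)):
            one[words[i - 1]].append(words[i])
            two[(words[i - 2], words[i - 1])].append(words[i])
    return one, two

def generate_mixed(one, two, rng, min_options=2, max_words=30):
    """Use the two-word prefix when it has enough distinct followers,
    otherwise back off to the one-word prefix."""
    prev2, prev1, output = START, START, []
    while len(output) < max_words:
        candidates = two.get((prev2, prev1), [])
        if len(set(candidates)) < min_options:
            candidates = one[prev1]  # back off to the shorter prefix
        word = rng.choice(candidates)
        if word == END:
            break
        output.append(word)
        prev2, prev1 = prev1, word
    return " ".join(output)

# Toy sentences invented for the example, not Yelp data.
reviews = [
    "my wife had the pancakes",
    "my wife had the tacos",
    "i had the chipotle tacos",
    "the chipotle pancakes were tender",
]
one, two = build_tables(reviews)
print(generate_mixed(one, two, random.Random(0)))
```

With little data, most two-word prefixes have a single follower, so a pure double-word chain mostly replays its input; backing off trades some grammaticality for variety.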
