Ever left a restaurant wanting to write a review, but decided it wasn't worth the trouble to tap out all those words on your phone? You just want to give the place your n stars and add a few words of praise or condemnation. If only you could press a button to generate a plausible review. If this project happens, you will.

We'll use the Yelp API to grab as many reviews of certain types of restaurants as the terms of service allow (I assume "Use any robot, spider, site search/retrieval application, or other automated device, process or means to access, retrieve, scrape, or index any portion of the Site or any Site Content;" doesn't apply to API users -- otherwise it wouldn't be much of an API).

We'll investigate libraries like spaCy for doing natural language processing (in Python).

We'll dive into the research on Markov chains for natural language generation.
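To make the idea concrete, here is a minimal sketch of a word-level Markov chain generator that uses only the Python standard library. It is an illustration, not the project's code: the function names, the `<s>`/`</s>` boundary markers, and the sample sentences are all assumptions made up for this example.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        # Hypothetical boundary markers so the walk knows where to start and stop.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=30):
    """Walk the chain from the start marker until the end marker is drawn."""
    word, output = "<s>", []
    while len(output) < max_words:
        # Duplicates in the list make frequent followers more likely.
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


# Made-up sample sentences, not real Yelp data.
SAMPLE = [
    "the tacos were great",
    "the service was slow",
    "the salsa was great",
]

chain = build_chain(SAMPLE)
print(generate(chain, random.Random(0)))
```

Because each step looks only at the current word, the output can splice together halves of different sentences. That is where both the amusing and the nonsensical results come from.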

Finally, we'll put this functionality on the web, using Flask and SQLAlchemy in front of a PostgreSQL database. The idea is to generate a sample review and let the user easily tweak it and copy it to the clipboard, not to post reviews to Yelp automatically.

Looking for mad skills in:

python nltk webapps flask sqlalchemy

This project is part of:

Hack Week 15

Activity

  • almost 2 years ago: cschum liked RankWell: Markov Chain Generation of Yelp Restaurant Reviews
  • about 2 years ago: ericp added keyword "sqlalchemy" to RankWell: Markov Chain Generation of Yelp Restaurant Reviews
  • about 2 years ago: ericp added keyword "flask" to RankWell: Markov Chain Generation of Yelp Restaurant Reviews
  • about 2 years ago: ericp added keyword "webapps" to RankWell: Markov Chain Generation of Yelp Restaurant Reviews
  • about 2 years ago: ericp added keyword "nltk" to RankWell: Markov Chain Generation of Yelp Restaurant Reviews

    Comments

    • ericp
      almost 2 years ago by ericp

      Mixed results on the hack.

      On one hand, I learned how to use Python's nltk library to do natural language processing on the reviews, to the point that I could lex each word. nltk has parsing abilities as well, but I decided to see how far I could get with just lexing and statistical analysis. Also, the flask and sqlalchemy libraries were straightforward Python analogs of Ruby's Sinatra and ActiveRecord/Sequel ORMs, so nothing new there. And the Yelp API was straightforward to work with.

      The downside was that the Yelp API only exposed the first 10 words of three reviews for each business. If we assume the average business has 100 reviews of about 200 words each, that is roughly 20,000 words per business against the 30 words the API exposed, so this wasn't going to give me the data I needed. However, each review in the resource returned by https://api.yelp.com/v3/businesses/BUSINESSID/reviews also contained a URL, and following that URL gave me the full text of 19 reviews.

      And apparently following that URL violated Yelp's general terms of service, but not the developers' ToS, and I was cut off after pulling down reviews for 350 restaurants. At least I randomized my selection procedure, so I ended up with a smattering of Mexican, Chinese, Japanese and American-style restaurants.

      The best generated sentence might have been one of the first: "I travel the bone tender." I also liked "My wife had the chipotle pancakes." But most of the sentences were grammatically incorrect, or made no sense, or both. I did try tweaking the Markov generator to use a mix of single-word and double-word prefixes, but given the lack of data, I ended the hack and went back to work.
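As an illustration only (this is not the code from the hack), one way to mix single-word and double-word prefixes is a backoff rule: use the two-word prefix when it offers enough distinct continuations, and otherwise fall back to the last word alone. The `min_choices` threshold, the function names, the boundary markers, and the sample sentences below are all assumptions invented for this sketch.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary markers


def build_tables(sentences):
    """Return (one_word, two_word) follower tables built from the sentences."""
    one_word = defaultdict(list)
    two_word = defaultdict(list)
    for sentence in sentences:
        words = [START, START] + sentence.split() + [END]
        for a, b, c in zip(words, words[1:], words[2:]):
            one_word[b].append(c)
            two_word[(a, b)].append(c)
    return one_word, two_word


def generate(one_word, two_word, rng, min_choices=2, max_words=30):
    """Prefer the two-word prefix; back off to one word when choices are scarce."""
    a, b, output = START, START, []
    while len(output) < max_words:
        candidates = two_word.get((a, b), [])
        if len(set(candidates)) < min_choices:
            # Too few distinct followers: the two-word prefix would mostly
            # replay a training sentence, so use the one-word table instead.
            candidates = one_word[b]
        nxt = rng.choice(candidates)
        if nxt == END:
            break
        output.append(nxt)
        a, b = b, nxt
    return " ".join(output)


# Made-up sample sentences, not real Yelp data.
SAMPLE = [
    "the tacos were great",
    "the service was slow",
    "the salsa was great",
    "my wife had the tacos",
]

one_word, two_word = build_tables(SAMPLE)
print(generate(one_word, two_word, random.Random(1)))
```

With a small corpus, most two-word prefixes have a single continuation, so a generator like this spends most of its time in the one-word table. That is consistent with the lack of data being the limiting factor.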
