Guest Comment Processing Step 1 – Normalizing Phrases

This entry is part 2 of 2 in the series Guest Survey Analysis and Topic Mining  

The first step in creating a model for classifying guest comments is to normalize the comment text, which involves the following:

  • Tokenize the comment into sentences and even phrases within sentences.
  • Normalize words so that synonyms map to a single term.
  • Remove unnecessary words using a stop list as well as part-of-speech filtering.
  • Combine words that should be evaluated together. For example, “Not Good” (which can confuse a model) is combined into “Not-Good”.

Here is a script that walks through each step.
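
As a rough illustration only, and not the original script, the four steps could be sketched in plain Python like this. The function names, the synonym map, the stop list and the negation list are all hypothetical placeholders that would need tuning on real guest comments. Part-of-speech filtering is left out because it needs a tagger (for example the one in NLTK), which the standard library does not provide.

```python
import re

# Hypothetical lookup tables; a real project would build these from its own data.
SYNONYMS = {"great": "good", "excellent": "good", "awful": "bad",
            "terrible": "bad", "personnel": "staff", "employees": "staff"}
STOP_WORDS = {"the", "a", "an", "was", "were", "is", "are", "and",
              "it", "we", "i", "our", "very"}
NEGATIONS = {"not", "no", "never"}


def split_phrases(comment):
    """Step 1: split a comment into sentences, then into phrases."""
    phrases = []
    for sentence in re.split(r"[.!?]+", comment):
        # Assumption: commas, semicolons and the word "but" mark phrase boundaries.
        for phrase in re.split(r"[,;]|\bbut\b", sentence, flags=re.IGNORECASE):
            if phrase.strip():
                phrases.append(phrase.strip())
    return phrases


def normalize_phrase(phrase):
    """Steps 2-4: map synonyms, drop stop words, join negations."""
    words = re.findall(r"[a-z']+", phrase.lower())
    # Step 2: collapse synonyms onto a single term.
    words = [SYNONYMS.get(w, w) for w in words]
    # Step 3: remove stop words (negations are deliberately not in the stop list).
    words = [w for w in words if w not in STOP_WORDS]
    # Step 4: join a negation with the word that follows it, e.g. "not-good".
    combined = []
    i = 0
    while i < len(words):
        if words[i] in NEGATIONS and i + 1 < len(words):
            combined.append(words[i] + "-" + words[i + 1])
            i += 2
        else:
            combined.append(words[i])
            i += 1
    return combined


def normalize_comment(comment):
    """Run all steps and return one token list per phrase."""
    return [normalize_phrase(p) for p in split_phrases(comment)]
```

Calling `normalize_comment("The room was not good, but the staff were excellent!")` returns `[['room', 'not-good'], ['staff', 'good']]`. Keeping everything behind a single function such as `normalize_comment` makes it easy to run exactly the same processing on new comments later.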

It is important to fine-tune this stage before building the model. That way, when unclassified comments are processed later, they are normalized using the same methods that were used to create the model.
