The first step in creating a model for classifying guest comments is to preprocess the text, which involves the following:
- Tokenize the comment into sentences and even phrases within sentences.
- Normalize words so that synonyms and variant forms map to a single term.
- Remove unnecessary words using a stop list and by filtering on part of speech.
- Combine words that should be evaluated together. For example, “Not Good” (which can confuse a model when the two words are treated separately) is combined into “Not-Good”.
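
As a rough illustration of the steps above, here is a minimal sketch in plain Python. The stop list, the synonym map, the negation list, and all function names are hypothetical and chosen only for the example. It also skips part-of-speech filtering, which would need a tagger from a library such as NLTK or spaCy.

```python
import re

# Hypothetical resources for illustration; a real project would tune these.
STOP_WORDS = {"the", "was", "were", "is", "a", "an", "and", "but", "it", "to", "of"}
SYNONYMS = {"great": "good", "excellent": "good", "awful": "bad", "terrible": "bad"}
NEGATIONS = {"not", "no", "never"}


def split_phrases(comment):
    """Step 1: split a comment into sentences, then into phrases on commas/semicolons."""
    sentences = re.split(r"(?<=[.!?])\s+", comment.strip())
    phrases = []
    for sentence in sentences:
        phrases.extend(p.strip() for p in re.split(r"[,;]", sentence) if p.strip())
    return phrases


def normalize(phrase):
    """Step 2: lowercase, tokenize, and map synonyms to one canonical word."""
    words = re.findall(r"[a-z']+", phrase.lower())
    return [SYNONYMS.get(w, w) for w in words]


def remove_stop_words(words):
    """Step 3: drop stop-list words, always keeping negations for step 4."""
    return [w for w in words if w in NEGATIONS or w not in STOP_WORDS]


def combine_negations(words):
    """Step 4: join a negation with the word that follows it (not good -> not-good)."""
    combined = []
    i = 0
    while i < len(words):
        if words[i] in NEGATIONS and i + 1 < len(words):
            combined.append(f"{words[i]}-{words[i + 1]}")
            i += 2
        else:
            combined.append(words[i])
            i += 1
    return combined


def preprocess(comment):
    """Run all four steps and return one token list per phrase."""
    return [combine_negations(remove_stop_words(normalize(p)))
            for p in split_phrases(comment)]


print(preprocess("The room was not good, but the staff were excellent. Breakfast was awful!"))
# [['room', 'not-good'], ['staff', 'good'], ['breakfast', 'bad']]
```

Note that the stop list is applied before negations are combined, so the negation words themselves must never be removed as stop words; otherwise “not good” would silently become “good”.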
Here is a script that walks through each step.
It is important to fine-tune this stage before building the model. That way, unclassified comments are later processed and normalized using the same methods that were used to create the model.
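
One way to keep the two stages consistent is to route both through a single function. The sketch below assumes a deliberately simplified `preprocess` and a tiny hypothetical stop list. It builds a vocabulary from training comments, then runs a new comment through the same function so that its tokens line up with that vocabulary.

```python
import re

# Hypothetical settings, frozen once the preprocessing stage has been tuned.
STOP_WORDS = frozenset({"the", "was", "is", "a", "and"})


def preprocess(comment):
    """Single entry point shared by model building and classification."""
    words = re.findall(r"[a-z']+", comment.lower())
    words = [w for w in words if w not in STOP_WORDS]
    # Join "not" with the word that follows it, e.g. not good -> not-good.
    return re.sub(r"\bnot (\w+)", r"not-\1", " ".join(words)).split()


# Model-building time: collect the vocabulary from the training comments.
training_comments = ["The food was not good", "The pool is clean"]
vocabulary = {tok for c in training_comments for tok in preprocess(c)}

# Classification time: the same function, so tokens match the vocabulary.
new_comment = "Breakfast was NOT good."
known = [tok for tok in preprocess(new_comment) if tok in vocabulary]
print(known)  # ['not-good']
```

If the new comment were only split on whitespace instead, it would yield `NOT` and `good.`, neither of which matches the `not-good` feature the model was built with.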