Document representation

Bag of words

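The bag-of-words representation of a document collection is a document-term count matrix: typically each row corresponds to a document, each column to a word in the vocabulary, and each entry records how often that word occurs in that document. It neglects word order and only stores the word counts in each document.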
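
As a minimal sketch of this idea, the snippet below builds a count matrix for a two-document toy corpus. The function name `bag_of_words`, the whitespace tokenisation, and the example sentences are assumptions made for illustration only; they are not taken from the text above.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, matrix), where matrix[d][j] is the number of
    times vocabulary[j] occurs in documents[d]."""
    # Naive tokenisation (an assumption for this sketch): lowercase, split on whitespace.
    tokenised = [doc.lower().split() for doc in documents]
    # Fixed, sorted vocabulary so that every document uses the same columns.
    vocabulary = sorted({word for tokens in tokenised for word in tokens})
    matrix = []
    for tokens in tokenised:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Two hypothetical documents containing the same words in a different order.
docs = ["the cat sat on the mat", "the mat sat on the cat"]
vocab, X = bag_of_words(docs)
```

Because only counts are stored, the two example documents produce identical rows in `X` even though their word order differs, which is exactly the information the representation discards.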


[Topic Model] Perplexity is a standard performance measure used to evaluate models of text data. It measures a model’s ability to generalise and predict new documents: the perplexity is an indication of the number of equally likely words that can occur at an arbitrary position in a document. A lower perplexity therefore indicates better generalisation. We calculate…